Closed tangoed2whiskey closed 4 years ago
Hi Tom,
Thanks for your excellent report. I'll need some time to look into it. I'm not the expert on the algorithm, or on out-of-matrix prediction.
@thanhlv @jaak-s: can you cast an eye on the above?
I got this answer from Thanh:
Hi Tom,
The out-of-matrix prediction in the notebook can indeed be validated using the colors of the implanted bi-clusters. We created a dataset in which we implanted four diagonal bi-clusters. Then we randomly sampled 80% of the data for training and 20% for testing. The sampled data for the train set and the test set are sorted by index; hence the structure and the color of the bi-clusters are more or less similar. That is, if we train a SMURFF model on the train set and make the out-of-matrix prediction using the row features of the test set, the resulting predicted matrix should have similar colors/structures to the ones in the original matrix.
The pred_out_of_matrix() function performs out-of-matrix prediction, while predict_all(), provided by the SMURFF API, performs in-matrix prediction. Hence, they are not the same.
Hope this helps. If there is something unclear, please feel free to let me know.
Best regards, Thanh
Thanks for the reply; I'm afraid I still don't entirely understand the difference between in- and out-of-matrix predictions. As far as I can tell, when making predictions there is effectively a function that maps the side information onto the latent space, and then predictions are made using this reduced-dimensional matrix. I can't see why this is different when the test examples are in or out of the original matrix: whether they have been trained on is the key property (if trained on, they should be well predicted), but using the training side data to make in- or out-of-matrix predictions should give the same result?
I know I'm missing something here, if you would be able to clarify further that would be great!
Hey Tom,
to be honest: I also do not understand the notebook Thanh made.
But what you describe about out-of-matrix predictions is correct and is supported from Python in SMURFF (see https://smurff.readthedocs.io/en/latest/notebooks/inference_with_smurff.html#Make-predictions-using-side-information)
The code in SMURFF that implements this is a bit complicated: https://github.com/ExaScience/smurff/blob/master/python/smurff/smurff/predict.py#L90
But the original implementation by Jaak is much clearer: https://macau.readthedocs.io/en/latest/source/saving_models.html#using-the-saved-model-to-predict-new-rows-compounds
If you want, we can do a conf call where I explain to you how this works.
Cheers, Tom
Hi Tom,
Thanks very much for that, that was really helpful! I think I now understand the problem much better: however, I still have not managed to work out why predict_all is working differently to the outside-matrix prediction. I have hacked together my own method like the predict_one method you highlighted which makes predictions for many examples at a time:
def predict_many(self, coords, value = float("nan")):
    ret=[]
    for coord in coords:
        p = Prediction(coord, value)
        for s in self.samples:
            p.add_sample(s.predict(p.coords))
        ret.append(p)
    return [r.pred_all for r in ret]
Using this as
places = [(sinfo, col) for sinfo in train_fea for col in range(train_ds.shape[1])]
pred_test_ds3 = np.mean(predictor.predict_many(places),axis=0)
the constructed pred_test_ds3 gives exactly the same as the pred_out_of_matrix function (as one would hope, as they do the same thing!). However this is not the same as the predict_all method gives, and in my tests the predict_all method is considerably more accurate. This makes me wary of using the pred_out_of_matrix (or predict_many) function in anger, as it can't pass the simple test of giving the same answer as what should be a comparable method.
I hope that makes clear what I'm concerned about!
Best wishes, Tom
Out of matrix prediction is not using the train matrix, only side info.
predict_all is using the train data and the side info.
Does this make sense?
I'm sorry, I'm still not quite understanding this: exactly what extra information does the predict_all method have access to, what is it using from the matrix of training data? I can't tell from the code, and especially can't tell why this same information shouldn't be applicable on out-of-matrix predictions (with some caveats of course).
R, where R is the rating matrix.
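To spell out where R could enter, here is a sketch of the usual Macau-style conditional for a row latent vector. It is written from the model rather than read off the SMURFF source, so the notation (the noise precision \alpha, the prior precision \Lambda_u, and the set \Omega_i of observed columns in row i) is an assumption, chosen to match the Umu and Ubeta used by pred_out_of_matrix.

```latex
% Likelihood and row prior (x_i is the side information of row i)
R_{ij} \sim \mathcal{N}\!\left(u_i^\top v_j,\ \alpha^{-1}\right), \qquad
u_i \sim \mathcal{N}\!\left(\mu_u + \beta_u x_i,\ \Lambda_u^{-1}\right)

% Conditional of u_i given V and the observed entries of row i
\Lambda_i^{*} = \Lambda_u + \alpha \sum_{j \in \Omega_i} v_j v_j^\top, \qquad
\mu_i^{*} = \left(\Lambda_i^{*}\right)^{-1}
  \Big( \Lambda_u \left(\mu_u + \beta_u x_i\right) + \alpha \sum_{j \in \Omega_i} R_{ij}\, v_j \Big)

% New row with no observations: \Omega_i = \emptyset, hence \mu_i^{*} = \mu_u + \beta_u x_i
```

Under these assumptions, a row with no observed entries has empty sums, and its conditional mean collapses to \mu_u + \beta_u x_i, which is the quantity an out-of-matrix prediction builds from side information alone. For a training row, the R_{ij} terms pull u_i toward the observed data, which would explain why predict_all tends to fit training rows more closely.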
Hi Tom,
I found a bug in the out-of-matrix prediction code. See #120.
T.
Cheers Tom, thanks for the heads-up; I'll definitely take another look at using the out-of-matrix predictions when that's sorted. I assume there isn't an easy fix I could implement quickly?
The fix has been implemented and I’m currently testing it. I’m also planning on creating a better explanatory notebook.
Brilliant! Looking forward to trying it soon then
Hi Tom, after some fixes in your code, it seems to work out:
#!/usr/bin/env python
# coding: utf-8
# In[ ]:
def predict_out_of_matrix_1s(side_info_matrix, sample_predictor):
    """Out-of-matrix prediction using one sample
    Args:
        side_info_matrix: numpy side info matrix
        sample_predictor: Smurff sample object
    Returns:
        numpy fully predicted matrix
    """
    U, V = sample_predictor.latents
    Umu, Vmu = sample_predictor.mus
    Ubeta, Vbeta = sample_predictor.betas
    # Map side info onto the latent space, then multiply by the column latents
    wU = side_info_matrix.dot(Ubeta.transpose()) + Umu
    m = np.matmul(wU, V)
    return m

def pred_out_of_matrix(side_info_matrix, predictor):
    """Out-of-matrix prediction using all of the samples
    Args:
        side_info_matrix: numpy side info matrix
        predictor: Smurff PredictSession
    Returns:
        numpy fully predicted matrix (obtained by averaging)
    """
    predictions = np.array([predict_out_of_matrix_1s(side_info_matrix, s) for s in predictor.samples()])
    return predictions.mean(axis = 0)
# In[ ]:
import data_simulation as sim
import numpy as np
from scipy.sparse import coo_matrix
import smurff
ds = sim.gen_matrix(1000,400,320,4)
sparse_matrix = sim.sparsify(ds['matrix'], sparsity = 0.2)
print("Main matrix: ", sparse_matrix.shape)
train_indices = np.random.choice(sparse_matrix.shape[0], round(0.8*sparse_matrix.shape[0]),replace=False)
test_indices = np.setdiff1d(np.arange(sparse_matrix.shape[0]), train_indices)
train_indices = np.sort(train_indices)
test_indices = np.sort(test_indices)
train_ds = sparse_matrix[train_indices,]
train_fea = ds['sinfo'][train_indices,]
test_ds = sparse_matrix[test_indices,]
test_fea = ds['sinfo'][test_indices,]
sp_train_ds = coo_matrix(train_ds)
sp_train_fea = coo_matrix(train_fea)
sp_test_ds = coo_matrix(test_ds)
sp_train_ds1, sp_train_ds2 = smurff.make_train_test(sp_train_ds, 0.1)
print("Validation matrix:", sp_test_ds.shape)
print("Train matrix:", sp_train_ds1.shape)
print("Test matrix:", sp_train_ds2.shape)
print('Actual mean of validation data : {:.5f}'.format(np.mean(sp_test_ds.data)))
print('Actual mean of train data : {:.5f}'.format(np.mean(sp_train_ds1.data)))
print('Actual mean of test data : {:.5f}'.format(np.mean(sp_train_ds2.data)))
session = smurff.MacauSession(Ytrain = sp_train_ds1,
                              Ytest = sp_train_ds2,
                              side_info = [sp_train_fea, None],
                              num_latent = 32,
                              burnin = 100,
                              nsamples = 400,
                              save_freq = 1,
                              save_prefix = ".",
                              verbose = 1,
                              direct = True)
predictions = session.run()
predictor = session.makePredictSession()
# In[ ]:
pred_test_ds = pred_out_of_matrix(train_fea, predictor)
pred_test_ds2 = predictor.predict_all()
print('Predicted mean of train data using pred_out_of_matrix: {:.5f}'.format(np.mean(pred_test_ds)))
print('Predicted mean of train data using .predict_all : {:.5f}'.format(np.mean(pred_test_ds2)))
Now I get the output:
Main matrix: (1000, 400)
Validation matrix: (200, 400)
Train matrix: (800, 400)
Test matrix: (800, 400)
Actual mean of validation data : 0.10944
Actual mean of train data : 0.11539
Actual mean of test data : 0.17850
Predicted mean of train data using pred_out_of_matrix: 0.13512
Predicted mean of train data using .predict_all : 0.13544
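The two means above agree to about three decimals, but equal means do not imply equal matrices. A stricter check is sketched below, assuming both predictions are available as dense NumPy arrays of the same shape (with any per-sample axis already averaged out). The helper name compare_predictions is hypothetical and not part of SMURFF; the toy arrays only show why the mean alone is a weak test.

```python
import numpy as np

def compare_predictions(pred_a, pred_b):
    """Return elementwise agreement statistics for two dense prediction matrices."""
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    if pred_a.shape != pred_b.shape:
        raise ValueError(f"shape mismatch: {pred_a.shape} vs {pred_b.shape}")
    diff = pred_a - pred_b
    return {
        "mean_a": float(pred_a.mean()),
        "mean_b": float(pred_b.mean()),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),    # root-mean-square difference
        "max_abs_diff": float(np.abs(diff).max()),     # worst single entry
    }

# Toy data: identical means, but every entry differs by 1.
a = np.array([[0.0, 1.0], [1.0, 0.0]])
b = np.array([[1.0, 0.0], [0.0, 1.0]])
stats = compare_predictions(a, b)
print(stats)
```

In the script above this could be called as compare_predictions(pred_test_ds, pred_test_ds2), provided both are 2-D arrays of the same shape. The two are not expected to match exactly, since predict_all also uses the training matrix, so a small but non-zero RMSE would be the plausible outcome.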
Feel free to re-open if you want.
I'm attempting to use smurff to do out-of-matrix predictions. I've followed the syn_out_matrix_prediction notebook, but am now having difficulty interpreting the output.
I was expecting that if I run the pred_out_of_matrix function with the side information of the training data, this should be the same as using the predict_all() method directly on the predictor. Eventually I'll want to use new side data, but I'm starting off with something we should know the answer for.
However, the script below gives different results using the pred_out_of_matrix function and using the predict_all() method. Neither result appears particularly accurate, either.
Would it be possible to explain what I'm doing wrong with this test?
Code and sample output below.
which gives output