bioFAM / MOFA

Multi-Omics Factor Analysis
GNU Lesser General Public License v3.0

Training and validation set #20

Closed: anso-sertier closed this issue 6 years ago

anso-sertier commented 6 years ago

Hi,

Is it possible to train a model on a set of training samples and then apply it to another set of samples to validate the latent factors (LFs) obtained? I'm not sure I'm being clear, but I would like to compute the Z matrices (samples × LFs) for samples that were not used to train the model. I have the Y matrices for the new samples.

Thanks a lot in advance

Anne-Sophie Sertier

rargelaguet commented 6 years ago

Hello, apologies for the slow reply.

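Mathematically, it is as simple as rearranging the master equation from Y = WZ to Z = inv(W)Y: if you can calculate the inverse of W, you can project new samples onto the latent space. However, W is not a square matrix (features × factors), so you would have to use a pseudoinverse instead, for example the Moore-Penrose pseudoinverse, which gives the least-squares solution Z = inv(W^T W) W^T Y when W has full column rank.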
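A minimal NumPy sketch of this projection, assuming the shapes implied by Y = WZ above: each view m has loadings W_m (features_m × factors) and data Y_m (features_m × samples), and all views share the same Z. Because the views share Z, they can be stacked row-wise into one system before taking the pseudoinverse. `project_new_samples` is a hypothetical helper, not part of the MOFA API; if your factors are stored as samples × factors, transpose accordingly. The optional centering with training means is an assumption about how the training data were preprocessed.

```python
import numpy as np

def project_new_samples(W_views, Y_views, train_means=None):
    """Hypothetical helper: least-squares projection of unseen samples.

    W_views: list of (features_m x factors) loading matrices, one per view.
    Y_views: list of (features_m x new_samples) matrices, same view order.
    train_means: optional list of per-feature training means, one array per
        view, used to center the new data as the training data were (assumed).
    Returns Z_new with shape (factors x new_samples).
    """
    if train_means is not None:
        Y_views = [Y - mu[:, None] for Y, mu in zip(Y_views, train_means)]
    # Every view shares the same Z (Y_m = W_m Z), so stacking the views
    # row-wise gives a single linear system Y = W Z.
    W = np.vstack(W_views)
    Y = np.vstack(Y_views)
    # Moore-Penrose pseudoinverse = least-squares solution of W Z = Y.
    return np.linalg.pinv(W) @ Y

# Usage on synthetic, noise-free data: the projection recovers Z exactly.
rng = np.random.default_rng(0)
n_factors, n_new = 3, 5
W_views = [rng.normal(size=(20, n_factors)), rng.normal(size=(30, n_factors))]
Z_true = rng.normal(size=(n_factors, n_new))
Y_views = [W_m @ Z_true for W_m in W_views]
Z_hat = project_new_samples(W_views, Y_views)
print(Z_hat.shape, np.allclose(Z_hat, Z_true))  # (3, 5) True
```

This is a plain least-squares projection: it ignores the sparsity priors and any per-feature noise precisions (weighting rows by those precisions would give a weighted least-squares variant), and it does not handle missing values in the new samples, which would have to be dropped per sample before solving.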

I guess you are testing the model's generalisation/overfitting. Thanks to the sparsity priors and its linear, unsupervised nature, the model is very unlikely to overfit. But if you want to test this, a good approach is out-of-sample prediction: mask values at random (keeping all samples), train on the rest, and check how well the model predicts the masked values. Alternatively, you could downsample the dataset and check how much variance the model explains or how many factors it recovers.
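The masking approach can be sketched as follows. As a stand-in for MOFA itself, this uses a plain alternating-least-squares low-rank factorization (an assumption, not MOFA's inference with sparsity priors); with the real model you would instead set the held-out entries to missing before training, since MOFA handles missing values, and compare its predictions with the true values. `fit_low_rank` and `variance_explained` are hypothetical helpers, and the data are synthetic.

```python
import numpy as np

def fit_low_rank(Y, observed, n_factors, n_iter=50, lam=1e-2, seed=0):
    """Hypothetical stand-in for MOFA: alternating least squares for Y ~ W Z,
    fitted only on entries where `observed` is True (ridge penalty `lam`)."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = Y.shape
    W = rng.normal(size=(n_features, n_factors))
    Z = rng.normal(size=(n_factors, n_samples))
    ridge = lam * np.eye(n_factors)
    for _ in range(n_iter):
        # Update each sample's factor values from its observed features.
        for n in range(n_samples):
            obs = observed[:, n]
            W_o = W[obs]
            Z[:, n] = np.linalg.solve(W_o.T @ W_o + ridge, W_o.T @ Y[obs, n])
        # Update each feature's loadings from the samples where it is observed.
        for d in range(n_features):
            obs = observed[d]
            Z_o = Z[:, obs]
            W[d] = np.linalg.solve(Z_o @ Z_o.T + ridge, Z_o @ Y[d, obs])
    return W, Z

def variance_explained(Y, W, Z, entries):
    """R^2 = 1 - RSS / TSS over the selected entries (assumes centered data)."""
    resid = (Y - W @ Z)[entries]
    return 1.0 - np.sum(resid ** 2) / np.sum(Y[entries] ** 2)

# Synthetic rank-3 data plus noise, with 10% of entries masked at random.
rng = np.random.default_rng(1)
n_features, n_samples, n_factors = 40, 60, 3
Y = rng.normal(size=(n_features, n_factors)) @ rng.normal(size=(n_factors, n_samples))
Y += 0.1 * rng.normal(size=Y.shape)
held_out = rng.random(Y.shape) < 0.1

W, Z = fit_low_rank(Y, ~held_out, n_factors)
mse_model = np.mean((Y - W @ Z)[held_out] ** 2)
mse_baseline = np.mean(Y[held_out] ** 2)  # predicting 0 for every masked entry
r2_train = variance_explained(Y, W, Z, ~held_out)
print(f"held-out MSE {mse_model:.3f} vs baseline {mse_baseline:.3f}, train R^2 {r2_train:.3f}")
```

The downsampling check can reuse the same helpers: refit on a random subset of samples and compare `variance_explained` against the fit on the full dataset.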

I hope this was useful, let me know if you have more questions.