bioFAM / MOFA2

Multi-Omics Factor Analysis
https://biofam.github.io/MOFA2/
GNU Lesser General Public License v3.0

Projecting unseen (test) data to the trained latent space #110

Open borauyar opened 1 year ago

borauyar commented 1 year ago

Hi @rargelaguet ,

I am using MOFA2 for bulk omics data integration and would like to test how well it generalizes to unseen datasets. I want to make sure I am doing this properly, and I would welcome any comments on other things that could be done or accounted for.

Basically, I want to build a MOFA model on some training data and then project the test (unseen) data onto the latent space learned from the training data. Sorry if this is explained in any of the tutorials, but I couldn't find an explanation of how to do this with MOFA.

What I currently do is this:

trainData and testData are lists of omics views (with matching features and omics types for different sets of samples)

```r
MOFAobject <- create_mofa(trainData)
modelOptions <- MOFA2::get_default_model_options(MOFAobject)
modelOptions$num_factors <- 10
MOFAobject <- prepare_mofa(MOFAobject, model_options = modelOptions)
MOFAobject.trained <- run_mofa(MOFAobject, use_basilisk = TRUE)
# extract the common latent variables from the trained object
train_factors <- get_factors(MOFAobject.trained, factors = "all")
# extract weights for all features vs all factors
weights <- get_weights(MOFAobject.trained, views = "all", factors = "all")
# project the test data onto the trained latent space using the feature weights
test_factors <- t(do.call(rbind, testData)) %*% do.call(rbind, weights)
```

Is this the correct way of doing this or do I need to account for something else when projecting the test data with feature weights?

Thanks! Bora

gtca commented 1 year ago

Hey @borauyar,

It would be great to see how MOFA performs on unseen data! And I don't think I saw any public tutorials on that myself.

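Since the MOFA model is $Y \sim Z W^{T}$, multiplying the new data $Y^{'}$ by the weights $W$ doesn't give you $Z^{'}$: that would only work if the columns of $W$ were orthonormal. $W^{T}$ is not square, so it has no ordinary inverse, but one could compute a pseudo-inverse so that $Z^{'} \sim Y^{'} (W^{T})^{+}$, where $(W^{T})^{+} = W (W^{T} W)^{-1}$ when $W$ has full column rank.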
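To make the difference concrete, here is a minimal sketch in base R on simulated, noise-free data. `project_factors` is a hypothetical helper, not part of MOFA2, and it assumes a complete (no missing values) samples x features matrix whose features are in the same order as the rows of the weight matrix. It is a plain least-squares point estimate and ignores the priors of the MOFA model.

```r
# Hypothetical helper (not a MOFA2 function): least-squares projection of new
# samples onto a fixed weight matrix.
#   Y_new: samples x features matrix, no missing values
#   W:     features x factors weight matrix with full column rank
project_factors <- function(Y_new, W) {
  stopifnot(ncol(Y_new) == nrow(W))
  # Z = Y W (W^T W)^{-1}, i.e. Y times the pseudo-inverse of W^T
  Y_new %*% W %*% solve(crossprod(W))
}

# Toy check: simulate Y = Z W^T and try to recover Z
set.seed(1)
n <- 20; d <- 50; k <- 3
W <- matrix(rnorm(d * k), nrow = d, ncol = k)
Z_true <- matrix(rnorm(n * k), nrow = n, ncol = k)
Y_new <- Z_true %*% t(W)

Z_pinv  <- project_factors(Y_new, W)  # pseudo-inverse projection
Z_naive <- Y_new %*% W                # equals Z_true %*% crossprod(W)

max(abs(Z_pinv - Z_true))   # close to zero (rounding error only)
max(abs(Z_naive - Z_true))  # large, since crossprod(W) is not the identity
```

With the objects from the question, this would presumably be called as `project_factors(t(do.call(rbind, testData)), do.call(rbind, weights))`, assuming the test views are preprocessed (for example centered) consistently with the training data and the views are stacked in the same order in both lists.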

Maybe this Python notebook can give a better idea of how it could work.