Projecting unseen (test) data to the trained latent space

Hi @rargelaguet ,

I am using MOFA2 to integrate bulk omics data, and I would like to test how well the model generalizes to unseen datasets. I want to check that I am doing this properly, and I would welcome any comments on other things that should be done or accounted for.

Basically, I want to build a MOFA model on some training data and then project the test (unseen) data onto the latent space learned from the training data. Sorry if this is explained in any of the tutorials, but I couldn't find an explanation of how to do this with MOFA.

What I currently do is this:

trainData and testData are lists of omics views: the same omics types and the same features in each view, but different sets of samples.
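For concreteness, here is a toy sketch of that layout. The view names, feature names, and dimensions are all made up for illustration, and `make_view` is a hypothetical helper, not a MOFA2 function. Each view is a features x samples matrix, and the final check confirms that views and features line up between the two lists, which the projection further down relies on.

```r
# Toy stand-ins for the real data; names and sizes are made up.
set.seed(42)

# Hypothetical helper: random features x samples matrix with dimnames.
make_view <- function(n_features, sample_ids, prefix) {
  matrix(rnorm(n_features * length(sample_ids)),
         nrow = n_features,
         dimnames = list(paste0(prefix, "_", seq_len(n_features)), sample_ids))
}

train_ids <- paste0("train", 1:30)
test_ids  <- paste0("test", 1:10)

trainData <- list(rna     = make_view(100, train_ids, "gene"),
                  protein = make_view(40,  train_ids, "prot"))
testData  <- list(rna     = make_view(100, test_ids, "gene"),
                  protein = make_view(40,  test_ids, "prot"))

# Same views in the same order, and identical feature names within each view.
same_layout <- identical(names(trainData), names(testData)) &&
  all(mapply(function(a, b) identical(rownames(a), rownames(b)),
             trainData, testData))
```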

library(MOFA2)

# build and train the model on the training data only
MOFAobject <- create_mofa(trainData)
modelOptions <- get_default_model_options(MOFAobject)
modelOptions$num_factors <- 10
MOFAobject <- prepare_mofa(MOFAobject, model_options = modelOptions)
MOFAobject.trained <- run_mofa(MOFAobject, use_basilisk = TRUE)
# extract the common latent variables from the trained object
train_factors <- get_factors(MOFAobject.trained, factors = "all")
# extract weights for all features vs all factors (one features x factors matrix per view)
weights <- get_weights(MOFAobject.trained, views = "all", factors = "all")
# project the test data onto the trained latent space using the feature weights
# (samples x all features) %*% (all features x factors) -> samples x factors
test_factors <- t(do.call(rbind, testData)) %*% do.call(rbind, weights)

Is this the correct way of doing this or do I need to account for something else when projecting the test data with feature weights?
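One thing that may need accounting for: in a factor model of the form X ≈ Z Wᵀ, the least-squares estimate of Z for fixed weights is X W (WᵀW)⁻¹, which equals the plain product X W only when the columns of W are orthonormal. The sketch below shows that projection in base R on simulated data. `project_onto_factors` is a hypothetical helper, not part of MOFA2. It assumes complete data (no missing values), that each test view is centered with the per-feature training means, and that any other preprocessing used in training (for example view scaling) has already been applied to the test data. It is a point estimate that ignores the model's priors and per-feature noise, so it is not necessarily what MOFA2 itself would infer.

```r
# Hypothetical helper (not a MOFA2 function): least-squares projection of
# new samples onto fixed weights, assuming the model X ~ Z %*% t(W).
project_onto_factors <- function(test_views, weights, train_means) {
  # test_views:  list of features x samples matrices, same order as weights
  # weights:     list of features x factors matrices (e.g. from get_weights())
  # train_means: list of per-feature training means, one vector per view
  centered <- Map(function(x, m) sweep(x, 1, m, "-"), test_views, train_means)
  X <- t(do.call(rbind, centered))  # samples x all features
  W <- do.call(rbind, weights)      # all features x factors
  # Solve (W'W) Z' = W'X' instead of forming an explicit inverse.
  t(solve(crossprod(W), crossprod(W, t(X))))
}

# Simulated, noise-free check: known weights and factors, two made-up views.
set.seed(1)
W_sim  <- list(rna     = matrix(rnorm(50 * 3), 50, 3),
               protein = matrix(rnorm(20 * 3), 20, 3))
mu_sim <- list(rna = rnorm(50), protein = rnorm(20))
Z_true <- matrix(rnorm(8 * 3), 8, 3)  # 8 test samples, 3 factors

# Each view is W %*% t(Z) plus the per-feature mean (added row-wise).
test_sim <- Map(function(w, m) w %*% t(Z_true) + m, W_sim, mu_sim)

Z_hat <- project_onto_factors(test_sim, W_sim, mu_sim)

# The plain product, for comparison: centered data times weights, no (W'W)^-1.
centered_sim <- Map(function(x, m) sweep(x, 1, m, "-"), test_sim, mu_sim)
Z_naive <- t(do.call(rbind, centered_sim)) %*% do.call(rbind, W_sim)

max(abs(Z_hat - Z_true))    # numerically zero in this noise-free setting
max(abs(Z_naive - Z_true))  # not zero, because W'W is not the identity
```

With real data, the weights would come from `get_weights()` and the training means would be computed from `trainData`. Whether this matches the projection the MOFA2 authors recommend is exactly the open question here.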

Thanks! Bora

bioFAM/MOFA2, issue #110