bioFAM / MOFA2

Multi-Omics Factor Analysis
https://biofam.github.io/MOFA2/
GNU Lesser General Public License v3.0

model_opts$spikeslab_weights <- TRUE #115

Open ChristianRohde opened 1 year ago

ChristianRohde commented 1 year ago

Hi there,

It took me a while to understand why my MOFA2 results with version 1.4.0 are so different from those with 1.8.0. With the latest version I had to add model_opts$spikeslab_weights <- TRUE to my script manually, although according to the help page it should be TRUE by default. Instead, the model was running with spikeslab_weights = FALSE, which made quite a difference between analyses run with different R versions on different computers.

Best, Christian
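For readers hitting the same discrepancy, here is a minimal sketch of how the option can be pinned explicitly so that results do not depend on the default of the installed version. It assumes the MOFA2 R package is installed and uses simulated data from make_example_data() purely as a placeholder for real input; the object names (sim, mofa) are hypothetical.

```r
library(MOFA2)

# Record the installed version, since the default appears to differ between releases
print(packageVersion("MOFA2"))

# Placeholder input: simulated data (assumption; use your own list of matrices instead)
sim <- make_example_data(n_views = 2, n_samples = 100, n_features = 200, n_factors = 5)[[1]]
mofa <- create_mofa(sim)

# Inspect the default shipped with the installed version
model_opts <- get_default_model_options(mofa)
print(model_opts$spikeslab_weights)

# Set the option explicitly instead of relying on the default
model_opts$spikeslab_weights <- TRUE

mofa <- prepare_mofa(mofa, model_options = model_opts)
```

Setting the value in the script, whether TRUE or FALSE, keeps analyses comparable across machines with different MOFA2 versions.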

gtca commented 1 year ago

Thanks, Christian!

This is indeed caused by 116f6e6778c3aef6bf286c5f62c1a9a1efa4a29d. @rargelaguet, I would consider rolling this default back to TRUE in v1.8.1.

We would also be curious to know what type of data shows such big differences in the results.

ChristianRohde commented 1 year ago

We have a data set with 2'-O-methylation levels from the ribosome, about 110 sites in total. We applied RiboMethSeq and calculated the scoreC, which is similar to a beta-value, ranging from 0 to 1 (most sites are highly methylated). From these we selected about 59 sites to add to the other omics for MOFA2.

Our top candidate site is unmethylated in some samples and highly methylated in others; it shows the largest differences in our data. Since MOFA2 did not catch this candidate and did not rank it among the top weights, we transformed the data to M-values using a formula similar to the one used here: https://rdrr.io/github/xuz1/ENmix/man/B2M.html. The values are now centered around zero, with some below 0 and others up to 12. After this transformation, MOFA directly recovered what we had seen in our manual analysis.

With the new setting, the order of the factors changed first of all. Our top candidate was still kind of OK, but its overall rank based on the feature weights got a little worse, and the scatterplot of feature values vs. factor values was less clear. This would not have changed the story completely, but I felt uncomfortable with the MOFA2 analysis because the results seemed shaky. At least in this case, based on our other manual analyses, we think the results with model_opts$spikeslab_weights <- TRUE, as obtained out of the box with MOFA2 version 1.4.0, reflect our data better:

Version 1.4.0: [image: MOFA_1 4 0]

Version 1.8.0: [image: MOFA_1 8 0]
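The beta-to-M-value transformation mentioned above can be sketched in base R as follows. This is an illustration of the standard logit-style formula M = log2(beta / (1 - beta)), not necessarily the exact code used in the analysis; the function name beta_to_m and the clamping constant eps are assumptions introduced here to avoid infinite values at exactly 0 or 1.

```r
# Convert beta-like scores in [0, 1] to M-values: M = log2(beta / (1 - beta)).
# `eps` is a hypothetical clamping constant that keeps 0 and 1 finite.
beta_to_m <- function(beta, eps = 1e-6) {
  stopifnot(is.numeric(beta), all(beta >= 0 & beta <= 1, na.rm = TRUE))
  b <- pmin(pmax(beta, eps), 1 - eps)  # clamp into [eps, 1 - eps]
  log2(b / (1 - b))
}

# Example: an unmethylated, a half-methylated and a highly methylated site
scores <- c(0.02, 0.5, 0.98)
m_values <- beta_to_m(scores)
print(m_values)
```

The transformation stretches values near 0 and 1, so a site that switches between unmethylated and highly methylated contributes much more variance than it does on the 0-1 scale.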

rargelaguet commented 1 year ago

Hi @ChristianRohde, thanks for reporting this. I am glad that you resolved the issue. We should indeed have documented and clarified the change in the default value of model_opts$spikeslab_weights, apologies.

To give some background: the assumption of sparsity on the weights is sometimes a tricky one. It definitely improves the interpretability of the weights, but in single-cell data I found that it gave worse representations of the latent factors, I think because the spike-and-slab prior breaks the continuity of the latent space. We did some testing and it didn't change the weights much, but based on your results we should consider reverting to model_opts$spikeslab_weights = TRUE.
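To illustrate what a sparsity assumption on the weights means, here is a small base R simulation that draws weights from a generic spike-and-slab prior and from a plain Gaussian prior. It is only a sketch of the prior idea, not MOFA2's actual inference; the values of theta and slab_sd are hypothetical.

```r
set.seed(1)

n_features <- 1000
theta <- 0.2    # hypothetical probability that a feature is "active"
slab_sd <- 1    # hypothetical standard deviation of the slab

# Spike-and-slab: a Bernoulli switch times a Gaussian "slab" draw
active <- rbinom(n_features, size = 1, prob = theta)
w_spikeslab <- active * rnorm(n_features, sd = slab_sd)

# Plain Gaussian prior: every weight is non-zero
w_gaussian <- rnorm(n_features, sd = slab_sd)

# Fraction of weights that are exactly zero under each prior
print(mean(w_spikeslab == 0))
print(mean(w_gaussian == 0))
```

Under the spike-and-slab draw most weights are exactly zero, which makes the few active features easy to read off; under the Gaussian draw every feature contributes a little.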