bioFAM / MOFA

Multi-Omics Factor Analysis
GNU Lesser General Public License v3.0

Error in if (any(r[[i]] > cor_threshold)) { : missing value where TRUE/FALSE needed #46

Closed PietroD closed 4 years ago

PietroD commented 4 years ago

Hi

I am trying to fit a model from TCGA data (RNASeq, Methylation, CNV, Mutations, miRNASeq, RPPA).

When I run the vignette example everything is fine.

When I fit the model (runMOFA) with my data, I get this error:

Error in if (any(r[[i]] > cor_threshold)) { : missing value where TRUE/FALSE needed
Warning messages:
1: In H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, : integer value -2^63 replaced NA. See the section 'Large integer data types' in the 'rhdf5' vignette for more details.
2: In H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, : integer value -2^63 replaced NA. See the section 'Large integer data types' in the 'rhdf5' vignette for more details.

Any idea?

rargelaguet commented 4 years ago

I have never seen this problem, but it arises when the model drops inactive factors. As a quick workaround, set ModelOptions$numFactors to a reasonable number (say 15) and TrainOptions$DropFactorThreshold to 0. The model will then not drop inactive factors during training.
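A minimal sketch of that workaround, assuming the model and training options are plain R lists. The helper name `disable_factor_dropping` and the starting values are hypothetical; only `numFactors` and `DropFactorThreshold` come from the comment above.

```r
# Hypothetical helper: fix the number of factors and switch off factor dropping.
# Assumes both option sets are ordinary named lists.
disable_factor_dropping <- function(ModelOptions, TrainOptions, num_factors = 15) {
  stopifnot(is.list(ModelOptions), is.list(TrainOptions), num_factors >= 1)
  ModelOptions$numFactors <- num_factors   # fixed number of factors to learn
  TrainOptions$DropFactorThreshold <- 0    # 0 = never drop inactive factors
  list(ModelOptions = ModelOptions, TrainOptions = TrainOptions)
}

# Stand-in option lists with made-up starting values, for illustration only
opts <- disable_factor_dropping(
  ModelOptions = list(numFactors = 25),
  TrainOptions = list(DropFactorThreshold = 0.02)
)
print(opts$ModelOptions$numFactors)
print(opts$TrainOptions$DropFactorThreshold)
```

In a real session the two lists would presumably come from the package's default-option getters and be passed on when preparing the model, before calling runMOFA.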

P.S. I have tried MOFA with TCGA data before, but I lacked interesting questions and never pushed it further. If you want to discuss the results, I am happy to help via the Slack channel (link on the main GitHub page).

PietroD commented 4 years ago

Thanks for the quick reply, Ricard. I think the first error is solved. It was probably due to -Inf values in one of the matrices. Sorry for opening an issue for such a trivial problem.
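For anyone hitting the same error, a check along these lines can be run before building the model. It is only a sketch in base R: the function names and the toy matrices are hypothetical, and it assumes that NA is acceptable as a missing value while Inf, -Inf and NaN are not.

```r
# Count Inf, -Inf and NaN entries in each view (NA is not counted).
count_nonfinite <- function(views) {
  vapply(views, function(m) sum(is.infinite(m) | is.nan(m)), integer(1))
}

# Replace Inf, -Inf and NaN with NA so they are treated as missing values.
mask_nonfinite <- function(views) {
  lapply(views, function(m) {
    m[is.infinite(m) | is.nan(m)] <- NA
    m
  })
}

# Toy data: log2(0) produces -Inf, a common source of the problem
views <- list(
  rna  = matrix(log2(c(0, 1, 2, 4)), nrow = 2),
  meth = matrix(c(0.1, NA, 0.5, 0.9), nrow = 2)
)
bad_before <- count_nonfinite(views)
clean <- mask_nonfinite(views)
bad_after <- count_nonfinite(clean)
print(bad_before)
print(bad_after)
```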

I now have another issue, related to methylation data.

Warning: Factor 3 is strongly correlated with the total expression for each sample in meth
Such (strong) factors usually appear when count-based assays are not properly normalised by library size.

I used level 3 data from FireBrowse, quantile-normalized with champ.norm from the ChAMP package and subset to the top 5k most variable probes.

What do you suggest?

Thanks

PS The Slack link on the main page is broken.

rargelaguet commented 4 years ago

It means that Factor 3 is correlated with global differences in DNA methylation. Try correlating the Factor 3 values (extract them with getFactors) with the global mean methylation (one value per sample). Here is an example:

Sometimes this is technical, due to poor normalisation, but it can also be biological (as shown in the vignette above). Be careful if you are using microarray data, as there it generally represents technical variation due to differences in overall fluorescence.
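The check suggested above can be sketched in base R on simulated data. Everything here is a stand-in: `meth` plays the role of the methylation matrix (assumed to be features in rows, samples in columns) and `factors` plays the role of the samples-by-factors matrix that getFactors is expected to return. Factor3 is deliberately simulated to track the global shift.

```r
set.seed(1)
n_samples  <- 50
n_features <- 200

# Simulated per-sample global methylation shift
global_shift <- rnorm(n_samples)

# Features x samples matrix: noise plus the per-sample shift on every feature
meth <- matrix(rnorm(n_features * n_samples, sd = 0.5), nrow = n_features) +
  rep(global_shift, each = n_features)

# Stand-in for the factor matrix (samples x factors)
factors <- cbind(
  Factor1 = rnorm(n_samples),
  Factor2 = rnorm(n_samples),
  Factor3 = global_shift + rnorm(n_samples, sd = 0.1)
)

# One global mean methylation value per sample, ignoring missing values
mean_meth <- colMeans(meth, na.rm = TRUE)

# Pearson correlation of each factor with the global mean
r <- cor(factors, mean_meth, use = "pairwise.complete.obs")
print(round(r[, 1], 2))
```

A correlation close to 1 or -1 for a single factor indicates that it captures the global methylation level. Whether that is technical or biological still has to be judged from the data type and normalisation, as discussed above.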

P.S. Thanks for reporting; the Slack link is now fixed.