bioFAM / MOFA

Multi-Omics Factor Analysis
GNU Lesser General Public License v3.0

Different data processing on metabolomics, I get different R2. #60

Open Chenjiani1112 opened 3 years ago

Chenjiani1112 commented 3 years ago

Hi. I have three multi-omics datasets: RNA-seq (vst normalization), DNA methylation (beta values) and plasma metabolomics. I normalized my metabolite data by the total sum of all detected ions, deleted unstable metabolites using QC, and deleted outliers based on the retained metabolites using IQR. Then I normalized samples by the median and applied pareto scaling to the plasma metabolites. Finally, I used my RNA-seq, DNA methylation and plasma metabolite data as input to run MOFA. However, the results showed that all latent factors explain about 0% of the variance in plasma metabolomics. Then I log-transformed my plasma metabolite data and normalized it by pareto scaling. This MOFA result (plasma metabolites with log transform) differed dramatically from the prior MOFA result (plasma metabolites without log transform): all latent factors now explain about 10% of the variance in plasma metabolomics.

I am confused about the data input on metabolomics. Thanks.
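For readers following along, here is a minimal sketch of the last step of the pipeline described above (log transform followed by pareto scaling), using only the Python standard library. It assumes the usual definition of pareto scaling (mean-centre each metabolite, then divide by the square root of its standard deviation). The intensity values and the function name `pareto_scale` are hypothetical and are not taken from the poster's data or from MOFA.

```python
import math
import statistics

def pareto_scale(values):
    """Mean-centre the values and divide by the square root of their sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot pareto-scale a constant feature")
    return [(v - mean) / math.sqrt(sd) for v in values]

# Hypothetical intensities of one metabolite across eight samples.
raw = [120.0, 150.0, 180.0, 200.0, 260.0, 400.0, 900.0, 5200.0]

# Log-transform first, then pareto-scale, as in the second run described above.
logged = [math.log2(v) for v in raw]
log_then_pareto = pareto_scale(logged)

print([round(v, 3) for v in log_then_pareto])
```

Pareto scaling centres the feature at zero but, unlike unit-variance scaling, leaves it with a standard deviation equal to the square root of the original one, so it does not by itself remove skewness.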

rargelaguet commented 3 years ago

Hi @Chenjiani1112, you have to use the log-transformed values for the plasma metabolites. MOFA needs the data to be normal-ish distributed.

P.S. This MOFA version is deprecated. Please move to MOFA v2 (https://biofam.github.io/MOFA2/)
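As a rough illustration of why the log transform helps, the sketch below compares the skewness of a right-skewed feature before and after `log2`. This is only a sanity check written with the Python standard library, not part of MOFA. The intensity values and the helper `skewness` are hypothetical.

```python
import math
import statistics

def skewness(values):
    """Population skewness (third standardized moment); near 0 for symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical right-skewed intensities of one metabolite across samples.
raw = [120.0, 150.0, 180.0, 200.0, 260.0, 400.0, 900.0, 5200.0]
logged = [math.log2(v) for v in raw]

print(f"skewness before log2: {skewness(raw):.2f}")
print(f"skewness after log2:  {skewness(logged):.2f}")
```

A skewness closer to zero after the transform suggests the feature is closer to the "normal-ish" shape mentioned above. It is a heuristic, not a formal normality test.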

Chenjiani1112 commented 3 years ago

Thanks for your help!

Chenjiani1112 commented 3 years ago

Hi. Thanks for solving my doubts. Now I have another problem. When I log-transformed my metabolomics data, a number of values < 0 were produced. I think this would have a great influence on my MOFA result.

Thanks

nvall commented 3 years ago

Hi @Chenjiani1112, this may be due to values between 0 and 1, whose logarithm is negative. If this is the case, you may want to normalize with another transformation, or modify the values between 0 and 1 depending on the original distribution of your data (e.g. defining the minimum as 1).
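A minimal sketch of two ways to read that suggestion, using only the Python standard library. Both helper functions and the example values are hypothetical. Which option is appropriate depends on the original distribution of the data, as noted above.

```python
import math

def log2_with_offset(values, offset=1.0):
    """log2(x + offset); with offset=1 every non-negative input maps to a value >= 0."""
    if min(values) + offset <= 0:
        raise ValueError("offset too small: log2 argument must be positive")
    return [math.log2(v + offset) for v in values]

def log2_rescaled_min_one(values):
    """Divide by the smallest value so the minimum becomes 1, then take log2."""
    smallest = min(values)
    if smallest <= 0:
        raise ValueError("all values must be positive to rescale by the minimum")
    return [math.log2(v / smallest) for v in values]

# Hypothetical normalized intensities, some of them between 0 and 1.
scaled = [0.02, 0.4, 0.9, 1.5, 12.0]

plain = [math.log2(v) for v in scaled]      # negative for inputs below 1
offset = log2_with_offset(scaled)           # all values >= 0
rescaled = log2_rescaled_min_one(scaled)    # minimum is exactly 0

print(plain, offset, rescaled, sep="\n")
```

Dividing by the minimum only shifts every log value by the same constant, so differences between samples are unchanged. Adding an offset compresses small values more than large ones, which changes the shape of the distribution.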

Chenjiani1112 commented 3 years ago

Hi

Thanks!

Chenjiani1112 commented 3 years ago

Hi @rargelaguet, thanks for helping me resolve my earlier confusion. I appreciate your published article about MOFA and your MOFA-related documents/tutorials. However, I now have another doubt when running MOFA. As I mentioned earlier, I have three multi-omics datasets: RNA-seq, DNA methylation and plasma metabolomics. I know you used vst data for RNA-seq and M-values for DNA methylation. Due to my research design, I want to use log2FPKM data for RNA-seq, beta values for DNA methylation, and quantile-normalized, log2-transformed and pareto-scaled data for plasma metabolomics. Can I use log2FPKM for RNA-seq as input data to run MOFA? That is my confusion. Meanwhile, I found that log-normalised RNA-seq data and M-values for bulk methylation data are recommended in your MOFA tutorials.

Looking forward to your reply. Thanks!

Best, Chen.

rargelaguet commented 3 years ago

Hi Chen, the important requirement for MOFA is that the data needs to be continuous. Also, the closer it looks to a Gaussian distribution the better, but this is not necessary. Can you attach a histogram of your matrices before and after normalisation? Then it will be easier to provide guidance.
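For a quick look at the distributions before producing proper plots, a text histogram can be sketched with the Python standard library alone. The function `text_histogram` and the example values are hypothetical. A plotting library would normally be used for the figures requested above.

```python
import math

def text_histogram(values, bins=5, width=30):
    """Return (counts, lines) for an equal-width histogram drawn with '#' bars."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0          # avoid division by zero for constant data
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + span * i / bins
        bar = "#" * round(c / peak * width)
        lines.append(f"{left_edge:10.2f} | {bar} {c}")
    return counts, lines

# Hypothetical intensities of one feature, before and after a log2 transform.
raw = [120.0, 150.0, 180.0, 200.0, 260.0, 400.0, 900.0, 5200.0]
logged = [math.log2(v) for v in raw]

counts_raw, lines_raw = text_histogram(raw)
counts_log, lines_log = text_histogram(logged)

print("before:", *lines_raw, sep="\n")
print("after:", *lines_log, sep="\n")
```

If almost all samples pile up in the first bin before the transform and spread across bins afterwards, the transformed matrix is likely the better input.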