Reproduce a figure of your article with my own data (not simulated)

magibc commented 2 years ago

Hi @FrederickHuangLin,

With my microbiome data, I would like to reproduce Figure 3 of your article, "Analysis of microbial compositions: a review of normalization and differential abundance analysis", to help me decide on the best normalization technique.

In your manuscript you use simulated data, but I don't know how to obtain the d_true variable that is used for that figure in the following script: https://github.com/FrederickHuangLin/Microbiome-Review-Code-Archive/blob/master/scripts/sim.Rmd

To calculate d_true, the above script uses the output of data_generation.R: https://github.com/FrederickHuangLin/Microbiome-Review-Code-Archive/blob/master/scripts/data_generation.R

I tried to modify the data_generation.R script to use my own data instead of simulated data so that I could calculate d_true, but I was not able to.

I can calculate all the other normalized values (ANCOM-BC, TSS, CSS, ...) from my data. What remains is how to calculate d_true and to understand its meaning.

Thanks in advance for your help,

maggie

FrederickHuangLin commented 2 years ago

Hi Maggie,

The d_true value you mentioned is the log sampling fraction, which can be seen as the log of the ratio of library size to microbial load. You can see the corresponding code in line 142 here.
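As a rough illustration of that definition, here is a minimal Python sketch (not the code from data_generation.R, which is in R). The function name `log_sampling_fraction` and all numbers are hypothetical and only stand in for simulated quantities; the sketch assumes the microbial load of each sample is known, which is only the case in a simulation.

```python
import math
import random


def log_sampling_fraction(library_sizes, microbial_loads):
    """Return log(library size / microbial load) for each sample."""
    if len(library_sizes) != len(microbial_loads):
        raise ValueError("one library size and one microbial load per sample")
    return [math.log(n / m) for n, m in zip(library_sizes, microbial_loads)]


# Hypothetical simulated samples: the load is known only because we generate it.
rng = random.Random(0)
microbial_loads = [rng.uniform(1e6, 1e7) for _ in range(5)]
sampling_fractions = [rng.uniform(0.001, 0.01) for _ in range(5)]

# Library size is the part of the load that ends up being sequenced.
library_sizes = [load * frac for load, frac in zip(microbial_loads, sampling_fractions)]

d_true = log_sampling_fraction(library_sizes, microbial_loads)
print(d_true)
```

With real data, `library_sizes` can be computed from the count table, but there is nothing to put in `microbial_loads`, so this calculation cannot be carried out.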

However, in practice, the microbial load is not observable. You can see this point in our paper, or this paper. That means we cannot measure the true sampling fraction from real data. The methods you listed (including ANCOM-BC) aim to estimate this parameter, but we can only use simulation studies to benchmark their performance.
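To make the benchmarking idea concrete, here is a small Python sketch of how an estimate could be compared with the truth in a simulation. The comparison criterion is an assumption of this sketch and not necessarily what sim.Rmd does: an estimate is judged by the spread of its difference from d_true, so a constant shift shared by all samples does not count as error. The function name `centered_residuals` and all values are hypothetical.

```python
import statistics


def centered_residuals(d_est, d_true):
    """Differences between estimated and true values, centered at zero."""
    diffs = [e - t for e, t in zip(d_est, d_true)]
    mean_diff = statistics.fmean(diffs)
    return [d - mean_diff for d in diffs]


# Hypothetical true log sampling fractions from a simulation.
d_true = [-6.2, -5.1, -4.7, -5.9]

# Hypothetical estimate A: off by the same constant in every sample.
d_shifted = [t + 2.0 for t in d_true]

# Hypothetical estimate B: errors that differ from sample to sample.
d_noisy = [-5.0, -5.6, -4.0, -6.5]

spread_shifted = statistics.pvariance(centered_residuals(d_shifted, d_true))
spread_noisy = statistics.pvariance(centered_residuals(d_noisy, d_true))
print(spread_shifted, spread_noisy)
```

Under this criterion, estimate A tracks the truth and estimate B does not. None of this can be computed for real data, because `d_true` is not available there.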

Best, Huang

magibc commented 2 years ago

Many thanks, Huang! That solves it, and I now understand the issue!
