benjjneb / decontam

Simple statistical identification and removal of contaminants in marker-gene and metagenomics sequencing data
https://benjjneb.github.io/decontam/

How low is my "low biomass"? #17

Closed: sarasary82 closed this issue 6 years ago

sarasary82 commented 6 years ago

Hi, thanks for creating this amazing tool for decontamination. We are working with low-biomass samples, and we have some questions:

1) In our case, the real features in our library will probably be very similar to the contaminants. That's why we thought of filtering at the feature level instead of the bacteria level. What do you think?
2) Why did you go for a chi-square test in the decontam method?
3) My samples contain roughly 1000 to 4000 bacteria (by qPCR). How "low" would you consider such a sample? And which approach (isNotContaminant, isContaminant, ...) would you recommend so that we can be very sure that what we define in the composition is real?
4) Is the supplemental information from the decontam paper already available?

Thanks again, Sara

benjjneb commented 6 years ago

On (1), we agree that filtering out contaminants is best done at the most resolved level, for the reason you point out. That means ASVs in marker-gene data, or the species/strain level in metagenomics data.

On (2), Fisher's exact test is run instead of the chi-square test if there are too few counts for the chi-square approximation to hold. If you only have a handful of negative controls, Fisher's exact test will be used.
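To make the "too few counts" point concrete, here is a minimal Python sketch (standard library only) of a one-sided Fisher's exact test on a 2x2 presence/absence table, together with a check of whether the chi-square approximation looks reasonable. It is an illustration of the general statistics, not decontam's code: the function names are hypothetical, the table layout (rows = negative controls / true samples, columns = present / absent) is an assumption, and the "every expected count at least 5" cutoff is the common textbook rule of thumb, which may differ from the criterion decontam actually applies.

```python
from math import comb


def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence of rows and columns."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[r * col / n for col in col_totals] for r in row_totals]


def chi_square_ok(table, min_expected=5.0):
    """Rule-of-thumb check (an assumption, not decontam's rule):
    the chi-square approximation is trusted only if every expected count
    reaches min_expected."""
    return all(e >= min_expected for row in expected_counts(table) for e in row)


def fisher_exact_greater(table):
    """One-sided Fisher's exact p-value: probability, with all margins fixed,
    of a top-left count at least as large as the observed one."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities over tables at least as extreme.
    favourable = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return favourable / total


# Made-up example: a feature seen in 3 of 4 negative controls and 2 of 20 true samples.
table = [[3, 1], [2, 18]]
print("chi-square approximation reasonable:", chi_square_ok(table))
print("one-sided Fisher's exact p-value:", fisher_exact_greater(table))
```

With only four negative controls the expected counts are far below the rule-of-thumb cutoff, which is the situation where an exact test is the safer choice.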

On (3), that is hard to answer: 1000-4000 bacteria per what volume? Conceptually, "extremely low" biomass samples are those in which the concentration of DNA from the sample is as low as or lower than the concentration of DNA from contaminants, and in that case you should use isNotContaminant. We don't have an easy diagnostic for that right now; we should think about that. My first suggestion would be to use isContaminant, then plot a histogram of the `$p` column of its output. Is it bimodal, with peaks near 1 and near 0? That is what is expected, so if you see that, it is probably working fine.
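The histogram check described above can be sketched in a few lines of Python (standard library only). In R you would normally just plot a histogram of the `p` column directly; the version below only illustrates the idea of binning the scores and checking how much mass sits near 0 and near 1. The function names, the bin count, the 0.1 tail width, and the example scores are all hypothetical and chosen for illustration, not taken from decontam or from real data.

```python
def histogram(scores, bins=10):
    """Count scores in equal-width bins over [0, 1]; a score of 1.0 goes in the last bin."""
    counts = [0] * bins
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score outside [0, 1]: {s}")
        counts[min(int(s * bins), bins - 1)] += 1
    return counts


def tail_fractions(scores, width=0.1):
    """Fraction of scores within `width` of 0 and within `width` of 1."""
    n = len(scores)
    low = sum(s <= width for s in scores) / n
    high = sum(s >= 1.0 - width for s in scores) / n
    return low, high


def print_histogram(counts):
    """Print a plain-text histogram, one row per bin."""
    bins = len(counts)
    for i, count in enumerate(counts):
        print(f"[{i / bins:.1f}, {(i + 1) / bins:.1f}) {'#' * count}")


# Made-up scores for illustration only; real values would come from the p column.
scores = [0.01, 0.03, 0.05, 0.08, 0.45, 0.62, 0.91, 0.93, 0.97, 0.99]
print_histogram(histogram(scores))
low, high = tail_fractions(scores)
print(f"fraction near 0: {low:.2f}, fraction near 1: {high:.2f}")
```

If most of the mass sits in the two outer bins with little in between, the distribution has the bimodal shape described above. A flat or single-peaked histogram would be a reason to look more closely at the controls and the method used.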

On (4), the supplementary information is just appended to the end of the preprint PDF.