benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Too many ASVs for downstream analysis, ways to filter out data #1285

Closed: tcrispino13 closed this issue 3 years ago

tcrispino13 commented 3 years ago

Hello there! We used dada2 to generate ASVs for our soil microbiome study and identified 119,161 ASVs. After prevalence filtering and removal of chloroplast and mitochondrial sequences, we still have 41,181 ASVs to work with. Can I filter these further? I tried using tax_glom, as in one of the published workflows, but it drastically cut the number of ASVs down to 345. What should I do?
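For context on the drop to 345: phyloseq's tax_glom is not a filter in the usual sense. It merges all ASVs that share the same assignment at the chosen rank into a single feature, so the result counts distinct taxa at that rank, and by default (NArm = TRUE) it also discards ASVs that are unassigned at that rank. The Python sketch below mimics that idea on toy data; the function name agglomerate, the rank list, and the drop_na switch are hypothetical and are not the phyloseq API.

```python
# Hypothetical rank order; real taxonomy tables may use different rank names.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"]


def agglomerate(counts, taxonomy, rank="Genus", drop_na=True):
    """Sum the counts of all ASVs that share the same lineage down to `rank`.

    counts:   {asv_id: [count in sample 1, count in sample 2, ...]}
    taxonomy: {asv_id: tuple of names in RANKS order, None where unassigned}
    drop_na:  discard ASVs unassigned at `rank` (analogous to NArm = TRUE)
    """
    depth = RANKS.index(rank) + 1
    merged = {}
    for asv, row in counts.items():
        lineage = tuple(taxonomy[asv][:depth])
        if drop_na and lineage[-1] is None:
            continue  # unassigned at this rank, so it is dropped entirely
        if lineage in merged:
            # Same lineage already seen: add this ASV's counts sample by sample.
            merged[lineage] = [a + b for a, b in zip(merged[lineage], row)]
        else:
            merged[lineage] = list(row)
    return merged


# Toy data: four ASVs, two samples.
counts = {"ASV1": [10, 0], "ASV2": [5, 3], "ASV3": [1, 1], "ASV4": [2, 2]}
taxonomy = {
    "ASV1": ("Bacteria", "P1", "C1", "O1", "F1", "G1"),
    "ASV2": ("Bacteria", "P1", "C1", "O1", "F1", "G1"),
    "ASV3": ("Bacteria", "P2", "C2", "O2", "F2", None),
    "ASV4": ("Bacteria", "P3", "C3", "O3", "F3", "G3"),
}

genera = agglomerate(counts, taxonomy, rank="Genus")
print(len(genera))  # 2: ASV1 and ASV2 merge, ASV3 is dropped
```

With tens of thousands of soil ASVs, many of which lack a genus-level assignment, this kind of merging plus dropping can plausibly shrink the table by orders of magnitude.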

Thanks for your assistance!

benjjneb commented 3 years ago

Yes, it is valid to filter further, as long as the filtering you do is not informed by any subsequent inferential analysis you might perform. See this paper for a more rigorous justification: Independent filtering increases detection power for high-throughput experiments
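The independence condition can be illustrated with a small sketch, assuming a toy count table and a two-group design. A valid filter computes its statistic (here, the total count across all samples) without ever reading the condition labels; a filter that keeps taxa based on a between-group difference screens on the same contrast the later test evaluates. Both function names and all thresholds are hypothetical.

```python
from statistics import mean


def filter_by_total_count(counts, min_total):
    """Independent filter: uses only the pooled total, never the group labels."""
    return {t: row for t, row in counts.items() if sum(row) >= min_total}


def filter_by_group_difference(counts, labels, min_diff):
    """Counter-example, NOT independent: keeps taxa whose group means differ,
    i.e. it screens on the very contrast a later differential test evaluates."""
    groups = sorted(set(labels))
    kept = {}
    for t, row in counts.items():
        # Mean count of this taxon within each group.
        means = [mean(c for c, g in zip(row, labels) if g == grp) for grp in groups]
        if max(means) - min(means) >= min_diff:
            kept[t] = row
    return kept


counts = {
    "ASV1": [120, 95, 3, 8],
    "ASV2": [0, 1, 0, 0],
    "ASV3": [15, 12, 40, 33],
}
labels = ["control", "control", "treated", "treated"]

print(sorted(filter_by_total_count(counts, min_total=10)))  # ['ASV1', 'ASV3']
print(sorted(filter_by_group_difference(counts, labels, min_diff=50)))  # ['ASV1']
```

Only the first kind of filter is the sort the independent-filtering argument permits before testing; the second would bias the p-values computed on the taxa it keeps.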

The ideal way to further filter down your feature set (i.e. the number of taxa) depends a bit on the next questions you want to ask. If, for example, differential abundance testing is important, I would start by filtering out taxa present in few samples or at very low abundance: even if those taxa were associated with the condition of interest, they wouldn't meet any relevant statistical threshold anyway.
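A prevalence-and-abundance filter of the kind described above might look like the following sketch. It assumes counts are stored as {taxon: [count per sample]}; the function name filter_rare_taxa and its default thresholds are hypothetical and would need tuning for a given study. In R, phyloseq's filter_taxa() can apply a per-taxon function to a similar effect.

```python
def filter_rare_taxa(counts, min_prevalence=0.1, min_total=10):
    """Keep taxa seen in at least `min_prevalence` (fraction) of samples
    AND with at least `min_total` reads summed over all samples."""
    kept = {}
    for taxon, row in counts.items():
        # Prevalence: fraction of samples in which the taxon was observed at all.
        prevalence = sum(1 for c in row if c > 0) / len(row)
        if prevalence >= min_prevalence and sum(row) >= min_total:
            kept[taxon] = row
    return kept


counts = {
    "ASV1": [50, 40, 0, 30, 25],  # common and abundant
    "ASV2": [0, 0, 200, 0, 0],    # abundant, but found in only 1 of 5 samples
    "ASV3": [1, 2, 1, 0, 1],      # widespread, but only 5 reads in total
    "ASV4": [5, 0, 6, 0, 4],      # in 3 of 5 samples, 15 reads in total
}

kept = filter_rare_taxa(counts, min_prevalence=0.3, min_total=10)
print(sorted(kept))  # ['ASV1', 'ASV4']
```

Neither threshold looks at the condition labels, so this filter stays independent of a later differential abundance test.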