biocore / qiime

Official QIIME 1 software repository. QIIME 2 (https://qiime2.org) has succeeded QIIME 1 as of January 2018.
GNU General Public License v2.0

filter_otus_from_otu_table.py needs to account for different numbers of reads per sample #1029

Open alk224 opened 11 years ago

alk224 commented 11 years ago

Right now, when filtering OTU tables to remove artifactual sequences, you can filter based on a global minimum count or percentage. This is problematic for most datasets, which have different numbers of reads per sample: if I filter at -n 100, the filter is much more aggressive for a sample with just 800 reads than for one with 10,000. If I expect a certain error rate from an Illumina run, say 0.1%, and wish to filter at that level, I need a feature that filters each sample by that percentage of its own read count (0.8 reads for the sample with 800 reads, 10 reads for the sample with 10,000 reads).
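
To make the request concrete, the per-sample filtering described above could be sketched roughly as follows. This is only a minimal NumPy illustration, not QIIME code and not the behavior of any existing script: the function name `filter_per_sample`, its parameters, and the example counts are all hypothetical, and it assumes the table is available as a dense OTU-by-sample matrix of counts.

```python
import numpy as np


def filter_per_sample(counts, min_fraction):
    """Zero out counts below min_fraction of each sample's total reads.

    counts: 2-D array-like, rows are OTUs and columns are samples (assumed layout).
    min_fraction: expected error rate as a fraction, e.g. 0.001 for 0.1%.
    Returns the filtered matrix and a boolean mask of OTUs that still
    have at least one nonzero count.
    """
    counts = np.asarray(counts, dtype=float)
    # One threshold per sample, scaled by that sample's own read depth.
    sample_totals = counts.sum(axis=0)
    thresholds = min_fraction * sample_totals
    # Broadcasting compares every OTU count against its sample's threshold.
    filtered = np.where(counts >= thresholds, counts, 0.0)
    # OTUs that were zeroed in every sample can be dropped from the table.
    keep = filtered.sum(axis=1) > 0
    return filtered, keep


# Hypothetical table: sample A has 800 reads, sample B has 10,000 reads.
example_counts = np.array([
    [795, 9986],
    [5, 5],
    [0, 9],
])
filtered, keep = filter_per_sample(example_counts, 0.001)
```

With a 0.1% cutoff, the thresholds here work out to about 0.8 reads for sample A and 10 reads for sample B, so a count of 5 survives in the shallow sample but is zeroed in the deep one.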

alk224 commented 9 years ago

@adamrp wrote something that did the trick last year (https://gist.github.com/adamrp/7591573), but it still has not been added to QIIME. I recently pointed someone else who requested this feature on the forum to this script, so I think there continues to be interest. As it stands, the script requires packages that are not present in the new biom, and it only reads JSON tables, not HDF5.