biobakery / MetaPhlAn

MetaPhlAn is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data
http://segatalab.cibio.unitn.it/tools/metaphlan/index.html
MIT License

MetaPhlAn is great but high UNKNOWN percentage (>65%) #90

Closed bitcometz closed 4 years ago

bitcometz commented 4 years ago

Hello,

I used MetaPhlAn 3.0 to profile my metagenomic reads against its database using the following commands:

    source activate metaphlan3_env
    metaphlan \
        A1.fq1.gz,A1.fq2.gz \
        --bowtie2out A1.bowtie2.bz2 \
        --nproc 8 --input_type fastq \
        --unknown_estimation \
        -o A1_profile.txt

For A1_profile.txt:

    UNKNOWN                         -1          68.12978
    k__Bacteria                     2           31.870219416492652
    k__Bacteria|p__Bacteroidetes    2|976       10.285925282378447
    k__Bacteria|p__Firmicutes       2|1239      8.811251985714032
    k__Bacteria|p__Actinobacteria   2|201174    8.402882929220802
    ......
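For anyone who wants to pull the UNKNOWN fraction out of a profile programmatically, here is a minimal Python sketch. It assumes the three columns shown in the excerpt above (clade name, taxid path, relative abundance) and skips `#` header lines. The function name `parse_profile` is made up for illustration, and real MetaPhlAn output may carry extra columns, so treat the column positions as an assumption.

```python
def parse_profile(text):
    """Return {clade_name: relative_abundance} from MetaPhlAn-style profile text.

    Assumes whitespace-separated columns with the clade name first and the
    relative abundance third, as in the excerpt posted in this issue.
    """
    abundances = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        if len(fields) < 3:
            continue  # skip truncated lines such as "......"
        try:
            abundances[fields[0]] = float(fields[2])
        except ValueError:
            continue  # third column is not numeric, ignore the row
    return abundances


# The rows quoted in this issue, retyped as tab-separated text.
example = (
    "UNKNOWN\t-1\t68.12978\n"
    "k__Bacteria\t2\t31.870219416492652\n"
    "k__Bacteria|p__Bacteroidetes\t2|976\t10.285925282378447\n"
    "k__Bacteria|p__Firmicutes\t2|1239\t8.811251985714032\n"
    "k__Bacteria|p__Actinobacteria\t2|201174\t8.402882929220802\n"
    "......\n"
)

profile = parse_profile(example)
unknown = profile["UNKNOWN"]

# Keep only rows whose deepest level is a phylum (last path element is p__...).
phyla = {
    name.split("|")[-1]: value
    for name, value in profile.items()
    if name.split("|")[-1].startswith("p__")
}

print(f"UNKNOWN: {unknown:.2f}%")
for phylum, value in sorted(phyla.items(), key=lambda kv: -kv[1]):
    print(f"{phylum}: {value:.2f}%")
```

In this excerpt, UNKNOWN and k__Bacteria together account for essentially 100% of the profile.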

Then I ran kraken2 on the same data against a database of archaea, bacteria, plasmid, viral, and fungal sequences, and got better results: only 23% of reads were flagged as unclassified.

    22.74   337614    337614   U    0         unclassified
    77.26   1146929   140      R    1         root
    77.20   1146037   246      R1   131567    cellular organisms
    77.10   1144610   3867     D    2         Bacteria
    38.92   577770    749      D1   1783272   Terrabacteria group
    .....
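The unclassified percentage can be cross-checked from the read counts in the report. This is a small Python sketch, assuming the standard six-column kraken2 report layout (percent, reads in clade, reads assigned directly, rank code, taxid, name). The helper names `parse_kraken2_report` and `unclassified_percent` are hypothetical.

```python
def parse_kraken2_report(text):
    """Parse kraken2-report-style text into a list of row dicts."""
    rows = []
    for line in text.splitlines():
        # Split on any whitespace, but keep the name (which may contain spaces) whole.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # skip truncated lines such as "....."
        try:
            rows.append({
                "percent": float(fields[0]),
                "clade_reads": int(fields[1]),
                "direct_reads": int(fields[2]),
                "rank": fields[3],
                "taxid": int(fields[4]),
                "name": fields[5].strip(),
            })
        except ValueError:
            continue  # not a data row
    return rows


def unclassified_percent(rows):
    """Recompute the unclassified percentage from the U and root read counts."""
    unclassified = sum(r["clade_reads"] for r in rows if r["rank"] == "U")
    classified = sum(r["clade_reads"] for r in rows if r["rank"] == "R")
    return 100.0 * unclassified / (unclassified + classified)


# The rows quoted in this issue, retyped as tab-separated text.
report = (
    "22.74\t337614\t337614\tU\t0\tunclassified\n"
    "77.26\t1146929\t140\tR\t1\troot\n"
    "77.20\t1146037\t246\tR1\t131567\t  cellular organisms\n"
    "77.10\t1144610\t3867\tD\t2\t    Bacteria\n"
    "38.92\t577770\t749\tD1\t1783272\t      Terrabacteria group\n"
    ".....\n"
)

rows = parse_kraken2_report(report)
pct = unclassified_percent(rows)
print(f"unclassified: {pct:.2f}% of {rows[0]['clade_reads'] + rows[1]['clade_reads']} reads")
```

The recomputed value agrees with the 22.74 reported in the first column.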

Since MetaPhlAn downloads its database automatically and the kraken2 database is bigger, I assume that is why kraken2 can assign more reads.

Could you give me some suggestions for improving the classification rate of MetaPhlAn?

Thanks!!!

fbeghini commented 4 years ago

The only way to increase the fraction of reads MetaPhlAn can map is to add more species to its database, so that species-specific marker genes can be identified for them. That procedure is not as straightforward as adding more reference genomes to a Kraken database.
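A toy Python sketch of why adding species is less straightforward for a marker-based approach. This is not MetaPhlAn's actual marker selection pipeline, which is considerably more involved. The species, the gene names, and the rule "a marker is a gene family found in exactly one species" are all simplifying assumptions made for illustration.

```python
from collections import Counter


def unique_markers(gene_families_by_species):
    """Return, per species, the gene families seen in no other species."""
    counts = Counter(
        gene
        for genes in gene_families_by_species.values()
        for gene in set(genes)
    )
    return {
        species: {gene for gene in genes if counts[gene] == 1}
        for species, genes in gene_families_by_species.items()
    }


# Hypothetical gene-family content for two species.
species = {
    "species_A": {"geneA1", "geneA2", "shared1"},
    "species_B": {"geneB1", "shared1"},
}
before = unique_markers(species)

# Adding a third species that also carries geneA2 ...
species["species_C"] = {"geneA2", "geneC1"}
after = unique_markers(species)

# ... removes geneA2 from species_A's marker set.
print(sorted(before["species_A"]))
print(sorted(after["species_A"]))
```

In this toy model, each new genome can invalidate markers already chosen for existing species, so the marker set has to be re-derived rather than simply appended to.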

kescobo commented 4 years ago

@bitcometz It's not necessarily due to the larger database. There's a trade-off between sensitivity (avoiding false negatives) and specificity (avoiding false positives). MetaPhlAn errs on the side of high specificity. K-mer matching like that done by Kraken errs in the other direction. You may pick up more bugs, but you should be less certain about your positive hits.
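To make the two terms concrete, here is a short Python sketch using the usual definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The counts are invented purely to illustrate the direction of the trade-off. They are not measurements of MetaPhlAn or Kraken, and the labels `conservative` and `permissive` are hypothetical.

```python
def sensitivity(tp, fn):
    """Fraction of truly present taxa that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly absent taxa that were correctly not reported."""
    return tn / (tn + fp)


# Invented counts for a sample with 100 taxa present and 100 absent.
conservative = {"tp": 60, "fn": 40, "tn": 98, "fp": 2}   # misses more, rarely wrong
permissive = {"tp": 90, "fn": 10, "tn": 80, "fp": 20}    # finds more, wrong more often

for label, c in (("conservative", conservative), ("permissive", permissive)):
    sens = sensitivity(c["tp"], c["fn"])
    spec = specificity(c["tn"], c["fp"])
    print(f"{label}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

With these made-up numbers the permissive classifier reports more of what is really there, but a larger share of its calls are false positives.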

I'm guessing you're sampling from an environment that isn't super well characterized, so lower specificity might be a worthwhile trade-off to make. It just depends on your research question.

bitcometz commented 4 years ago

Thanks for the help!!!