bonsai-team / matam

Mapping-Assisted Targeted-Assembly for Metagenomics
GNU Affero General Public License v3.0

Be less stringent when filtering RDP results #67

Closed loic-couderc closed 5 years ago

loic-couderc commented 5 years ago

Currently, in MATAM_V1.5.2, the following filter is applied to RDP results: when the RDP confidence at genus level is below the 0.8 threshold, all taxonomic nodes are tagged as "unclassified".

Example:

# before
3015            Bacteria        domain  1.0     "Proteobacteria"        phylum  1.0     Alphaproteobacteria     class   1.0     Rhodospirillales        order   0.9     Rhodospirillaceae       family  0.62    Rhodovibrio     genus   0.27

# after
3015    unclassified    domain  1.0     unclassified    phylum  1.0     unclassified    class   1.0     unclassified    order   0.9     unclassified    family  0.62    unclassified    genus   0.27
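
The current behaviour described above can be sketched as follows. This is only an illustration, not the code in MATAM's rdp.py: the function name `filter_all_or_nothing` and the tuple layout `(name, rank, confidence)` are assumptions made for this example.

```python
GENUS_THRESHOLD = 0.8  # threshold quoted in this issue


def filter_all_or_nothing(assignments, threshold=GENUS_THRESHOLD):
    """Tag every rank as "unclassified" when the genus confidence is too low.

    `assignments` is a list of (name, rank, confidence) tuples ordered from
    domain to genus (hypothetical layout, chosen for this sketch).
    A genus entry is assumed to be present.
    """
    genus_conf = next(conf for _, rank, conf in assignments if rank == "genus")
    if genus_conf < threshold:
        # One low genus score discards the names of all higher ranks too.
        return [("unclassified", rank, conf) for _, rank, conf in assignments]
    return list(assignments)
```

Applied to sequence 3015 above, the genus confidence of 0.27 causes all six ranks to lose their names, even though domain, phylum, class and order are all at or above 0.9.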

But this potentially discards information. It would make more sense to replace only the taxonomic levels whose confidence is below the given threshold.

Example:

# before
3015            Bacteria        domain  1.0     "Proteobacteria"        phylum  1.0     Alphaproteobacteria     class   1.0     Rhodospirillales        order   0.9     Rhodospirillaceae       family  0.62    Rhodovibrio     genus   0.27

# after
3015    Bacteria        domain  1.0     Proteobacteria  phylum  1.0     Alphaproteobacteria     class   1.0     Rhodospirillales        order   0.9     unclassified    family  0.62    unclassified    genus   0.27
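
A minimal sketch of the proposed per-level filter, for illustration only. It is not the actual rdp.py implementation: the function names, the tab-separated layout (sequence id, an optional empty or "-" column, then name/rank/confidence triples) and the stripping of quotes around taxon names are assumptions based on the example lines above.

```python
THRESHOLD = 0.8  # threshold quoted in this issue
UNCLASSIFIED = "unclassified"


def parse_rdp_line(line):
    """Split one tab-separated RDP line into (seq_id, [(name, rank, conf), ...])."""
    fields = line.rstrip("\n").split("\t")
    seq_id, rest = fields[0], fields[1:]
    # Assumption: the column after the id is empty or "-" and carries no taxon.
    if rest and rest[0] in ("", "-"):
        rest = rest[1:]
    if len(rest) % 3 != 0:
        raise ValueError("expected name/rank/confidence triples: %r" % line)
    assignments = [
        (rest[i].strip('"'), rest[i + 1], float(rest[i + 2]))
        for i in range(0, len(rest), 3)
    ]
    return seq_id, assignments


def mask_low_confidence(assignments, threshold=THRESHOLD):
    """Replace only the names whose own confidence is below the threshold."""
    return [
        (name if conf >= threshold else UNCLASSIFIED, rank, conf)
        for name, rank, conf in assignments
    ]


def format_rdp_line(seq_id, assignments):
    """Rebuild a tab-separated line from the (possibly masked) assignments."""
    fields = [seq_id]
    for name, rank, conf in assignments:
        fields.extend([name, rank, str(conf)])
    return "\t".join(fields)


if __name__ == "__main__":
    raw = ('3015\t\tBacteria\tdomain\t1.0\t"Proteobacteria"\tphylum\t1.0\t'
           'Alphaproteobacteria\tclass\t1.0\tRhodospirillales\torder\t0.9\t'
           'Rhodospirillaceae\tfamily\t0.62\tRhodovibrio\tgenus\t0.27')
    seq_id, assignments = parse_rdp_line(raw)
    print(format_rdp_line(seq_id, mask_low_confidence(assignments)))
```

Each rank is judged on its own confidence, so the well-supported ranks (domain to order in this example) keep their names and only family and genus become "unclassified".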

PavlaDe commented 5 years ago

Hello,

I ran MATAM on my metagenomes using -v --perform_taxonomic_assignment. It ran without error, but the Krona plots show 70% unclassified, which is due to the genus-level filtering problem described in this issue.

How do I apply the updated filtering in rdp.py to get updated, more accurate Krona files? I do not want to rerun the assembly steps, as these seem to be fine.

Thanks,

Pavla

loic-couderc commented 5 years ago

Hi @PavlaDe,

At the moment, the new filtering method is available in the develop branch. I will release a new version of MATAM with this modification in the next few days. You will then be able to re-run MATAM from the "abundance_calculation" step with: -v --perform_taxonomic_assignment --resume_from abundance_calculation

loic-couderc commented 5 years ago

The new MATAM v1.5.3 is now available as a conda package.