NBISweden / aMeta

Ancient microbiome snakemake workflow
MIT License

Some reference genomes are assigned to several taxIDs, which generates noise at the mapDamage step #141

Closed ZoePochon closed 1 month ago

ZoePochon commented 7 months ago

We have noticed that some reference genomes are somehow assigned to several taxIDs. This came to light because the mapDamage profiles of several microbes look too similar to be true.

To test this, I ran:

samtools view 1063.bam > 1063_reads
samtools view 1076.bam > 1076_reads
cut -f 1 1063_reads > 1063_reads_name
cut -f 1 1076_reads > 1076_reads_name
while read line; do grep -w $line 1076_reads_name; done < 1063_reads_name
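The loop above starts one grep per read, which gets slow on large BAM files. A minimal sketch of the same check done in a single pass is shown here. The function name `shared_reads` is hypothetical (it is not part of aMeta), and it assumes the two name lists contain one read name per line, as produced by the `cut -f 1` commands above.

```shell
# shared_reads A B: print each read name that occurs in both name lists.
# Hypothetical helper for illustration; not part of aMeta.
shared_reads() {
    # First file: remember every read name. Second file: print a name
    # the first time it is seen, if it was also present in the first file.
    awk 'NR == FNR { seen[$0] = 1; next }
         ($0 in seen) && !printed[$0]++' "$1" "$2"
}

# Usage, assuming the name lists created above exist:
# shared_reads 1063_reads_name 1076_reads_name | wc -l
```

Unlike `grep -w`, this compares whole lines exactly, so a read name that is merely a prefix of another name is not reported as shared.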

And if you look for a single read, you can find it in several taxID BAM files.

samtools view 1063.bam | grep -w SRR13309609.7577690.1
SRR13309609.7577690.1 16 AL133266.11 1278 40 79M 0 0 GTTTAAAATTTTTTCTTAGATGTTATTGTTGAAAAGAGCTAAAAATGGCCTGAGTCATTTCCTTCTGCAGGCGCACACT FFDFFGFFFFF@FFFFFFFFFEFFFFFDFFFFFFFFFEFGFFFFFFFF?FFFFFFFFFFFFFEDFFEFFFFFEFFF8 AS:i:-10 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:57C14A6 YT:Z:UU
samtools view 1076.bam | grep -w SRR13309609.7577690.1
SRR13309609.7577690.1 16 AL133266.11 1278 40 79M 0 0 GTTTAAAATTTTTTCTTAGATGTTATTGTTGAAAAGAGCTAAAAATGGCCTGAGTCATTTCCTTCTGCAGGCGCACACT FFDFFGFFFFF@FFFFFFFFFEFFFFFDFFFFFFFFFEFGFFFFFFFF?FFFFFFFFFFFFFEDFFEFFFFFEFFF8 AS:i:-10 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:57C14A6 YT:Z:UU
samtools view 1505.bam | grep -w SRR13309609.7577690.1
SRR13309609.7577690.1 16 AL133266.11 1278 40 79M 0 0 GTTTAAAATTTTTTCTTAGATGTTATTGTTGAAAAGAGCTAAAAATGGCCTGAGTCATTTCCTTCTGCAGGCGCACACT FFDFFGFFFFF@FFFFFFFFFEFFFFFDFFFFFFFFFEFGFFFFFFFF?FFFFFFFFFFFFFEDFFEFFFFFEFFF8 AS:i:-10 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:57C14A6 YT:Z:UU
samtools view 179636.bam | grep -w SRR13309609.7577690.1
SRR13309609.7577690.1 16 AL133266.11 1278 40 79M 0 0 GTTTAAAATTTTTTCTTAGATGTTATTGTTGAAAAGAGCTAAAAATGGCCTGAGTCATTTCCTTCTGCAGGCGCACACT FFDFFGFFFFF@FFFFFFFFFEFFFFFDFFFFFFFFFEFGFFFFFFFF?FFFFFFFFFFFFFEDFFEFFFFFEFFF8 AS:i:-10 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:57C14A6 YT:Z:UU
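To see how widespread this is beyond a single read, one could count, for every read, how many per-taxID name lists contain it. The sketch below assumes one name list per BAM file, created with `samtools view <taxid>.bam | cut -f 1` as in the test above. The function name `multi_taxid_reads` is hypothetical and not part of aMeta.

```shell
# multi_taxid_reads FILE...: for each read name found in more than one
# name list, print "<read name><TAB><number of lists containing it>".
# Hypothetical helper for illustration; not part of aMeta.
multi_taxid_reads() {
    awk '
        # Count each read name at most once per input file.
        !(($0, FILENAME) in seen) { seen[$0, FILENAME] = 1; n[$0]++ }
        # Report only the reads that appear in two or more files.
        END { for (r in n) if (n[r] > 1) printf "%s\t%d\n", r, n[r] }
    ' "$@"
}

# Usage, assuming name lists such as 1063_reads_name and 1076_reads_name exist:
# multi_taxid_reads *_reads_name | sort -k2,2nr | head
```

Each read is counted once per list, so several alignments of one read within the same BAM file do not inflate its count.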

LeandroRitter commented 1 month ago

@ZoePochon yes, this is expected behavior in the absence of an LCA (lowest common ancestor) step, which is the case for the Bowtie2 alignments. I can imagine it causes inaccuracies in authentication, which is why we rely on the Malt alignments (+MaltExtract) for the final conclusion. I do not think there is anything we can do here. I can only recommend (and plan) to apply sam2lca to the Bowtie2 alignments, which could potentially allow us to skip the Malt step completely. I will close this for now, as it is a design problem of aMeta rather than a bug, but feel free to reopen.