NBISweden / aMeta

Ancient microbiome snakemake workflow
MIT License

Option to choose between filtering by TaxReads or Reads #111

Closed ZoePochon closed 1 year ago

ZoePochon commented 1 year ago

We currently filter by TaxReads. Most of the time Reads and TaxReads have similar values, but sometimes many reads are assigned more specifically to a lower level, called "sequence" in KrakenUniq. This happens often with viruses, where the genotype in the sample can differ considerably from the complete genomes used as references at the "species" level. We need an option to filter by Reads instead of TaxReads. Ultimately, the filter exists to ensure that enough reads align for the deamination pattern, but that can still be achieved by aligning to the right reference genome by hand later on.

Here is an example (without the real virus name, sorry):

| Percentage | Reads | TaxReads | Kmers | Dup | Coverage | Taxid | Taxon | Name |
|---|---|---|---|---|---|---|---|---|
| 9.799e-05 | 522 | 64 | 1339 | 5.68 | 0.00129 | 10376 | species | Mysterious virus |
| 7.79e-05 | 415 | 415 | 30 | 151 | 0.04601 | 1004844516 | sequence | Mysterious virus isolate 2, partial genome |
| 4.693e-06 | 25 | 25 | 29 | 13.8 | 0.03448 | 1004844718 | sequence | Mysterious virus isolate 3, partial genome |
| 1.502e-06 | 8 | 8 | 32 | 6.69 | 0.08312 | 1004842593 | sequence | Mysterious virus, cell line 5 |
| 1.126e-06 | 6 | 6 | 16 | 5.25 | 0.1103 | 1004844670 | sequence | Mysterious virus isolate 4, partial genome |
| 3.754e-07 | 2 | 2 | 31 | 1.55 | 0.08052 | 1004844691 | sequence | Mysterious virus isolate 5, partial genome |
| 3.754e-07 | 2 | 2 | 21 | 2 | 0.01246 | 1004844628 | sequence | Mysterious virus isolate 6, partial genome |
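To make the difference concrete, here is a minimal sketch of what such an option could look like, applied to the example rows above. This is not the aMeta implementation: the function name `filter_rows`, the `column` parameter, the threshold of 200, and the restriction to rank "species" are all assumptions made for illustration, and any k-mer filter is left out.

```python
# Example rows copied from the table above: (reads, taxReads, taxID, rank).
ROWS = [
    (522, 64, 10376, "species"),
    (415, 415, 1004844516, "sequence"),
    (25, 25, 1004844718, "sequence"),
    (8, 8, 1004842593, "sequence"),
    (6, 6, 1004844670, "sequence"),
    (2, 2, 1004844691, "sequence"),
    (2, 2, 1004844628, "sequence"),
]

# Position of each count column within a row tuple.
COLUMN_INDEX = {"reads": 0, "taxReads": 1}


def filter_rows(rows, column="taxReads", min_count=200, rank="species"):
    """Return taxIDs of rows at `rank` whose `column` count is >= `min_count`.

    Hypothetical helper: `column`, `min_count` and `rank` are illustrative
    assumptions, not options that exist in the workflow.
    """
    if column not in COLUMN_INDEX:
        raise ValueError(f"column must be one of {sorted(COLUMN_INDEX)}")
    idx = COLUMN_INDEX[column]
    return [row[2] for row in rows if row[3] == rank and row[idx] >= min_count]


by_tax_reads = filter_rows(ROWS, column="taxReads")
by_reads = filter_rows(ROWS, column="reads")

print(by_tax_reads)  # [] : the species row has only 64 TaxReads
print(by_reads)      # [10376] : the species row has 522 Reads
```

In this example the species row is dropped when filtering by TaxReads (64) and kept when filtering by Reads (522), because most of its reads were assigned to the "sequence" rows below it.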

ZoePochon commented 1 year ago

So I've been trying to filter by Reads for some samples to see what happens, but the filter then becomes far too sensitive and I have to increase the threshold drastically. I'm trying to find another workaround.

ZoePochon commented 1 year ago

Checking the KrakenUniq output for viruses of interest and investigating them more deeply by hand is the only solution I see so far.