CVUA-RRW / FooDMe

A reproducible and scalable snakemake workflow for the analysis of DNA metabarcoding experiments, with a special focus on food and feed samples.
https://cvua-rrw.github.io/FooDMe
BSD 3-Clause "New" or "Revised" License

[Request] Exclude specific sequences from the nucleotide database based on SeqID #60

Closed by gregdenay 1 year ago

gregdenay commented 1 year ago

Is your feature request related to a problem? Please describe.
Many sequences in the BLAST database are wrongly annotated and differ substantially from other sequences of the same taxon. This makes it difficult to determine a consensus and leads to results being annotated at the genus level or higher.

Describe the solution you'd like
Problematic sequences could be identified by careful examination of the results, marked as such, and excluded from the results, similarly to the taxid-blocklist process that is already implemented.

Describe alternatives you've considered
The blastn CLI only allows excluding taxa OR sequences, not both simultaneously. The easiest implementation would therefore be to filter sequence IDs on the BLAST results.

Additional context
A better solution would be to provide a means to curate the database, e.g. with Spec4ID. This approach is somewhat more complicated and does not preclude filtering specific sequences as well.

gregdenay commented 1 year ago

Will be added in the upcoming v1.6.4.