genomic-medicine-sweden / gms-artic

A nextflow pipeline with a GMS touch for running the ARTIC network's fieldbioinformatics tools (https://github.com/artic-network/fieldbioinformatics).
GNU Affero General Public License v3.0

Clean human reads before initialisation #39

Open pbiology opened 2 years ago

pbiology commented 2 years ago

What needs to be done: Before any computational steps run on remote nodes, reads from possible human contamination should be removed from the data.

Suggestions on how to get it done: We could add a pre-stage that uses a local executor (calling a local compute node) to clean all fastq files. The question is which software to use for the cleanup. I don't really have any benchmarks, but the two that spring to mind are BBDuk from the BBMap package and kraken2. It would be great to get some input from others here. Any suggestions @talnor @sofstam @JD2112 @bokelund ?

The documentation should probably also refer to a publication showing the effectiveness of such a cleanup.
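To make the kraken2 option concrete, here is a minimal sketch of the filtering step such a pre-stage might perform. It assumes kraken2's standard per-read output (tab-separated: classification status, read ID, taxonomy ID, ...) with numeric taxids, and treats only NCBI taxid 9606 (Homo sapiens) as human. The function names `human_read_ids` and `filter_fastq` are hypothetical and not part of gms-artic; reads assigned to higher-level taxa and unclassified reads are kept here, which may be too permissive for real use.

```python
import io

HUMAN_TAXID = "9606"  # NCBI taxonomy ID for Homo sapiens


def human_read_ids(kraken_lines, human_taxids=frozenset({HUMAN_TAXID})):
    """Collect IDs of reads that kraken2 classified to a human taxid.

    Assumes the standard kraken2 per-read output columns:
    status (C/U), read ID, taxid, length, LCA mapping.
    """
    ids = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        status, read_id, taxid = fields[0], fields[1], fields[2]
        if status == "C" and taxid in human_taxids:
            ids.add(read_id)
    return ids


def filter_fastq(fastq_lines, drop_ids):
    """Yield the lines of 4-line FASTQ records whose read ID is not in drop_ids."""
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            # Header is "@<id> [description]"; keep only the ID part.
            read_id = record[0][1:].split()[0]
            if read_id not in drop_ids:
                yield from record
            record = []


# Toy input: read1 is classified as human, read2 as SARS-CoV-2, read3 is unclassified.
kraken_out = io.StringIO(
    "C\tread1\t9606\t150\t9606:116\n"
    "C\tread2\t2697049\t150\t2697049:116\n"
    "U\tread3\t0\t150\t0:116\n"
)
fastq = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2 extra\nTTGA\n+\nIIII\n"
    "@read3\nGGCC\n+\nIIII\n"
)
cleaned = "".join(filter_fastq(fastq, human_read_ids(kraken_out)))
print(cleaned)
```

A stricter variant would invert the logic and keep only reads classified to the target organism, trading some sensitivity for a stronger guarantee that nothing human leaves the local node.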

What are the arguments for getting it done: This way we can ensure we've taken precautions to avoid sending any sensitive genetic information to remote compute nodes.

Task is considered finished when: All fastq files sent to remote nodes are free from human data.

sofstam commented 2 years ago

@pbiology What about bwa-mem?

pbiology commented 2 years ago

> @pbiology What about bwa-mem?

Yeah, could be. I just did a quick search and had a look at this paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1819-8 Especially figure 1.

But I have by no means done any exhaustive investigation into this.
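For the bwa-mem route, the usual idea is to align the reads against a human reference and keep only those that fail to map. A minimal sketch of that last step, assuming single-end reads and plain-text SAM input; the function name `unmapped_as_fastq` is hypothetical, and in practice a tool such as samtools would normally do this filtering:

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_as_fastq(sam_lines):
    """Yield FASTQ records for reads that did not align to the human reference.

    Assumes single-end reads, so each unmapped read appears exactly once
    with its original sequence and qualities in SAM columns 10 and 11.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & UNMAPPED:
            yield f"@{qname}\n{seq}\n+\n{qual}\n"


# Toy SAM: readA maps to the (human) reference, readB is unmapped.
sam = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "readA\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "readB\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n",
]
clean = "".join(unmapped_as_fastq(sam))
print(clean)
```

Only reads with the unmapped flag set are written out, so anything that aligns to the human reference, even with low mapping quality, is dropped.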

pbiology commented 2 years ago

This also seems interesting: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7478626/

bokelund commented 2 years ago

I got a tip from Wolmar about this one: https://gitlab.com/uit-sfb/fhi-desensitize which the Norwegian counterpart of FoHM uses/developed.