Arcadia-Science / metagenomics

A Nextflow workflow for QC, evaluation, and profiling of metagenomic samples using short- and long-read technologies
MIT License

Adds a nanopore workflow with long-read modules #26

Closed elizabethmcd closed 1 year ago

elizabethmcd commented 1 year ago

This PR adds support for Nanopore long reads, with a single fastq file per sample as the point of entry (this assumes basecalling has already been performed and the reads have been concatenated into a single fastq file). Concatenation to a single fastq file is usually done at the adapter removal step, but to keep things consistent the workflow starts with adapter removal on the already-concatenated file and then continues on to assembly.

This PR goes through adapter removal with porechop_abi, assembly with flye, mapping with minimap2, and polishing with racon (just one round of polishing for now, until I check how many rounds might be necessary). These additions work with nextflow run main.nf -profile test_nanopore,docker --outdir test_nanopore, using the validated test dataset of subsetted Nanopore reads that assemble together. If you happen to review the latest PR for the hifi2genome workflow, which adds minimap2 mapping (https://github.com/Arcadia-Science/hifi2genome/pull/6), you will notice that this minimap2 subworkflow is almost identical, except that the output format is changed to the one racon needs.
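For reviewers who want to see the order of operations outside of Nextflow, here is a rough sketch of the per-sample commands the four steps correspond to. It only builds the command strings and does not run anything. The function name, the file names, and the thread count are made up for illustration, and the flags are common defaults for each tool, not necessarily the ones the modules in this PR pass. It also skips the separate MINIMAP2_INDEX step and lets minimap2 index the draft on the fly.

```python
from typing import List


def nanopore_commands(sample: str, reads: str, rounds: int = 1, threads: int = 4) -> List[str]:
    """Return shell commands for one sample, in execution order (nothing is run).

    Hypothetical helper: file names and flags are illustrative assumptions,
    not the exact arguments used by the workflow modules.
    """
    trimmed = f"{sample}.trimmed.fastq"
    flye_dir = f"{sample}_flye"
    cmds = [
        # Adapter removal; -abi asks porechop_abi to infer adapters from the reads.
        f"porechop_abi -abi -i {reads} -o {trimmed}",
        # Metagenome-mode assembly of raw Nanopore reads.
        f"flye --nano-raw {trimmed} --meta --out-dir {flye_dir} --threads {threads}",
    ]
    # Flye writes its final contigs to assembly.fasta inside the output directory.
    draft = f"{flye_dir}/assembly.fasta"
    for i in range(1, rounds + 1):
        paf = f"{sample}.round{i}.paf"
        polished = f"{sample}.racon{i}.fasta"
        # Map the reads back to the current draft; racon needs these overlaps.
        cmds.append(f"minimap2 -x map-ont -t {threads} {draft} {trimmed} > {paf}")
        # racon takes <reads> <overlaps> <target> and writes the polished FASTA to stdout.
        cmds.append(f"racon -t {threads} {trimmed} {paf} {draft} > {polished}")
        # The next round (if any) polishes the output of this one.
        draft = polished
    return cmds


if __name__ == "__main__":
    for cmd in nanopore_commands("om_T1", "om_T1.fastq"):
        print(cmd)
```

With rounds=1 this yields four commands, matching the single polishing round in this PR. Raising rounds repeats the mapping and polishing pair against the previous round's output, which is why mapping has to be part of the workflow for racon to work.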

Apologies that this is a tad long; I didn't fully realize I would need to add the mapping step to make the racon step functional. This may change in the future as I test more Nanopore polishing programs. I know Emily likes using medaka, which doesn't require an explicit prior mapping step because it calls minimap2 internally.

[da/7ee040] process > ARCADIASCIENCE_METAGENOMICS:NANOPORE:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_test.csv) [100%] 1 of 1 ✔
[7c/27e346] process > ARCADIASCIENCE_METAGENOMICS:NANOPORE:PORECHOP_ABI (om_T1)                                 [100%] 2 of 2 ✔
[91/218b9f] process > ARCADIASCIENCE_METAGENOMICS:NANOPORE:FLYE (om_T1)                                         [100%] 2 of 2 ✔
[0b/2b8a55] process > ARCADIASCIENCE_METAGENOMICS:NANOPORE:MINIMAP2_SUBWORKFLOW:MINIMAP2_INDEX (2)              [100%] 2 of 2 ✔
[6c/9c5a1f] process > ARCADIASCIENCE_METAGENOMICS:NANOPORE:MINIMAP2_SUBWORKFLOW:MINIMAP2_ALIGN (om_T1)          [100%] 2 of 2 ✔
[e3/2b43e6] process > ARCADIASCIENCE_METAGENOMICS:NANOPORE:RACON (om_T1)                                        [100%] 2 of 2 ✔
[a6/d75d7f] process > ARCADIASCIENCE_METAGENOMICS:NANOPORE:CUSTOM_DUMPSOFTWAREVERSIONS (1)                      [100%] 1 of 1 ✔
Completed at: 24-Jan-2023 15:59:26
Duration    : 2m 32s
CPU hours   : 0.1
Succeeded   : 12