Closed b16joski closed 6 years ago
I think this is a useful feature, and one that existed in the previous (pre-nextflow) BACTpipe versions, right?
Can you please describe the overall process for how this should work in a bit more detail? No code or nextflow constructs needed, just a run-through of how the logic is supposed to work with all the different options. We need to present the entire logic before we can discuss how to best implement it into our current Nextflow workflow.
As far as I can tell there are at least two good ways of conditionally modifying the called command in a nextflow process:
Use the capability of Nextflow to execute some Groovy commands prior to launching the prokka process, using a construct something like this:
process prokka {
input:
// excluded for brevity
output:
// excluded for brevity
script:
prokka_reference = ""
if (params.prokka_reference) {
prokka_reference = "--proteins ${params.prokka_reference}"
}
"""
prokka \
--force \
--addgenes \
(etc...) \
${prokka_reference} \
$renamed_contigs
"""
}
If we include a configuration parameter prokka_reference in the configuration file, and set that parameter to a default value of false, we will have an automatic way of including the --proteins ${params.prokka_reference} line in the prokka call if the user specifies it when running BACTpipe, e.g.:
$ nextflow run ctmrbio/BACTpipe -profile ctmrnas --reads 'path/to/my/reads/*_R{1,2}.fastq.gz' --prokka_reference path/to/my_reference_proteins.fasta
Note that I haven't tested any of the code here; these are just some thoughts I wanted to share. Now that I think of it, maybe we should call the parameter something like prokka_proteins instead of prokka_reference, as that is probably more familiar to people used to running prokka on their own.
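To make the two cases explicit, here is the same decision written as a small Python sketch. It is only an illustration of the logic, not pipeline code: the function name prokka_command and the file names are made up for the example, and the flag list is shortened to the two flags shown above.

```python
import shlex


def prokka_command(contigs, prokka_reference=False):
    """Build a prokka command line, adding --proteins only when a reference is set."""
    args = ["prokka", "--force", "--addgenes"]
    # Same idea as the Groovy `if (params.prokka_reference)` check:
    # False, None and "" all leave the optional flag out.
    if prokka_reference:
        args += ["--proteins", str(prokka_reference)]
    args.append(contigs)
    # Quote each argument so paths with spaces survive the shell.
    return " ".join(shlex.quote(a) for a in args)


# Default (parameter left at false): no --proteins in the call.
print(prokka_command("contigs.fasta"))
# User supplied a reference: --proteins is added before the contigs file.
print(prokka_command("contigs.fasta", "my_reference_proteins.fasta"))
```

The point is that a default of false means the prokka call is unchanged for everyone who does not pass the new parameter.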
Nice @boulund, we just discussed this in the Skype meeting :)
I did some testing on ctmrnas with a custom reference protein file for prokka annotation.
First, I needed some test genomes in .fastq format and reference protein files in .faa format. Here I used H. pylori and copied these from Uppmax to my test directory on ctmrnas:
scp josephk@milou.uppmax.uu.se:/home/josephk/joseph/test_nextflow/*.fastq .
In the main pipeline code, I made some changes to the prokka process in the bactpipe.nf executable, as per Fredrik's suggestion.
process prokka {
tag {sample_id}
publishDir "${params.output_dir}/prokka", mode: 'copy'
input:
set sample_id, file(renamed_contigs) from prokka_channel
output:
set sample_id, file("${sample_id}_prokka") into prokka_out
script:
prokka_reference = ""
if (params.prokka_reference) {
prokka_reference = "--proteins ${params.prokka_reference}"
}
"""
prokka \
--force \
--evalue 1e-9 \
--kingdom Bacteria \
--locustag ${sample_id} \
--outdir ${sample_id}_prokka \
--prefix ${sample_id} \
--strain ${sample_id} \
${prokka_reference} \
$renamed_contigs
"""
}
In the configuration file, I set the default value of the prokka_reference parameter to false, as below.
prokka_reference = false
I ran the pipeline as follows, specifying a reference file for prokka to use during annotation.
nextflow run /home/joseph.kirangwa/BACTpipev2.1/BACTpipe/bactpipe.nf -profile ctmrnas --reads "./*_R{1,2}.fastq" --prokka_reference ./*.faa
The pipeline ran without errors at this point.
N E X T F L O W ~ version 0.26.4
Launching `/home/joseph.kirangwa/BACTpipev2.1/BACTpipe/bactpipe.nf` [thirsty_euclid] - revision: 4507422ae2
============================================================
BACTpipe
Version 2.1b-dev
Bacterial whole genome analysis pipeline
https://bactpipe.readthedocs.io
============================================================
[warm up] executor > local
[61/bcafea] Submitted process > screen_for_contaminants (2_HP_HPAG1_7-8)
[73/269356] Submitted process > screen_for_contaminants (1_HP_26695)
[6f/d73709] Submitted process > bbduk (2_HP_HPAG1_7-8)
[8c/081b9d] Submitted process > bbduk (1_HP_26695)
[e6/96b0d6] Submitted process > shovill (2_HP_HPAG1_7-8)
[f6/09c737] Submitted process > fastqc (2_HP_HPAG1_7-8)
[b1/a86a76] Submitted process > fastqc (1_HP_26695)
[58/2c3f53] Submitted process > shovill (1_HP_26695)
[99/ed7f6f] Submitted process > prokka (2_HP_HPAG1_7-8)
[5d/8416e4] Submitted process > prokka (1_HP_26695)
[09/6149a5] Submitted process > multiqc
============================================================
BACTpipe workflow completed without errors
Check output files in folder:
BACTpipe_results_test
============================================================
The results were also stored in the specified output folder.
(base) [joseph.kirangwa@ctmr-nas BACTpipe_results_test]$ ls
bbduk fastqc mash.screen multiqc prokka shovill
- Therefore, what is left is to extract the Gram stain result from the respective column of the assess_mash.py output, then pass it to prokka during annotation through its --gram option, which the prokka help describes as: --gram [X] Gram: -/neg +/pos (default '')
- I should also mention that I made a change to how assess_mash_screen.py is executed, providing the gram_stain.txt file as follows:
--gram "$baseDir/resources/gram_stain.txt"
specify if wanted by the user for prokka to use specific reference sequences during annotation.
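For the remaining Gram stain step, a rough Python sketch of the lookup could look like the following. I am assuming here that gram_stain.txt is tab-separated with the genus in the first column and the stain in the second; the real file may be laid out differently, and the function names load_gram_table and gram_flag are only placeholders for the example.

```python
import csv
import io


def load_gram_table(handle):
    """Read a genus -> Gram stain mapping from tab-separated lines (assumed layout)."""
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        table[row[0].strip().lower()] = row[1].strip().lower()
    return table


def gram_flag(genus, table):
    """Return the prokka --gram arguments for a genus, or [] if the stain is unknown."""
    stain = table.get(genus.strip().lower(), "")
    if stain in ("neg", "negative", "-"):
        return ["--gram", "neg"]
    if stain in ("pos", "positive", "+"):
        return ["--gram", "pos"]
    return []  # unknown genus: leave --gram at prokka's empty default


# Hypothetical file contents, used only to exercise the functions.
example = io.StringIO("Helicobacter\tnegative\nStaphylococcus\tpositive\n")
table = load_gram_table(example)
print(gram_flag("Helicobacter", table))
print(gram_flag("Unknownus", table))
```

Returning an empty list for an unknown genus follows the same pattern as the --proteins handling: if nothing is known, the prokka call stays as it was.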