Closed hpf0815 closed 1 year ago
Hey there,
Thank you for using the workflow! Apologies for the delay in getting back to you.
At first blush this looks like your supplied reference override file includes a reference name that is not present in the ref2taxid.targloci.tsv. This is an easy mistake to make, and we should make a note to document this behaviour better.
The best way to solve this would be to create a ref2taxid mapping tsv for your specific reference file. See the included one as an example; I can also provide more detailed instructions if that helps.
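To see which reference names are affected before re-running, a check along these lines may help. It is only a sketch, not part of the workflow: the function names are made up for illustration, it assumes the mapping file has two tab-separated columns and no header row, and it takes the reference name to be the FASTA header text up to the first whitespace (which is how minimap2 names its targets).

```python
import csv
import io


def fasta_ids(handle):
    """Yield the id (header text up to the first whitespace) of each FASTA record."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def load_ref2taxid(handle):
    """Read a two-column, tab-separated ref2taxid table into a dict (assumes no header)."""
    reader = csv.reader(handle, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def missing_ids(fasta_handle, tsv_handle):
    """Return the reference ids found in the FASTA but absent from the mapping."""
    mapping = load_ref2taxid(tsv_handle)
    return [seq_id for seq_id in fasta_ids(fasta_handle) if seq_id not in mapping]


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real files; sequences are illustrative only.
    fasta = io.StringIO(">NG_012432.1 first record\nACGT\n>NC_018464.1 second record\nGGCC\n")
    tsv = io.StringIO("NG_012432.1\t162394\n")
    print(missing_ids(fasta, tsv))
```

With real data, replace the `io.StringIO` objects with `open("RefSeq_viral.fna")` and `open("ref2taxid.targloci.tsv")`. Any id this reports is one that would trigger the same KeyError.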
Hi Karhide,
thanks for your response. I am not that experienced in data evaluation, so some more details would be highly appreciated :)
Where can I find the included ref2taxid mapping tsv, so that I can have a look at how it is set up? Or could you provide me with some instructions on how to generate one for my reference file?
Hi, thank you for using the workflow, and apologies for the late answer. The ref2taxid tsv has one row per sequence in your reference: the id of the sequence and the corresponding taxid, separated by a tab.
The default ref2taxid file can be downloaded using:
wget https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ref2taxid.targloci.tsv
And it looks like:
head ref2taxid.targloci.tsv
NG_012432.1 162394
NG_013120.1 4952
NG_013131.1 224370
NG_013144.1 108569
NG_013153.1 27332
For example: NG_012432.1 is the id of the reference and 162394 is the taxid according to the NCBI taxonomy database (https://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/)
The reference ids in this file must match the ones in the reference that you're using: every sequence in your reference needs its own row here.
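If it helps, here is a rough sketch of how such a file could be generated for a custom reference. It is not the workflow's own tooling. It assumes you have an accession-to-taxid table laid out like NCBI's `accession2taxid` files (tab-separated, with a header row containing `accession.version` and `taxid` columns) and that the ids in your FASTA headers are versioned accessions. The function name and the ids and taxids in the demo are made up for illustration.

```python
import csv
import io


def build_ref2taxid(fasta_handle, acc2taxid_handle, out_handle):
    """Write 'id<TAB>taxid' rows for the FASTA records; return ids with no taxid found."""
    # Collect reference ids in file order (header text up to the first whitespace).
    wanted = []
    for line in fasta_handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                wanted.append(fields[0])
    wanted_set = set(wanted)

    # Stream the lookup table and keep only the accessions we actually need.
    found = {}
    for row in csv.DictReader(acc2taxid_handle, delimiter="\t"):
        accession = row["accession.version"]
        if accession in wanted_set:
            found[accession] = row["taxid"]

    # Write the mapping and remember which ids could not be resolved.
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    unresolved = []
    for seq_id in wanted:
        if seq_id in found:
            writer.writerow([seq_id, found[seq_id]])
        else:
            unresolved.append(seq_id)
    return unresolved


if __name__ == "__main__":
    # Made-up ids and taxids, used only to show the input and output shapes.
    fasta = io.StringIO(">REF_A.1 example one\nACGT\n>REF_B.1 example two\nGGCC\n")
    lookup = io.StringIO(
        "accession\taccession.version\ttaxid\tgi\n"
        "REF_A\tREF_A.1\t111\t0\n"
    )
    out = io.StringIO()
    print(build_ref2taxid(fasta, lookup, out))
    print(out.getvalue(), end="")
```

Any ids returned as unresolved would still need a taxid added by hand (or removing from the reference) before the mapping is complete. The resulting file is then supplied to the workflow in place of the default ref2taxid; check the workflow's help output for the exact parameter name.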
Hi, Could you confirm if this issue has been solved? We'll close this ticket on the assumption things are now resolved.
Hi,
I am currently trying a few data analysis options to detect viral contamination in my sample. As part of that, I wanted to test the metagenomics workflow and compare the minimap2 and kraken2 options. When I run the minimap2 option, the run just stops at the minimap2 step. This is the overall output I receive:
nextflow run epi2me-labs/wf-metagenomics -r v1.1.4 --fastq all.fastq --out_dir nf_minimap_test/ --minimap2 --reference RefSeq_viral.fna --kraken2 FALSE --threads 20
N E X T F L O W ~ version 22.04.5
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [f3e7568d46]
Launching https://github.com/epi2me-labs/wf-metagenomics
[determined_hugle] DSL2 - revision: a0bd9ca590 [v1.1.4]
Core Nextflow options
revision : v1.1.4
runName : determined_hugle
containerEngine: docker
launchDir : /home/hans-peter/Test_Area
workDir : /home/hans-peter/Test_Area/work
projectDir : /home/hans-peter/.nextflow/assets/epi2me-labs/wf-metagenomics
userName : hans-peter
profile : standard
configFiles : /home/hans-peter/.nextflow/assets/epi2me-labs/wf-metagenomics/nextflow.config
Core options
fastq : all.fastq
out_dir : nf_minimap_test/
sources : [ncbi_16s_18s:[reference:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ncbi_targeted_loci_16s_18s.fna, refindex:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ncbi_targeted_loci_16s_18s.fna.fai, database:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ncbi_targeted_loci_kraken2.tar.gz, kmer_dist:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/database1000mers.kmer_distrib, ref2taxid:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ref2taxid.targloci.tsv, taxonomy:https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz], ncbi_16s_18s_28s_ITS:[reference:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/ncbi_16s_18s_28s_ITS.fna, refindex:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/ncbi_16s_18s_28s_ITS.fna.fai, database:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/ncbi_16s_18s_28s_ITS_kraken2.tar.gz, kmer_dist:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/database1000mers.kmer_distrib, ref2taxid:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/ref2taxid.ncbi_16s_18s_28s_ITS.tsv, taxonomy:https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz], PlusPF-8:[database:https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_8gb_20210517.tar.gz, taxonomy:https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz]]
Minimap2 options
minimap2 : true
reference : RefSeq_viral.fna
Kraken2 options
kraken2 : false
Generic options
threads : 20
!! Only displaying parameters that differ from the pipeline defaults !!
If you use wf-metagenomics for your analysis please cite:
Checking inputs.
Checking custom reference exists
Checking custom reference index exists
Checking fastq input.
Single file input detected.
executor > local (6)
[52/9ff427] process > handleSingleFile (1) [100%] 1 of 1 ✔
[97/7a3645] process > pipeline:unpackTaxonomy [100%] 1 of 1 ✔
[a4/795cdf] process > pipeline:combineFilterFastq (1) [100%] 1 of 1 ✔
[dc/c50135] process > pipeline:minimap2 (1) [100%] 1 of 1, failed: 1 ✘
[47/eec279] process > pipeline:getVersions [100%] 1 of 1 ✔
[f0/73726d] process > pipeline:getParams [100%] 1 of 1 ✔
[- ] process > pipeline:makeReport -
[- ] process > output [ 0%] 0 of 2
Error executing process > 'pipeline:minimap2 (1)'
Caused by:
Process pipeline:minimap2 (1) terminated with an error exit status (1)
Command executed:
minimap2 -t 20 -ax map-ont RefSeq_viral.fna all.fastq | samtools view -h -F 2304 - | format_minimap2.py - -o all.minimap2.assignments.tsv -r ref2taxid.targloci.tsv | samtools sort -o all.bam -
samtools index all.bam
awk -F '\t' '{print $3}' all.minimap2.assignments.tsv > taxids.tmp
taxonkit --data-dir taxonomy_dir lineage -R taxids.tmp | aggregate_lineages.py -p all.minimap2
Command exit status: 1
Command output: (empty)
Command error:
[M::mm_idx_gen::18.225*0.83] collected minimizers
[M::mm_idx_gen::43.798*0.91] sorted minimizers
[M::main::43.807*0.91] loaded/built the index for 14813 target sequence(s)
[M::mm_mapopt_update::44.423*0.91] mid_occ = 92
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 14813
[M::mm_idx_stat::44.723*0.91] distinct minimizers: 40131811 (62.74% are singletons); average occurrences: 2.185; average spacing: 5.355; total length: 469589088
Traceback (most recent call last):
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'NC_018464.1'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/hans-peter/.nextflow/assets/epi2me-labs/wf-metagenomics/bin/format_minimap2.py", line 82, in <module>
execute(sys.argv[1:])
File "/home/hans-peter/.nextflow/assets/epi2me-labs/wf-metagenomics/bin/format_minimap2.py", line 74, in execute
main(
File "/home/hans-peter/.nextflow/assets/epi2me-labs/wf-metagenomics/bin/format_minimap2.py", line 30, in main
taxid = ref2taxid_df.at[aln.reference_name, 'taxid']
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/indexing.py", line 2270, in __getitem__
return super().__getitem__(key)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/indexing.py", line 2221, in __getitem__
return self.obj._get_value(*key, takeable=self._takeable)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/frame.py", line 3622, in _get_value
row = self.index.get_loc(index)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 'NC_018464.1'
Work dir: /home/hans-peter/Test_Area/work/dc/c50135909fb31f19693d617d7f1bdb
Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
Help would be greatly appreciated :)
thanks and best regards, Hans-Peter