epi2me-labs / wf-metagenomics

Metagenomic classification of long-read sequencing data

Process `pipeline:minimap2 (1)` terminated with an error exit status (1) #11

Closed hpf0815 closed 1 year ago

hpf0815 commented 2 years ago

Hi,

I am currently trying a few data analysis options to detect viral contamination in my sample, so I wanted to test the metagenomics workflow and compare its minimap2 and kraken2 options. When I run the minimap2 option, the run stops at the minimap2 step. This is the overall output I receive:

nextflow run epi2me-labs/wf-metagenomics -r v1.1.4 --fastq all.fastq --out_dir nf_minimap_test/ --minimap2 --reference RefSeq_viral.fna --kraken2 FALSE --threads 20

N E X T F L O W ~ version 22.04.5
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [f3e7568d46]
Launching https://github.com/epi2me-labs/wf-metagenomics [determined_hugle] DSL2 - revision: a0bd9ca590 [v1.1.4]

Core Nextflow options
revision : v1.1.4
runName : determined_hugle
containerEngine: docker
launchDir : /home/hans-peter/Test_Area
workDir : /home/hans-peter/Test_Area/work
projectDir : /home/hans-peter/.nextflow/assets/epi2me-labs/wf-metagenomics
userName : hans-peter
profile : standard
configFiles : /home/hans-peter/.nextflow/assets/epi2me-labs/wf-metagenomics/nextflow.config

Core options
fastq : all.fastq
out_dir : nf_minimap_test/
sources : [ncbi_16s_18s:[reference:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ncbi_targeted_loci_16s_18s.fna, refindex:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ncbi_targeted_loci_16s_18s.fna.fai, database:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ncbi_targeted_loci_kraken2.tar.gz, kmer_dist:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/database1000mers.kmer_distrib, ref2taxid:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ref2taxid.targloci.tsv, taxonomy:https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz], ncbi_16s_18s_28s_ITS:[reference:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/ncbi_16s_18s_28s_ITS.fna, refindex:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/ncbi_16s_18s_28s_ITS.fna.fai, database:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/ncbi_16s_18s_28s_ITS_kraken2.tar.gz, kmer_dist:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/database1000mers.kmer_distrib, ref2taxid:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/ref2taxid.ncbi_16s_18s_28s_ITS.tsv, taxonomy:https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz], PlusPF-8:[database:https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_8gb_20210517.tar.gz, taxonomy:https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz]]

Minimap2 options
minimap2 : true
reference : RefSeq_viral.fna

Kraken2 options
kraken2 : false

Generic options
threads : 20

!! Only displaying parameters that differ from the pipeline defaults !!

If you use wf-metagenomics for your analysis please cite:

Checking inputs.
Checking custom reference exists
Checking custom reference index exists
Checking fastq input.
Single file input detected.
executor > local (6)
[52/9ff427] process > handleSingleFile (1) [100%] 1 of 1 ✔
[97/7a3645] process > pipeline:unpackTaxonomy [100%] 1 of 1 ✔
[a4/795cdf] process > pipeline:combineFilterFastq (1) [100%] 1 of 1 ✔
[dc/c50135] process > pipeline:minimap2 (1) [100%] 1 of 1, failed: 1 ✘
[47/eec279] process > pipeline:getVersions [100%] 1 of 1 ✔
[f0/73726d] process > pipeline:getParams [100%] 1 of 1 ✔
[- ] process > pipeline:makeReport -
[- ] process > output [ 0%] 0 of 2
Error executing process > 'pipeline:minimap2 (1)'

Caused by: Process pipeline:minimap2 (1) terminated with an error exit status (1)

Command executed:

minimap2 -t 20 -ax map-ont RefSeq_viral.fna all.fastq | samtools view -h -F 2304 - | format_minimap2.py - -o all.minimap2.assignments.tsv -r ref2taxid.targloci.tsv | samtools sort -o all.bam -
samtools index all.bam
awk -F '\t' '{print $3}' all.minimap2.assignments.tsv > taxids.tmp
taxonkit --data-dir taxonomy_dir lineage -R taxids.tmp | aggregate_lineages.py -p all.minimap2

Command exit status: 1

Command output: (empty)

Command error:
[M::mm_idx_gen::18.225*0.83] collected minimizers
[M::mm_idx_gen::43.798*0.91] sorted minimizers
[M::main::43.807*0.91] loaded/built the index for 14813 target sequence(s)
[M::mm_mapopt_update::44.423*0.91] mid_occ = 92
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 14813
[M::mm_idx_stat::44.723*0.91] distinct minimizers: 40131811 (62.74% are singletons); average occurrences: 2.185; average spacing: 5.355; total length: 469589088
Traceback (most recent call last):
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'NC_018464.1'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hans-peter/.nextflow/assets/epi2me-labs/wf-metagenomics/bin/format_minimap2.py", line 82, in <module>
    execute(sys.argv[1:])
  File "/home/hans-peter/.nextflow/assets/epi2me-labs/wf-metagenomics/bin/format_minimap2.py", line 74, in execute
    main(
  File "/home/hans-peter/.nextflow/assets/epi2me-labs/wf-metagenomics/bin/format_minimap2.py", line 30, in main
    taxid = ref2taxid_df.at[aln.reference_name, 'taxid']
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/indexing.py", line 2270, in __getitem__
    return super().__getitem__(key)
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/indexing.py", line 2221, in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/frame.py", line 3622, in _get_value
    row = self.index.get_loc(index)
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'NC_018464.1'

Work dir: /home/hans-peter/Test_Area/work/dc/c50135909fb31f19693d617d7f1bdb

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

Help would be greatly appreciated :)

Thanks and best regards, Hans-Peter

Karhide commented 2 years ago

Hey there,

Thank you for using the workflow! Apologies for the delay in getting back to you.

At first blush this looks like your supplied reference override file includes a reference name that is not present in the ref2taxid.targloci.tsv. This is an easy mistake to make, and we should make a note to document this behaviour better.

The best way to solve this would be to create a ref2taxid mapping tsv for your specific reference file. See the included one as an example, or I can provide more detailed instructions if that helps.
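The mismatch described above can be checked before launching the workflow. The following is a minimal sketch, not part of wf-metagenomics: the function names are hypothetical, and it assumes the reference name is the first whitespace-delimited word of each FASTA header (which is the name minimap2 reports) and that the first column of the mapping tsv holds the reference ID. The sequences in the example are made up; only the IDs come from this thread.

```python
import io


def fasta_ids(handle):
    """Yield the sequence ID (first word after '>') of each FASTA record."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def mapped_ids(handle):
    """Return the set of reference IDs found in column 1 of a ref2taxid tsv."""
    ids = set()
    for line in handle:
        line = line.rstrip("\r\n")
        if line:
            ids.add(line.split("\t")[0])
    return ids


def missing_references(fasta_handle, ref2taxid_handle):
    """Return the sorted FASTA IDs that have no entry in the mapping."""
    known = mapped_ids(ref2taxid_handle)
    return sorted({rid for rid in fasta_ids(fasta_handle) if rid not in known})


# In-memory stand-ins for RefSeq_viral.fna and ref2taxid.targloci.tsv.
fasta = io.StringIO(">NC_018464.1 example record\nACGT\n>NG_012432.1 example record\nACGT\n")
mapping = io.StringIO("NG_012432.1\t162394\nNG_013120.1\t4952\n")
print(missing_references(fasta, mapping))
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` handles. Any ID reported here is one that would trigger the same `KeyError` as in the traceback.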

hpf0815 commented 2 years ago

Hi Karhide,

thanks for your response. I am not that experienced in data evaluation, so some more details would be highly appreciated :)

Where can I find the included ref2taxid mapping tsv, so that I can have a look at how it is set up? Or can you provide some instructions on how to generate one for my reference file?

nggvs commented 1 year ago

Hi, thank you for using the workflow, and apologies for the late answer. The ref2taxid tsv is a file that lists, one per line, the ID of each sequence in your reference and its corresponding taxid, separated by a tab.
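As a rough illustration of that format, here is a small parser that loads such a file into a dictionary and rejects malformed lines. It is a sketch under assumptions: the function name is hypothetical, and it treats every non-empty line as exactly two tab-separated fields with a purely numeric taxid, which is stricter than the workflow itself may be.

```python
def parse_ref2taxid(lines):
    """Parse 'reference_id<TAB>taxid' lines into a {reference_id: taxid} dict."""
    mapping = {}
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {number}: expected 2 tab-separated fields, got {len(fields)}"
            )
        ref_id, taxid = fields
        if not (taxid.isascii() and taxid.isdigit()):
            raise ValueError(f"line {number}: taxid {taxid!r} is not numeric")
        mapping[ref_id] = int(taxid)
    return mapping


# Two rows taken from the default mapping file, written with real tabs.
rows = ["NG_012432.1\t162394\n", "NG_013120.1\t4952\n"]
print(parse_ref2taxid(rows))
```

A line that uses spaces instead of a tab would raise a `ValueError` here, which is a quick way to catch a file saved with the wrong separator.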

The default ref2taxid file can be downloaded using:

wget https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ref2taxid.targloci.tsv

And it looks like:

head ref2taxid.targloci.tsv 
NG_012432.1     162394
NG_013120.1     4952
NG_013131.1     224370
NG_013144.1     108569
NG_013153.1     27332

For example, NG_012432.1 is the ID of the reference sequence and 162394 is its taxid according to the NCBI taxonomy database (https://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/).

The sequence IDs in this file must match the ones in the reference file that you're using.
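One possible way to generate such a file for a custom reference is sketched below. This is an assumption-laden example rather than the workflow's own method: the function name is hypothetical, and it assumes you have a tab-separated accession-to-taxid table with a header row containing `accession.version` and `taxid` columns (the layout used by NCBI's accession2taxid dumps). The IDs and taxids in the example are placeholders, not real records.

```python
import csv
import io


def build_ref2taxid(fasta_handle, acc2taxid_handle, out_handle):
    """Write 'reference_id<TAB>taxid' per FASTA record; return IDs without a taxid."""
    # Reference IDs in FASTA order (first word of each header line).
    wanted = [
        line[1:].split()[0]
        for line in fasta_handle
        if line.startswith(">") and line[1:].strip()
    ]
    wanted_set = set(wanted)

    # Keep only the table rows that correspond to sequences in the reference.
    taxids = {}
    for row in csv.DictReader(acc2taxid_handle, delimiter="\t"):
        accession = row["accession.version"]
        if accession in wanted_set:
            taxids[accession] = row["taxid"]

    unresolved = []
    for ref_id in wanted:
        if ref_id in taxids:
            out_handle.write(f"{ref_id}\t{taxids[ref_id]}\n")
        else:
            unresolved.append(ref_id)
    return unresolved


# Placeholder data: REF_A.1 / REF_B.1 and taxid 111 are invented for illustration.
fasta = io.StringIO(">REF_A.1 example\nACGT\n>REF_B.1 example\nACGT\n")
table = io.StringIO("accession\taccession.version\ttaxid\tgi\nREF_A\tREF_A.1\t111\t0\n")
out = io.StringIO()
unresolved = build_ref2taxid(fasta, table, out)
print(out.getvalue(), unresolved)
```

Returning the unresolved IDs rather than silently skipping them makes it obvious which sequences would still fail the lookup and need a taxid from another source.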

nggvs commented 1 year ago

Hi, Could you confirm if this issue has been solved? We'll close this ticket on the assumption things are now resolved.