Matth-Cbn closed this issue 1 year ago.
Hi! Thank you for using the workflow and for providing the parameters. It's really useful to have that information. I suspect that the taxonomy database (which contains the lineage information for each reference name) doesn't match the reference database (which contains the reference names and their sequences). I'll check this to make sure that is what was happening here.
Hi! Do you know how I can fix the problem, if that is possible? Thank you for your feedback, and for any further feedback if you discover more.
Hi,
You should provide, through the --ref2taxid flag, a tab-separated file that maps each sequence reference to its taxid.
For example, if your silva_138.fna contains the reference:
AYKI01000027.110818.112349
then the ref2taxid file should contain this line (reference and taxid separated by a tab):
AYKI01000027.110818.112349 1352943
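A minimal sketch of how such a two-column mapping can be read and queried in Python, assuming one `reference<TAB>taxid` pair per line and no header row. The helper name `load_ref2taxid` is hypothetical and is not the workflow's own parser.

```python
import csv
import io


def load_ref2taxid(handle):
    """Read a two-column, tab-separated ref2taxid mapping into a dict.

    Hypothetical helper: assumes one 'reference<TAB>taxid' pair per line
    and no header row.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected 2 tab-separated columns, got: {row!r}")
        mapping[row[0].strip()] = row[1].strip()
    return mapping


example = io.StringIO("AYKI01000027.110818.112349\t1352943\n")
ref2taxid = load_ref2taxid(example)
print(ref2taxid["AYKI01000027.110818.112349"])  # prints 1352943
```

A line separated by spaces instead of a tab is rejected here rather than silently misread, since a lookup keyed on the wrong string is exactly the kind of mismatch discussed in this thread.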
You can download the taxids from the SILVA website, but please take into account that SILVA uses different taxids from NCBI. So there are two options: use the file taxmap_embl-ebi_ena_ssu_ref_138.1.txt to extract the NCBI taxids, or, if you use the SILVA taxids, use --taxonomy to provide NCBI-style taxdump files describing a custom taxonomy suitable for your custom database.
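The first option could be sketched in Python as below. This rests on an assumption to verify against your own copy: that taxmap_embl-ebi_ena_ssu_ref_138.1.txt is tab-separated with a header row naming the columns primaryAccession, start, stop and taxid, and that SILVA sequence IDs are built as accession.start.stop (which is consistent with IDs such as AYKI01000027.110818.112349 and JN578465.1.1478). The helper name `taxmap_to_ref2taxid` is hypothetical, and the toy row uses placeholder path and organism values.

```python
import csv
import io


def taxmap_to_ref2taxid(taxmap_handle, out_handle):
    """Write 'accession.start.stop<TAB>taxid' lines from a SILVA taxmap.

    ASSUMPTION: tab-separated input with a header row containing the
    columns primaryAccession, start, stop and taxid. Check your file.
    """
    reader = csv.DictReader(taxmap_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    written = 0
    for row in reader:
        # Rebuild the SILVA sequence ID, e.g. AYKI01000027.110818.112349
        seq_id = f"{row['primaryAccession']}.{row['start']}.{row['stop']}"
        out_handle.write(f"{seq_id}\t{row['taxid'].strip()}\n")
        written += 1
    return written


# Toy taxmap: path and organism_name are placeholders
taxmap = io.StringIO(
    "primaryAccession\tstart\tstop\tpath\torganism_name\ttaxid\n"
    "AYKI01000027\t110818\t112349\tBacteria;\tplaceholder\t1352943\n"
)
out = io.StringIO()
taxmap_to_ref2taxid(taxmap, out)
print(out.getvalue(), end="")
```

QUOTE_NONE keeps stray quote characters in organism names from being treated as CSV quoting.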
Please let us know if the problem is still not solved.
Thank you for your answer. First, I would like to clarify that we are launching the metagenomics pipeline from the application (not from the command line). We then looked at your recommendations. We saw the KeyError message 'JN578465.1.1478' in the results I sent you in my first message. When we look up JN578465.1.1478 in the SILVA database, we do get the lineage of a bacterium, Streptococcus anginosus, and when we check the file seqid2ncbitaxid.tsv we also find the taxid of this bacterium. So the taxonomic references are also in our database. I am attaching a screenshot of the terminal commands we used to verify this so that you can see it. So we don't really know what to do about it.
Hi, I'll try to reproduce it; I may be missing something.
Just to be sure, which options are you using in the app? From your logfile I see that you're using your own reference, but I don't see that you're using the --ref2taxid input option. In the app it is under the Minimap2 options as ref2taxid, and you would need to point it to the file seqid2ncbitaxid.tsv; in the log it should appear as "--ref2taxid". If it is not provided, the workflow uses the default mapping, which does not match the SILVA references.
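One way to check that a custom reference and its mapping agree is to list every FASTA ID that has no entry in the ref2taxid file; a reference missing from the mapping is one plausible source of a KeyError such as 'JN578465.1.1478'. This is a minimal sketch, assuming the FASTA ID is the first word after ">" and the mapping is a tab-separated file with the reference in the first column. The function names are hypothetical and the toy sequences are placeholders.

```python
import io


def fasta_ids(handle):
    """Yield the sequence ID (first word after '>') of each FASTA header."""
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                yield parts[0]


def missing_from_ref2taxid(fasta_handle, ref2taxid_handle):
    """List FASTA reference IDs that have no entry in a ref2taxid TSV."""
    known = set()
    for line in ref2taxid_handle:
        # Drop Windows line endings and surrounding spaces from the ID column
        ref = line.rstrip("\r\n").split("\t")[0].strip()
        if ref:
            known.add(ref)
    return [seq_id for seq_id in fasta_ids(fasta_handle) if seq_id not in known]


# Toy input: placeholder sequences, IDs taken from this thread
fasta = io.StringIO(
    ">AYKI01000027.110818.112349 Bacteria\nACGT\n"
    ">JN578465.1.1478 Bacteria\nACGT\n"
)
mapping = io.StringIO("AYKI01000027.110818.112349\t1352943\n")
print(missing_from_ref2taxid(fasta, mapping))  # prints ['JN578465.1.1478']
```

On real data the same function could be called with `open("silva_138.fna")` and `open("seqid2ncbitaxid.tsv")`; an empty list means every reference has a taxid.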
{
fastq: /media/stage/CL1/Stage/GD Biotech/data/Barcode01,
classifier: minimap2,
analyse_unclassified: true,
database_set: ncbi_16s_18s,
store_dir: store_dir,
reference: /media/stage/CL1/Stage/GD Biotech/Database/silva_138.fna,
bracken_level: S,
port: 8080,
host: localhost,
out_dir: /home/stage/epi2melabs/instances/wf-metagenomics_18f085b8-5883-4bb2-a686-3870d380eb3d/output,
min_len: 200,
max_len: 2000,
threads: 4,
server_threads: 8,
kraken_clients: 2,
wf: {
agent: epi2melabs/5.0.2
}
}
I use the Reference options (the reference parameter pointing to the SILVA database) and the Minimap2 options with ref2taxid. I had already tried this option; I just ran it again with ref2taxid, but the error persists:
Core Nextflow options
runName : Test_minimap2_silva138
containerEngine: docker
launchDir : /home/stage/epi2melabs/instances/wf-metagenomics_37c5bddf-85ba-412e-b360-b21db70a9edf
workDir : /home/stage/epi2melabs/instances/wf-metagenomics_37c5bddf-85ba-412e-b360-b21db70a9edf/work
projectDir : /home/stage/epi2melabs/workflows/epi2me-labs/wf-metagenomics
userName : stage
profile : standard
configFiles : /home/stage/epi2melabs/workflows/epi2me-labs/wf-metagenomics/nextflow.config
Input Options
fastq : /media/stage/CL1/Stage/GD Biotech/data/Barcode01
classifier : minimap2
Reference Options
reference : /media/stage/CL1/Stage/GD Biotech/Database/silva_138.fna
database_sets : [ncbi_16s_18s:[reference:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ncbi_targeted_loci_16s_18s.fna, refindex:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ncbi_targeted_loci_16s_18s.fna.fai, database:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ncbi_targeted_loci_kraken2.tar.gz, kmer_dist:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/database1000mers.kmer_distrib, ref2taxid:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s/ref2taxid.targloci.tsv, taxonomy:https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/taxdmp_2023-01-01.zip], ncbi_16s_18s_28s_ITS:[reference:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/ncbi_16s_18s_28s_ITS.fna, refindex:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/ncbi_16s_18s_28s_ITS.fna.fai, database:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/ncbi_16s_18s_28s_ITS_kraken2.tar.gz, kmer_dist:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/database1000mers.kmer_distrib, ref2taxid:https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-metagenomics/ncbi_16s_18s_28s_ITS/ref2taxid.ncbi_16s_18s_28s_ITS.tsv, taxonomy:https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/taxdmp_2023-01-01.zip], PlusPF-8:[database:https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_08gb_20230314.tar.gz, taxonomy:https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/new_taxdump_2023-03-01.zip], PlusPFP-8:[database:https://genome-idx.s3.amazonaws.com/kraken/k2_pluspfp_08gb_20230314.tar.gz, taxonomy:https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/new_taxdump_2023-03-01.zip]]
Minimap2 Options
ref2taxid : /media/stage/CL1/Stage/GD Biotech/Database/seqid2ncbitaxid.tsv
Output Options
out_dir : /home/stage/epi2melabs/instances/wf-metagenomics_37c5bddf-85ba-412e-b360-b21db70a9edf/output
Other parameters
process_label : wfmetagenomics
!! Only displaying parameters that differ from the pipeline defaults !!
--------------------------------------------------------------------------------
If you use epi2me-labs/wf-metagenomics for your analysis please cite:
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
--------------------------------------------------------------------------------
This is epi2me-labs/wf-metagenomics v2.2.1.
--------------------------------------------------------------------------------
Checking inputs.
Checking custom reference exists
Checking custom reference index exists
Checking custom ref2taxid mapping exists
Checking fastq input.
[1f/0f955e] Submitted process > minimap_pipeline:getParams
[13/82e9f5] Submitted process > minimap_pipeline:getVersions
[40/d7ff58] Submitted process > fastcat (1)
[1b/0a5c53] Submitted process > minimap_pipeline:output (1)
Staging foreign file: https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/taxdmp_2023-01-01.zip
[f5/b9724f] Submitted process > minimap_pipeline:output (2)
[26/db1871] Submitted process > minimap_pipeline:unpackTaxonomy
[c2/715f89] Submitted process > minimap_pipeline:minimap (1)
[cd/5345e8] Submitted process > minimap_pipeline:makeReport (1)
ERROR ~ Error executing process > 'minimap_pipeline:makeReport (1)'
Caused by:
Process `minimap_pipeline:makeReport (1)` terminated with an error exit status (1)
Command executed:
workflow-glue report wf-metagenomics-report.html --versions versions --params params.json --stats per-read-stats.tsv --lineages lineages --pipeline "minimap"
Command exit status:
1
Command output:
(empty)
Command error:
[14:29:38 - workflow_glue] Starting entrypoint.
[14:29:39 - Plotter ] Cannot correct axis labels in complicated scenarios.
[14:29:39 - Plotter ] Cannot correct axis labels in complicated scenarios.
[14:29:39 - Plotter ] Cannot correct axis labels in complicated scenarios.
[14:29:39 - Plotter ] Cannot correct axis labels in complicated scenarios.
[14:29:39 - Plotter ] Cannot correct axis labels in complicated scenarios.
Traceback (most recent call last):
File "/home/stage/epi2melabs/workflows/epi2me-labs/wf-metagenomics/bin/workflow-glue", line 7, in <module>
cli()
File "/home/stage/epi2melabs/workflows/epi2me-labs/wf-metagenomics/bin/workflow_glue/__init__.py", line 62, in cli
args.func(args)
File "/home/stage/epi2melabs/workflows/epi2me-labs/wf-metagenomics/bin/workflow_glue/report.py", line 119, in main
plt = ezc.barplot(
File "/home/epi2melabs/conda/lib/python3.8/site-packages/ezcharts/plots/categorical.py", line 67, in barplot
data = data.pivot(
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/frame.py", line 8424, in pivot
return pivot(self, index=index, columns=columns, values=values)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/reshape/pivot.py", line 557, in pivot
result = indexed.unstack(columns_listlike) # type: ignore[arg-type]
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/series.py", line 4309, in unstack
return unstack(self, level, fill_value)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 488, in unstack
unstacker = _Unstacker(
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 136, in __init__
self._make_selectors()
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 188, in _make_selectors
raise ValueError("Index contains duplicate entries, cannot reshape")
ValueError: Index contains duplicate entries, cannot reshape
Work dir:
/home/stage/epi2melabs/instances/wf-metagenomics_37c5bddf-85ba-412e-b360-b21db70a9edf/work/cd/5345e89c3a067497644ddd9f188da0
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '/home/stage/epi2melabs/instances/wf-metagenomics_37c5bddf-85ba-412e-b360-b21db70a9edf/nextflow.log' file for details
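The traceback ends in pandas `DataFrame.pivot`, which raises "Index contains duplicate entries, cannot reshape" when two rows share the same (index, column) pair, because both values would land in one cell. The small stdlib sketch below finds such pairs. The column names `sample`, `taxon` and `count` are hypothetical, since the log does not show which columns the report step pivots on, and whether duplicates in the lineage table caused this particular failure is not confirmed here.

```python
from collections import Counter


def duplicate_pivot_keys(rows, index, columns):
    """Return the (index, column) pairs that occur more than once in rows.

    DataFrame.pivot cannot place two values in the same cell, so any pair
    returned here would make it fail with a duplicate-entries ValueError.
    """
    counts = Counter((row[index], row[columns]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical rows and column names, for illustration only
rows = [
    {"sample": "barcode01", "taxon": "Streptococcus", "count": 5},
    {"sample": "barcode01", "taxon": "Streptococcus", "count": 3},
    {"sample": "barcode01", "taxon": "Escherichia", "count": 2},
]
print(duplicate_pivot_keys(rows, "sample", "taxon"))
# prints [('barcode01', 'Streptococcus')]
```

Aggregating duplicates before reshaping (pandas' `pivot_table` does this instead of raising) is the usual way around the error.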
Hi, I apologize for the late answer. This should have been fixed in the latest version (2.3.0). You can also now use the SILVA database (although please take into account that its taxids differ from NCBI's and that it only reaches the genus rank).
Hi, would you mind confirming whether this problem persists? If it has been solved, please feel free to close the issue.
What happened?
Hey there,
I've seen a similar error among your issues, but it doesn't really match my case. I use your metagenomics workflow for my analyses and I'm having some trouble. I use my own database (SILVA 138.1) and my own SeqId2taxid file, and with these I have a problem with the minimap2 pipeline. I'm sharing my parameters and my analysis messages so that you can point me in the right direction.
Thank you for your help. If you have any questions or need more details about what I am doing, do not hesitate to ask. Sincerely
Operating System
ubuntu 20.04
Workflow Execution
EPI2ME Labs desktop application
Workflow Execution - EPI2ME Labs Versions
No response
Workflow Execution - CLI Execution Profile
None
Workflow Version
wf-metagenomics v2.2.1
Relevant log output