epi2me-labs / wf-metagenomics

Metagenomic classification of long-read sequencing data

invalid Kingdom taxonomic_rank option #98

Closed RunpengLuo closed 2 months ago

RunpengLuo commented 6 months ago

Operating System

Other Linux (please specify below)

Other Linux

No response

Workflow Version

v2.9.4

Workflow Execution

Command line (Local)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

nextflow run epi2me-labs/wf-metagenomics --fastq ${path2fastq} \
    --classifier kraken2 --analyse_unclassified --database_set ${db} \
    --taxonomic_rank k --include_kraken2_assignments \
    --outdir out${db}/ --threads 16 1>wf${db}.out 2>wf${db}.err

Workflow Execution - CLI Execution Profile

None

What happened?

wf-metagenomics requires taxonomic_rank to be set to lower-case k for kingdom. However, a later step that runs est_abundance.py expects upper-case K instead. This inconsistency raises an error and aborts the whole pipeline. It looks like a typo in the workflow's argument parsing step: the lower-case k should be changed to upper-case K to comply with the subsequent commands.
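The mismatch can be reproduced outside the workflow with a minimal sketch. The line `main_lvls.index(args.level[0])` is taken from the traceback in the log output; the contents of `MAIN_LVLS` are an assumption about Bracken's accepted rank codes, and `normalize_rank_code` is a hypothetical helper for illustration, not the workflow's actual fix.

```python
# Assumption: upper-case rank codes accepted by Bracken's est_abundance.py.
# The traceback only confirms that 'k' is NOT in this list.
MAIN_LVLS = ["R", "K", "D", "P", "C", "O", "F", "G", "S"]


def branch_level(level: str) -> int:
    # Mirrors the failing line from the traceback:
    #   branch_lvl = main_lvls.index(args.level[0])
    return MAIN_LVLS.index(level[0])


def normalize_rank_code(code: str) -> str:
    """Hypothetical helper: upper-case a single-letter rank code and validate it."""
    normalized = code.strip().upper()
    if normalized not in MAIN_LVLS:
        raise ValueError(f"unsupported taxonomic rank code: {code!r}")
    return normalized


if __name__ == "__main__":
    try:
        branch_level("k")  # what the workflow currently passes through
    except ValueError as err:
        print(f"lower-case code fails: {err}")
    # Upper-casing before the lookup avoids the error.
    print(branch_level(normalize_rank_code("k")))
```

Upper-casing the code only removes the ValueError; it does not address whether a kingdom-level estimate is meaningful for every taxon in the database.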

Relevant log output

Caused by:
  Process `kraken_pipeline:run_bracken (3502253)` terminated with an error exit status (1)

Command executed:

  # run bracken on the latest kreports, is this writing some outputs
  # alongside the inputs? seems at least {}.kreport_bracken_species.txt
  # is written alongside the input
  BRACKEN_LENGTH=$(cat bracken_length.txt)

  workflow-glue run_bracken         "ncbi_16s_18s_28s_ITS_kraken2_db"         kraken2.report         $BRACKEN_LENGTH         "k"         "3502253.kraken2_bracken.report"

  # do some stuff...
  awk -F '  ' -v OFS='  ' '{ print $2,$6 }' "3502253.kraken2_bracken.report"     | awk -F ' ' -v OFS='  ' 'NR!=1 {print}'     | tee taxacounts.txt     | awk -F '   ' -v OFS='  ' '{ print $1 }' > taxa.txt
  taxonkit lineage         -j 14         --data-dir taxdmp_2023-01-01_db         -R taxa.txt  > lineages.txt
  workflow-glue aggregate_lineages_bracken         -i "lineages.txt" -b "taxacounts.txt"         -u kraken2.report         -p "3502253.kraken2"         -r "k"

  # add sample to the json file    
  file1=$(find -name '*.json' -exec cat {} +)
  echo "{"'"3502253"'": $file1}" >> "bracken.json"
  mv "bracken.json" "3502253.json"

Command exit status:
  1

Command output:
  b' >> Checking for Valid Options...\n >> Running Bracken \n      >> python src/est_abundance.py -i kraken2.report -o 3502253.kraken2_bracken.report -k ncbi_16s_18s_28s_ITS_kraken2_db/database1000mers.kmer_distrib -l k -t 0\nPROGRAM START TIME: 05-20-2024 06:12:15\n'

Command error:
  [06:12:14 - matplotlib.font_manager] generated new fontManager
  [06:12:15 - workflow_glue] Starting entrypoint.
  b'Traceback (most recent call last):\n  File "/home/epi2melabs/conda/bin/src/est_abundance.py", line 554, in <module>\n    main()\n  File "/home/epi2melabs/conda/bin/src/est_abundance.py", line 245, in main\n    branch_lvl = main_lvls.index(args.level[0])\nValueError: \'k\' is not in list\n'b' >> Checking for Valid Options...\n >> Running Bracken \n      >> python src/est_abundance.py -i kraken2.report -o 3502253.kraken2_bracken.report -k ncbi_16s_18s_28s_ITS_kraken2_db/database1000mers.kmer_distrib -l k -t 0\nPROGRAM START TIME: 05-20-2024 06:12:15\n'

Work dir:
  /mnt/nano/amplicon_directory/Output_epi2me/DMc23-8545/wd_ncbi_16s_18s_28s_ITS_kingdom/work/05/7d09db8867bd84e1442308f8f7d93f

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
May-20 16:12:17.812 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process `kraken_pipeline:run_bracken (3502253)` terminated with an error exit status (1)
May-20 16:12:17.830 [Task monitor] DEBUG nextflow.Session - The following nodes are still active:
[process] kraken_pipeline:createAbundanceTables
  status=ACTIVE
  port 0: (value) OPEN  ; channel: lineages/*
  port 1: (value) bound ; channel: taxonomic_rank
  port 2: (value) bound ; channel: pipeline
  port 3: (cntrl) -     ; channel: $

[process] kraken_pipeline:makeReport
  status=ACTIVE
  port 0: (value) bound ; channel: read_stats/per-read-stats*.tsv.gz
  port 1: (value) OPEN  ; channel: abundance_table
  port 2: (value) OPEN  ; channel: lineages/*
  port 3: (value) bound ; channel: versions/*
  port 4: (value) bound ; channel: params.json
  port 5: (value) bound ; channel: taxonomic_rank
  port 6: (queue) OPEN  ; channel: amr
  port 7: (cntrl) -     ; channel: $

[process] kraken_pipeline:output_results
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

May-20 16:12:18.777 [main] DEBUG nextflow.Session - Session await > all processes finished
May-20 16:12:18.778 [main] DEBUG nextflow.Session - Session await > all barriers passed
May-20 16:12:18.779 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local) - terminating tasks monitor poll loop
May-20 16:12:19.103 [main] WARN  n.processor.TaskPollingMonitor - Killing running tasks (2)
May-20 16:12:19.152 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=445; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=435; submittedCount=1; runningCount=0; retriesCount=0; abortedCount=1; succeedDuration=50m 38s; failedDuration=6m 21s; cachedDuration=0ms;loadCpus=-14; loadMemory=0; peakRunning=12; peakCpus=28; peakMemory=24 GB; ]
May-20 16:12:19.152 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow completed -- saving trace file
May-20 16:12:19.161 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
May-20 16:12:21.052 [main] DEBUG nextflow.trace.TimelineObserver - Workflow completed -- rendering execution timeline
May-20 16:12:21.235 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
May-20 16:12:21.280 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

nggvs commented 5 months ago

Hi @RunpengLuo, thank you for using the workflow! I'll review what you have mentioned and get back to you. Thank you very much!

nggvs commented 5 months ago

Hi @RunpengLuo ,

I've been looking into this. I have found a fix for the minimap2 pipeline. However, the kraken approach is a bit trickier, as Bacteria and Archaea don't have a Kingdom level, and that may interfere with how bracken estimates abundances. I'm investigating further to work out the best way to approach this. In the meantime, is there any specific reason why you want to run the analysis at the Kingdom level?

Thank you very much in advance!
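To illustrate the concern about Bacteria and Archaea, here is a small sketch that sums reads per taxon at a given rank code in a kraken2-style report. The report rows are made-up toy data, the six-column tab-separated layout is assumed from the standard kraken2 report format, and `reads_at_rank` is a hypothetical helper, not part of the workflow or of bracken.

```python
# Toy kraken2-style report rows (made-up counts, for illustration only).
# Assumed columns: percent, clade reads, direct reads, rank code, taxid, name.
TOY_REPORT = (
    " 60.00\t600\t0\tD\t2\tBacteria\n"
    " 55.00\t550\t0\tP\t1224\t  Proteobacteria\n"
    " 40.00\t400\t0\tD\t2759\tEukaryota\n"
    " 30.00\t300\t0\tK\t4751\t  Fungi\n"
    " 25.00\t250\t0\tP\t4890\t    Ascomycota\n"
)


def reads_at_rank(report_text: str, rank_code: str) -> dict:
    """Return {taxon name: clade reads} for rows whose rank code matches."""
    totals = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 6:
            continue  # skip malformed or empty lines
        _, clade_reads, _, code, _, name = fields
        if code == rank_code:
            totals[name.strip()] = int(clade_reads)
    return totals


if __name__ == "__main__":
    print(reads_at_rank(TOY_REPORT, "D"))  # both domains are present
    print(reads_at_rank(TOY_REPORT, "K"))  # only the eukaryotic kingdom appears
```

In this toy report the bacterial reads have no row at rank K, so a kingdom-level summary would leave them without a taxon to be assigned to.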

nggvs commented 2 months ago

Hi, I have raised this issue internally, so I will close it in the meantime. Thank you very much for using the workflow and for the feedback!