epi2me-labs / wf-metagenomics

Metagenomic classification of long-read sequencing data

[Bug]: Cannot get workflow to run #13

Closed jagos01 closed 1 year ago

jagos01 commented 1 year ago

What happened?

System - Ubuntu 18.04 with Nextflow 22.10.0 and wf-metagenomics v2.0.0 (b2bd2b89da)

Command - nextflow run epi2me-labs/wf-metagenomics --fastq /home/data1/Analyzed_data/metactrrun2/guppy_6.1.7/demux_trim/BC96_ZymoStd.fastq.gz --kraken2 --threads 20

Output - hangs on process > kraken_pipeline:kraken_server (left running for 24hrs) and kraken2_client never starts.

Checking inputs.
executor > local (7)
[d8/3d42fa] process > kraken_pipeline:unpackTaxonomy [100%] 1 of 1 ✔
[2f/ca0892] process > kraken_pipeline:unpackDatabase [100%] 1 of 1 ✔
[b8/759512] process > kraken_pipeline:kraken_server [ 0%] 0 of 1
[- ] process > kraken_pipeline:combineFilterFastq -
[- ] process > kraken_pipeline:progressiveStats -
[- ] process > kraken_pipeline:kraken2_client -
[- ] process > kraken_pipeline:progressive_kreports -
[- ] process > kraken_pipeline:taxon_kit -
[- ] process > kraken_pipeline:bracken -
[fd/a60b1a] process > kraken_pipeline:getVersions [100%] 1 of 1 ✔
[38/ed71ac] process > kraken_pipeline:getParams [100%] 1 of 1 ✔
[- ] process > kraken_pipeline:makeReport -
[- ] process > kraken_pipeline:mergeclassifiedProgressive -
[- ] process > kraken_pipeline:mergeunclassifiedProgressive -
[- ] process > kraken_pipeline:catAssignmentsprogressive -
[- ] process > kraken_pipeline:stop_kraken_server -
[- ] process > kraken_pipeline:output -
[- ] process > kraken_pipeline:output_dir -
[2f/14c7e9] process > output (1) [100%] 2 of 2 ✔

I have also tried this workflow on a different system running Ubuntu 20.04. The workflow failed regardless of whether it was executed with Docker or Conda.

Operating System

ubuntu 18.04

Workflow Execution

Command line

Workflow Execution - EPI2ME Labs Versions

No response

Workflow Execution - Execution Profile

Docker

Workflow Version

b2bd2b89da

Relevant log output

.command.log from b8/759512
Loading database information... done.
Server listening on localhost:8080. Press Ctrl-C to end.

Not sure what other logs would be helpful as other log files are empty.

I have also tried EPI2ME Labs (v3.15) with Labs environment v1.2.5. The workflow was terminated due to the following error:

Checking epi2me-labs/wf-metagenomics ...

epi2me-labs/wf-metagenomics contains uncommitted changes -- cannot pull from repository

N E X T F L O W ~ version 22.04.0

Project epi2me-labs/wf-metagenomics contains uncommitted changes -- Cannot switch to revision: v1.1.4

sarahjeeeze commented 1 year ago

Hi, it's not ideal, but currently the workflow expects input directories to be structured like /*/, so perhaps try passing a directory rather than a single file to --fastq:

nextflow run epi2me-labs/wf-metagenomics --fastq /home/data1/Analyzed_data/metactrrun2/guppy_6.1.7 --kraken2 --threads 20

We will amend this shortly in the next release.

jagos01 commented 1 year ago

Thanks. This worked for the included test data with the default database but failed with the following error when I tried to use a local K2 database:

command: nextflow run epi2me-labs/wf-metagenomics --fastq '/home/scott/.nextflow/assets/epi2me-labs/wf-metagenomics/test_data' --kraken2 --database /home/scott/Data_DB/Kraken2_db/k2_pluspfp_20220607

[af/3f34f9] NOTE: Process kraken_pipeline:kraken2_client (1) terminated with an error exit status (74) -- Execution is retried (1)
[a0/4bd77d] NOTE: Process kraken_pipeline:kraken2_client (2) terminated with an error exit status (74) -- Execution is retried (1)
[63/e1b4c2] NOTE: Process kraken_pipeline:kraken2_client (3) terminated with an error exit status (74) -- Execution is retried (1)
[b7/bead2c] NOTE: Process kraken_pipeline:kraken2_client (4) terminated with an error exit status (74) -- Execution is retried (1)
Error executing process > 'kraken_pipeline:kraken2_client (1)'

Caused by: Process kraken_pipeline:kraken2_client (1) terminated with an error exit status (74)

Command executed:

kraken2_client --port 8080 --sequence "barcode01.2.fastq.gz" > "barcode01.kraken2.assignments.tsv"
kraken2_client --port 8080 --report --sequence "barcode01.2.fastq.gz" > "out.txt"
tail -n +2 "out.txt" > "tmp.txt"
head -n -6 "tmp.txt" > "barcode01.kraken2_report.txt"

Command exit status: 74

Command output: (empty)

Command error:
Connecting to server: localhost:8080.
Extracting sequences from file: barcode01.2.fastq.gz
Sequences extracted successfully.
Uploading sequences...
Sequences uploaded.
Awaiting classification results...
Sequence Stream RPC failed: failed to connect to all addresses
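Editor's note: the error above says the client could no longer reach the server on localhost:8080. One quick diagnostic a reader might try is to test whether anything is still accepting TCP connections on that port. The sketch below is only an illustration: the function name `server_is_reachable` is hypothetical, and a successful TCP connection does not prove that the Kraken server has finished loading its database or is ready to classify.

```python
import socket


def server_is_reachable(host="localhost", port=8080, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds.

    Assumption: the port (8080) is taken from the log in this thread.
    This only checks that something is listening; it does not speak
    the server's RPC protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    print("listening" if server_is_reachable() else "not reachable")
```

If this reports "not reachable" while the kraken_server process is supposed to be running, the server may have exited, for example while loading a large database.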

I encountered a different error when I ran a set of my sequences using the default k2 database.

command: nextflow run epi2me-labs/wf-metagenomics --fastq /home/scott/Desktop/test_seqs --kraken2

Error executing process > 'kraken_pipeline:bracken (1)'

Caused by: Process kraken_pipeline:bracken (1) terminated with an error exit status (1)

Command executed:

run_bracken.py "database_dir" "reports.1/unclassified.kreport.txt" "1000" "S" "unclassified.bracken_report.txt"
mv "reports.1/unclassified.kreport_bracken_species.txt" .
awk '{ print $3,$7}' "unclassified.bracken_report.txt" | awk 'NR!=1 {print}' > taxacounts.txt
awk '{print $3}' "unclassified.bracken_report.txt" | awk 'NR!=1 {print}' > taxa.txt
taxonkit --data-dir taxonomy_dir lineage -R taxa.txt > lineages.txt
aggregate_lineages_bracken.py -i "lineages.txt" -b "taxacounts.txt" -p "unclassified.kraken2"
file1=cat *.json
echo "{"'"unclassified"'": "$file1"}" >> "unclassified.1.json"
cp "unclassified.1.json" "reports.1/unclassified.json"

Command exit status: 1

Command output: b' >> Checking for Valid Options...\n >> Running Bracken \n >> python src/est_abundance.py -i reports.1/unclassified.kreport.txt -o unclassified.bracken_report.txt -k database_dir/database1000mers.kmer_distrib -l S -t 10\nPROGRAM START TIME: 10-20-2022 16:20:49\n'

Command error: b' >> Checking for Valid Options...\n >> Running Bracken \n >> python src/est_abundance.py -i reports.1/unclassified.kreport.txt -o unclassified.bracken_report.txt -k database_dir/database1000mers.kmer_distrib -l S -t 10\nPROGRAM START TIME: 10-20-2022 16:20:49\n'b'>> Checking report file: reports.1/unclassified.kreport.txt\nError: no reads found. Please check your Kraken report\n'mv: cannot stat 'reports.1/unclassified.kreport_bracken_species.txt': No such file or directory

Any help with these errors is appreciated. Thanks
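Editor's note: Bracken stops with "no reads found" after checking the Kraken report, so it can help to look at how many reads the report assigns at the requested level before Bracken runs. The sketch below assumes the standard six-column, tab-separated Kraken 2 report (percentage, clade reads, direct reads, rank code, taxid, name). The helper name `reads_at_rank` is hypothetical, and the exact condition under which Bracken raises this error is not confirmed by this thread.

```python
def reads_at_rank(report_lines, rank="S"):
    """Sum the clade-level read counts of all taxa with the given rank code.

    Assumption: each line has at least six tab-separated columns, with the
    clade read count in column 2 and the rank code in column 4.
    """
    total = 0
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        if fields[3].strip() == rank:
            total += int(fields[1])
    return total


def reads_at_rank_in_file(path, rank="S"):
    """Convenience wrapper that reads the report from a file path."""
    with open(path) as handle:
        return reads_at_rank(handle, rank)
```

For the failing sample, something like `reads_at_rank_in_file("reports.1/unclassified.kreport.txt", "S")` would show whether the report contains any species-level reads at all.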

sarahjeeeze commented 1 year ago

Hi, thanks for finding these bugs. I am just trying to recreate your errors so I can fix the problem. For the first one: how big was the k2_pluspfp database directory, and does it contain all the required Kraken2 files (hash, opts, taxo) plus the database1000mers file? Is it one from here? https://benlangmead.github.io/aws-indexes/k2. If so, I can test it. For the second one, could you try changing the parameter --bracken_level to 'G' or 'F' and see if you get any results?
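Editor's note: the file check asked about above can be sketched in a few lines. This assumes the usual Kraken 2 file names (hash.k2d, opts.k2d, taxo.k2d) and the Bracken k-mer distribution file name that appears in the logs in this thread (database1000mers.kmer_distrib). The function name `missing_database_files` is hypothetical, and the workflow itself may check for a different set of files.

```python
from pathlib import Path

# Assumed core Kraken 2 index files (standard naming).
KRAKEN2_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_database_files(db_dir, read_length=1000):
    """Return the names of expected files that are absent from db_dir.

    read_length=1000 mirrors the "1000" argument passed to run_bracken.py
    in the logs; it selects which Bracken k-mer distribution file to expect.
    """
    required = list(KRAKEN2_FILES)
    required.append(f"database{read_length}mers.kmer_distrib")
    db = Path(db_dir)
    return [name for name in required if not (db / name).is_file()]
```

An empty list means every expected file is present, for example `missing_database_files("/home/scott/Data_DB/Kraken2_db/k2_pluspfp_20220607")`.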

jagos01 commented 1 year ago

Hello Sarah, Yes the database was from the site you mentioned. It was the full k2_pluspfp database so I believe 129 GB. I will try changing the bracken_level shortly. Thanks, Scott

Changing the bracken_level to Family or Genus did not help: both runs failed to complete and produced the following errors.

bracken_level = 'G'

Error executing process > 'kraken_pipeline:bracken (1)'

Caused by: Process kraken_pipeline:bracken (1) terminated with an error exit status (1)

Command executed:

run_bracken.py "database_dir" "reports.1/barcode15.kreport.txt" "1000" "G" "barcode15.bracken_report.txt"
mv "reports.1/barcode15.kreport_bracken_species.txt" .
awk '{ print $3,$7}' "barcode15.bracken_report.txt" | awk 'NR!=1 {print}' > taxacounts.txt
awk '{print $3}' "barcode15.bracken_report.txt" | awk 'NR!=1 {print}' > taxa.txt
taxonkit --data-dir taxonomy_dir lineage -R taxa.txt > lineages.txt
aggregate_lineages_bracken.py -i "lineages.txt" -b "taxacounts.txt" -p "barcode15.kraken2"
file1=cat *.json
echo "{"'"barcode15"'": "$file1"}" >> "barcode15.1.json"
cp "barcode15.1.json" "reports.1/barcode15.json"

Command exit status: 1

Command output: b' >> Checking for Valid Options...\n >> Running Bracken \n >> python src/est_abundance.py -i reports.1/barcode15.kreport.txt -o barcode15.bracken_report.txt -k database_dir/database1000mers.kmer_distrib -l G -t 10\nPROGRAM START TIME: 10-21-2022 20:24:51\n'

Command error: b'>> Checking report file: reports.1/barcode15.kreport.txt\nError: no reads found. Please check your Kraken report\n'b' >> Checking for Valid Options...\n >> Running Bracken \n >> python src/est_abundance.py -i reports.1/barcode15.kreport.txt -o barcode15.bracken_report.txt -k database_dir/database1000mers.kmer_distrib -l G -t 10\nPROGRAM START TIME: 10-21-2022 20:24:51\n'mv: cannot stat 'reports.1/barcode15.kreport_bracken_species.txt': No such file or directory

bracken_level = 'F'

Error executing process > 'kraken_pipeline:bracken (1)'

Caused by: Process kraken_pipeline:bracken (1) terminated with an error exit status (1)

Command executed:

run_bracken.py "database_dir" "reports.1/barcode15.kreport.txt" "1000" "F" "barcode15.bracken_report.txt"
mv "reports.1/barcode15.kreport_bracken_species.txt" .
awk '{ print $3,$7}' "barcode15.bracken_report.txt" | awk 'NR!=1 {print}' > taxacounts.txt
awk '{print $3}' "barcode15.bracken_report.txt" | awk 'NR!=1 {print}' > taxa.txt
taxonkit --data-dir taxonomy_dir lineage -R taxa.txt > lineages.txt
aggregate_lineages_bracken.py -i "lineages.txt" -b "taxacounts.txt" -p "barcode15.kraken2"
file1=cat *.json
echo "{"'"barcode15"'": "$file1"}" >> "barcode15.1.json"
cp "barcode15.1.json" "reports.1/barcode15.json"

Command exit status: 1

Command output: b' >> Checking for Valid Options...\n >> Running Bracken \n >> python src/est_abundance.py -i reports.1/barcode15.kreport.txt -o barcode15.bracken_report.txt -k database_dir/database1000mers.kmer_distrib -l F -t 10\nPROGRAM START TIME: 10-21-2022 20:23:16\n'

Command error: b' >> Checking for Valid Options...\n >> Running Bracken \n >> python src/est_abundance.py -i reports.1/barcode15.kreport.txt -o barcode15.bracken_report.txt -k database_dir/database1000mers.kmer_distrib -l F -t 10\nPROGRAM START TIME: 10-21-2022 20:23:16\n'b'>> Checking report file: reports.1/barcode15.kreport.txt\nError: no reads found. Please check your Kraken report\n'mv: cannot stat 'reports.1/barcode15.kreport_bracken_species.txt': No such file or directory

Thanks

nggvs commented 1 year ago

Hi, Thank you for using the workflow. Could you confirm if this issue has been solved? We'll close this ticket on the assumption things are now resolved.

jagos01 commented 1 year ago

Hello, The workflow was unable to finish with this data set. I instead used Kraken2 to analyze this data. Thanks

nggvs commented 1 year ago

Hi, could you send the error that you're observing, along with the parameters and the versions of the workflow and of EPI2ME Labs (if you are using it)? From your previous report I see that you are using the k2_pluspfp database. How much RAM do you have available to run the workflow? It must be slightly larger than the size of the database.
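Editor's note: the memory requirement mentioned above (available RAM slightly larger than the database) can be checked roughly before launching a run. The sketch below is an approximation under stated assumptions: the 10% headroom factor is an illustrative guess at "slightly", the function names are hypothetical, and `total_ram_bytes` relies on `os.sysconf` names available on Linux.

```python
import os
from pathlib import Path


def directory_size_bytes(path):
    """Total size of all regular files under path, in bytes."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def total_ram_bytes():
    """Physical memory of the machine (Linux-specific sysconf names)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def database_fits_in_ram(db_dir, ram_bytes=None, headroom=1.1):
    """Return True if the database size times headroom fits in ram_bytes.

    Assumption: headroom=1.1 is an arbitrary stand-in for "slightly larger";
    the workflow documentation does not give a figure in this thread.
    """
    if ram_bytes is None:
        ram_bytes = total_ram_bytes()
    return directory_size_bytes(db_dir) * headroom <= ram_bytes
```

Note that total physical memory is an upper bound; memory limits imposed by Docker or by the Nextflow configuration can be lower than what the machine reports.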

jagos01 commented 1 year ago

The workflow was given 192GB RAM. I ran several datasets today with the latest version (2.2.1) and all completed without issues. Thanks,