genomicsITER / NanoCLUST

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads
MIT License

Error executing process > consensus_classification #43

Open issue, opened by fritzthm 3 years ago

fritzthm commented 3 years ago

Hello, I'm trying to run NanoCLUST on my 16S sequence data on a Linux CentOS 7 machine. Even with the test data, I get an error when the pipeline reaches the 'consensus_classification' process:

When I use the command: nextflow run main.nf -profile test,docker

I get the following terminal output:

N E X T F L O W  ~  version 21.04.1
Launching `main.nf` [nostalgic_mestorf] - revision: 5e0f88a799

Run Name          : nostalgic_mestorf
Reads             : /home/bcl2fastq/Schreibtisch/Metagenomics/NanoCLUST/NanoCLUST/test_datasets/mock4_run3bc08_5000.fastq
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Container         : docker - [:]
Output dir        : ./results
Launch dir        : /home/bcl2fastq/Schreibtisch/Metagenomics/NanoCLUST/NanoCLUST
Working dir       : /home/bcl2fastq/Schreibtisch/Metagenomics/NanoCLUST/NanoCLUST/work
Script dir        : /home/bcl2fastq/Schreibtisch/Metagenomics/NanoCLUST/NanoCLUST
User              : bcl2fastq
Config Profile    : test,docker
Config Description: Minimal test dataset to check pipeline function

executor >  local (80)
[59/8242a2] process > QC (1)                       [100%] 1 of 1 ✔
[ae/0c905f] process > fastqc (1)                   [100%] 1 of 1 ✔
[65/245dd8] process > kmer_freqs (1)               [100%] 1 of 1 ✔
[75/ee543b] process > read_clustering (1)          [100%] 1 of 1 ✔
[11/ba1911] process > split_by_cluster (1)         [100%] 1 of 1 ✔
[6e/f6a7b9] process > read_correction (1)          [100%] 8 of 8 ✔
[e5/038ac3] process > draft_selection (8)          [100%] 8 of 8 ✔
[f5/4c1169] process > racon_pass (8)               [100%] 8 of 8 ✔
[da/75dc38] process > medaka_pass (8)              [100%] 8 of 8 ✔
[5b/acba47] process > consensus_classification (4) [ 95%] 36 of 38, failed: 36, retries: 35
[-        ] process > join_results                 -
[-        ] process > get_abundances               -
[-        ] process > plot_abundances              -
[62/bce7f9] process > output_documentation         [100%] 1 of 1 ✔
[d6/d15ac5] NOTE: Process `consensus_classification (5)` terminated with an error exit status (2) -- Execution is retried (4)
[a2/3f7b37] NOTE: Process `consensus_classification (3)` terminated with an error exit status (2) -- Execution is retried (5)
[13/0144d3] NOTE: Process `consensus_classification (8)` terminated with an error exit status (2) -- Execution is retried (2)
[6a/d909d2] NOTE: Process `consensus_classification (2)` terminated with an error exit status (2) -- Execution is retried (5)
[df/88a4ed] NOTE: Process `consensus_classification (1)` terminated with an error exit status (2) -- Execution is retried (5)
[7d/9aba43] NOTE: Process `consensus_classification (7)` terminated with an error exit status (2) -- Execution is retried (3)
[97/865ac9] NOTE: Process `consensus_classification (6)` terminated with an error exit status (2) -- Execution is retried (5)
[a7/819c20] NOTE: Process `consensus_classification (4)` terminated with an error exit status (2) -- Execution is retried (5)
[55/9ada5e] NOTE: Process `consensus_classification (5)` terminated with an error exit status (2) -- Execution is retried (5)
Error executing process > 'consensus_classification (3)'

Caused by:
  Process `consensus_classification (3)` terminated with an error exit status (2)

Command executed:

  export BLASTDB=
  export BLASTDB=$BLASTDB:/tmp/db/taxdb/
  blastn -query consensus.fasta -db /tmp/db/16S_ribosomal_RNA -task blastn -dust no -outfmt "10 sscinames staxids evalue length pident" -evalue 11 -max_hsps 50 -max_target_seqs 5 | sed 's/,/;/g' > consensus_classification.csv
  #DECIDE FINAL CLASSIFFICATION
  cat 2_draft.log > 2_blast.log
  echo -n ";" >> 2_blast.log
  BLAST_OUT=$(cut -d";" -f1,2,4,5 consensus_classification.csv | head -n1)
  echo $BLAST_OUT >> 2_blast.log

Command exit status:
  2

Command output:
  (empty)

Command error:
  BLAST Database error: No alias or index file found for nucleotide database [/tmp/db/16S_ribosomal_RNA] in search path [/home/bcl2fastq/Schreibtisch/Metagenomics/NanoCLUST/NanoCLUST/work/4d/1076da07aa671eabdee31a99520c1b::/tmp/db/taxdb:]

Work dir:
  /home/bcl2fastq/Schreibtisch/Metagenomics/NanoCLUST/NanoCLUST/work/4d/1076da07aa671eabdee31a99520c1b

Many thanks, Fritz
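The "No alias or index file found" error means blastn could not find either a `.nal` alias file or a `.nin` index file for the database prefix it was given. Below is a minimal sketch for checking this by hand, assuming a nucleotide database such as 16S_ribosomal_RNA; `check_blastdb` is a hypothetical helper, not part of NanoCLUST or BLAST. Run it in the same environment where blastn runs (for `-profile docker`, that is inside the container), since a path that exists on the host may not be visible there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a BLAST nucleotide database prefix
# (the value given to `blastn -db`) has an alias (.nal) or index (.nin) file.
check_blastdb() {
  local prefix="$1"
  if [ -e "${prefix}.nal" ]; then
    # Databases split into several volumes are reached through an alias file
    echo "alias: ${prefix}.nal"
  elif [ -e "${prefix}.nin" ]; then
    # Single-volume databases have the index file directly
    echo "index: ${prefix}.nin"
  else
    echo "missing: no ${prefix}.nal or ${prefix}.nin" >&2
    return 1
  fi
}

# Example: check the path that appears in the failing command
# check_blastdb /tmp/db/16S_ribosomal_RNA
```

If this reports "missing" for the `/tmp/db/...` path shown in the log, blastn is looking in a place where the database was never put.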

JulianeLiberto commented 2 years ago

Hi Fritz, I am also getting this exact command error. Oddly, when I run the test data it works, but when I run my fastq file it fails. Were you able to find a solution?

fritzthm commented 2 years ago

Dear Juliane, NanoCLUST is now running on my system. I made some changes to the main.nf file, which I no longer remember. In the meantime my system crashed and I reinstalled everything from scratch (I'm now using Scientific Linux, which is basically CentOS 7). Now everything works fine without any changes. Good luck, Fritz

From: JulianeLiberto @.> To: genomicsITER/NanoCLUST @.> Cc: fritzthm @.>, Author @.> Sent: 27.08.2021 21:31 Subject: Re: [genomicsITER/NanoCLUST] Error executing process > consensus_classification (#43)


JulianeLiberto commented 2 years ago

Thanks for getting back to me Fritz! I eventually found a workaround by moving the db folders to a new path. I don't know why that worked, since I triple-checked the original path, but it did.

Kindly, Juliane

On Wed, Sep 1, 2021 at 6:02 AM fritzthm @.***> wrote:


--

Juliane Liberto

tjalfdeboer commented 2 years ago

I had the same problem and found that main.nf puts "/tmp/" in front of your database file paths, so if the databases are in any folder other than /tmp, the blast executable will not find them. To fix this, change lines 434 and 435 of main.nf so that they read "db= params.db" and "taxdb= params.tax" (i.e., remove the blast_dir argument); you can then use any file path for the database locations.
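One way to see the prefix described above is to look at the database path each task actually passed to blastn: Nextflow keeps the script each task ran as `.command.sh` inside that task's work directory. A small sketch, assuming the default `work/<xx>/<hash>/` layout; `show_db_paths` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct `-db` arguments that blastn received,
# by scanning the .command.sh script Nextflow keeps in each task directory.
show_db_paths() {
  local work_dir="${1:-work}"
  # Task directories follow the <work>/<2 chars>/<hash>/ layout;
  # -h drops file names, -o prints only the matching "-db <path>" text
  grep -h -o -- '-db [^ ]*' "$work_dir"/*/*/.command.sh | sort -u
}

# Example, from the NanoCLUST launch directory:
# show_db_paths work
```

For the runs in this thread, this would print `-db /tmp/db/16S_ribosomal_RNA`, even though the database was passed as something like `db/16S_ribosomal_RNA`.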

JulianeLiberto commented 2 years ago

Thank you tjalfdeboer. That is super helpful!

fernanarr commented 2 years ago

Hi all,

I'm getting the same error, but with exit status 255.

Executing this command:

$ nextflow run main.nf --reads 'data/fichero_concatenado_02032022.fastq' --db 'db/16S_ribosomal_RNA' --tax 'db/taxdb' -profile docker

Run Name          : cranky_northcutt
Reads             : data/fichero_concatenado_02032022.fastq
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Container         : docker - [:]
Output dir        : ./results
Launch dir        : /home/fernando/NanoCLUST
Working dir       : /home/fernando/NanoCLUST/work
Script dir        : /home/fernando/NanoCLUST
User              : fernando
Config Profile    : docker
----------------------------------------------------
executor >  local (31)
[10/898d50] process > QC (1)                       [100%] 1 of 1 ✔
[3d/80210b] process > fastqc (1)                   [100%] 1 of 1 ✔
[d3/306646] process > kmer_freqs (1)               [100%] 1 of 1 ✔
[28/0eecf7] process > read_clustering (1)          [100%] 1 of 1 ✔
[d9/7c54bd] process > split_by_cluster (1)         [100%] 1 of 1 ✔
[d2/cd1dd7] process > read_correction (2)          [100%] 3 of 3 ✔
[ae/0f7261] process > draft_selection (3)          [100%] 3 of 3 ✔
[09/7e330c] process > racon_pass (3)               [100%] 3 of 3 ✔
[1e/95b1ae] process > medaka_pass (3)              [100%] 3 of 3 ✔
[1f/817d22] process > consensus_classification (2) [100%] 12 of 12, failed: 11, retries: 10
[-        ] process > join_results                 -
[-        ] process > get_abundances               -
[-        ] process > plot_abundances              -
[7e/753279] process > output_documentation         [100%] 1 of 1 ✔
[56/b64597] NOTE: Process `consensus_classification (1)` terminated with an error exit status (255) -- Execution is retried (1)
[1a/695ac0] NOTE: Process `consensus_classification (2)` terminated with an error exit status (255) -- Execution is retried (1)
[57/5b6f32] NOTE: Process `consensus_classification (1)` terminated with an error exit status (255) -- Execution is retried (2)
[ff/0c2cd6] NOTE: Process `consensus_classification (2)` terminated with an error exit status (255) -- Execution is retried (2)
[9a/0706e0] NOTE: Process `consensus_classification (1)` terminated with an error exit status (255) -- Execution is retried (3)
[e3/87a0e8] NOTE: Process `consensus_classification (2)` terminated with an error exit status (255) -- Execution is retried (3)
[ec/38fca3] NOTE: Process `consensus_classification (1)` terminated with an error exit status (255) -- Execution is retried (4)
[8f/5bfd7e] NOTE: Process `consensus_classification (2)` terminated with an error exit status (255) -- Execution is retried (4)
[73/9fa4f3] NOTE: Process `consensus_classification (1)` terminated with an error exit status (255) -- Execution is retried (5)
[b5/06d84a] NOTE: Process `consensus_classification (2)` terminated with an error exit status (255) -- Execution is retried (5)
Error executing process > 'consensus_classification (1)'

Caused by:
  Process `consensus_classification (1)` terminated with an error exit status (255)

Command executed:

  export BLASTDB=
  export BLASTDB=$BLASTDB:/tmp/db/taxdb/
  blastn -query consensus.fasta -db /tmp/db/16S_ribosomal_RNA -task blastn -dust no -outfmt "10 sscinames staxids evalue length pident" -evalue 11 -max_hsps 50 -max_target_seqs 5 | sed 's/,/;/g' > consensus_classification.csv
  #DECIDE FINAL CLASSIFFICATION
  cat 2_draft.log > 2_blast.log
  echo -n ";" >> 2_blast.log
  BLAST_OUT=$(cut -d";" -f1,2,4,5 consensus_classification.csv | head -n1)
  echo $BLAST_OUT >> 2_blast.log

Command exit status:
  255

Command output:
  (empty)

Command error:
  Error: NCBI C++ Exception:
      T0 "/opt/conda/conda-bld/blast_1595737360567/work/blast/c++/src/serial/objistrasnb.cpp", line 499: Error: (CSerialException::eOverflow) byte 98: overflow error ( at [].[].gi)
      T0 "/opt/conda/conda-bld/blast_1595737360567/work/blast/c++/src/serial/member.cpp", line 768: Error: (CSerialException::eOverflow) ncbi::CMemberInfoFunctions::ReadWithSetFlagMember() - error while reading seqid ( at Blast-def-line-set.[].[].seqid.[].[].gi)

The pipeline works perfectly with some of the sequences in the FASTQ file, but when I add the final sequences from my samples to the .fastq, I get this error.

I've tried modifying main.nf as @tjalfdeboer suggested, but I still can't get my final result.

Has anyone else had this problem, and can anyone help me solve it?

Thanks a lot in advance

niederro commented 2 years ago

Hi fernanarr, I'm running into the same problem with my datasets. Did you find a workaround already? Many thanks, Robert

fernanarr commented 2 years ago

Hi @niederro, not yet. We are still looking for it.

DanBeaton commented 2 years ago

Hello -- I am also stuck at the consensus_classification step. If I understand the output correctly, the percentage shown doesn't mean much; it looks to me like 100% of the blastn processes fail.

The error message is: NOTE: Process consensus_classification (###) terminated with an error exit status (2) -- Execution is retried (1). Is this error from blastn, from docker or from Nextflow?

Scanning some of the consensus_classification.csv files shows them to be empty.

The dataset I'm using to test was downloaded from SRA. According to the associated publication, the data was analyzed with QIIME2 and then compared against the SILVA database using vsearch.
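For reference, the post-processing in the consensus_classification command can be reproduced without BLAST. This sketch replaces blastn with a made-up CSV row (hypothetical, for illustration only) and otherwise follows the commands shown in the logs above: commas become semicolons, `cut` keeps fields 1, 2, 4 and 5 (sscinames, staxids, length, pident), and only the first hit is appended to the `_blast.log`. It also suggests why the CSV files are empty when blastn fails: the shell creates `consensus_classification.csv` through the `>` redirection even if blastn writes nothing to the pipe. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Stand-in for blastn: one made-up hit in -outfmt "10 sscinames staxids evalue length pident"
# (CSV) form, piped through the same sed as the pipeline. Illustration only.
fake_blast_csv() {
  printf '%s\n' "Escherichia coli,562,0.0,1450,99.862" | sed 's/,/;/g'
}

# Mirrors the "#DECIDE FINAL CLASSIFFICATION" lines of the consensus_classification command.
# Arguments: the semicolon-separated CSV, the draft log, and the blast log to write.
classify_consensus() {
  local csv="$1" draft_log="$2" blast_log="$3"
  cat "$draft_log" > "$blast_log"
  echo -n ";" >> "$blast_log"
  # Fields 1,2,4,5 = sscinames;staxids;length;pident, first (top) hit only
  local blast_out
  blast_out=$(cut -d";" -f1,2,4,5 "$csv" | head -n1)
  # Quoted here; the pipeline's unquoted echo gives the same result for ordinary names
  echo "$blast_out" >> "$blast_log"
}
```

With a draft log containing just `draft`, the fake hit produces the line `draft;Escherichia coli;562;1450;99.862`, while an empty CSV produces only `draft;`.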

DanBeaton commented 2 years ago

Hi Fernanarr -- I've been getting the same BLAST error. After manually running the blast command on the consensus.fasta files from a few of the 16 failed tasks, I can create a "#_blast.log" file from the "#_draft.log" file. Unfortunately, resuming the workflow still results in 16 failures, and the run stops without moving on to the next process.

Testing a few other things changes the total number of consensus.fasta files (ranging from 505 to 518), but the workflow still stops at 16 failures with 5 retries each.

Would it be possible to add a line that writes the failures to a file and lets the process move on?
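Nextflow can do something close to this through its `errorStrategy` process directive: with `'ignore'`, a failed task is reported but the run continues without that task's output. A minimal sketch, assuming you pass an extra config file with `-c` (the file name and the reads path are hypothetical). Whether NanoCLUST's downstream steps (join_results, get_abundances) give sensible results with some clusters missing is untested here. Adding `-with-trace` writes a trace file whose status column records which tasks failed. This only skips the failures; it does not fix the underlying BLAST error.

```shell
#!/usr/bin/env bash
# Write an extra Nextflow config (hypothetical file name) so that failed
# consensus_classification tasks are ignored instead of retried until the run aborts.
cat > ignore_blast_failures.config <<'EOF'
process {
    withName: 'consensus_classification' {
        errorStrategy = 'ignore'
    }
}
EOF

# Then add it to the usual command, for example:
# nextflow run main.nf --reads 'data/reads.fastq' --db 'db/16S_ribosomal_RNA' --tax 'db/taxdb' \
#     -profile docker -c ignore_blast_failures.config -with-trace -resume
```

A `withName` selector in a config file takes precedence over a directive written in the process definition, which is why the override goes through a selector rather than a plain `process.errorStrategy` setting.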

niederro commented 2 years ago

Hi DanBeaton, I think you are completely right. For most of my files BLAST returned results, but for some it didn't, which may have caused the failure. Did you manage to find a workaround? Best, Robert

DanBeaton commented 2 years ago

Hi @niederro -- I was able to get past the error. I have the blast executable already installed on my computer (version 2.11.0+-x64-macos) and created the 16S database directly from the executable.

Instead of pointing the Nextflow run to the database created as part of the NanoCLUST setup, I pointed it to this database. With this change, the run got past the error and moved on to the next steps of the process. If it had not, I was going to try updating the executable to the current version and re-creating the database.

I find myself wondering if there is a mismatch between the database in the ftp instructions and the blast version in the environment, or that the ftp'd database is somehow corrupt. Both seem to be unlikely given that some pass and some fail.

kazubado33 commented 2 years ago

Hi @DanBeaton, your solution is very nice. However, I don't understand what you mean by "created the 16S database directly from the executable". How did you do this? Did you download all the FASTA files for the 16S rRNA and then build the 16S rRNA database with the BLAST executable makeblastdb?

DanBeaton commented 2 years ago

Hi @kazubado33, I followed the instructions from the link ... https://www.ncbi.nlm.nih.gov/books/NBK52640/ ... to install and configure blast.

The instructions also explain how to download databases into a directory, using the 16S_ribosomal_RNA database as the example: perl ../bin/update_blastdb.pl --passive --decompress 16S_ribosomal_RNA

This link ... https://www.ncbi.nlm.nih.gov/books/NBK569850/ ... explains how to view a list of the other available databases.

niederro commented 2 years ago

Hi @DanBeaton, I did exactly the same, but NanoCLUST stopped with the non-indexed database error. Running perl ../bin/update_blastdb.pl --passive --decompress 16S_ribosomal_RNA produced exactly the same files as the NanoCLUST workflow setup, and these seem to produce the other error. Any idea what is wrong, or what you did differently? Thanks in advance for any help.

niederro commented 2 years ago

@DanBeaton did you also change something in the main.nf file?

DanBeaton commented 2 years ago

Hi Robert @niederro. To answer your question -- No I did not make changes to the main.nf file.

When running via docker on my Mac, with a blast database created via blast V2.11.0, the updated database works great.

When I installed and ran the same data files on a linux compute cluster, which meant using conda instead of docker or other options, and which has the most current blast, V2.13.0, plus a database created from this version, the consensus_classification error returned, but only for one of the files, so I could isolate it and try a few things.

With the help of one of the compute cluster managers, who did make changes to the main.nf file by adding echo statements to the consensus_classification process (output provided below), the error in the consensus_classification process was isolated to the blastn command, just as the error message indicates!

I then tried running without specifying a database, so that the --remote option was used. All but 7 of the consensus.fasta files failed.

On viewing the consensus_classification conda environment (in the conda_env folder), the listed blast version is 2.10.1. The most current version of blast in bioconda is 2.12.0. So I changed the blast version in the environment.yaml file from blast 2.10.1 to blast 2.12.0 and re-ran the file. The process went to completion with no errors.
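The pin change described above can be scripted. A sketch, assuming the environment file pins BLAST as `blast=2.10.1` (possibly with a `bioconda::` prefix); the file path is a placeholder, since the thread only says the file lives in the conda_env folder, and `bump_blast_pin` is a hypothetical helper. `sed -i.bak` edits in place while keeping a backup, and is accepted by both GNU sed and the BSD sed on macOS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace a BLAST version pin in a conda environment file.
# Assumes the pin is written as "blast=<version>" (optionally "bioconda::blast=<version>").
bump_blast_pin() {
  local env_file="$1" old="$2" new="$3"
  local old_re
  # Escape the dots so they match literally in the sed pattern
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  # Simple substring replacement; keeps the original as "${env_file}.bak"
  sed -i.bak "s/blast=${old_re}/blast=${new}/" "$env_file"
}

# Example (placeholder path to the consensus_classification environment file):
# bump_blast_pin path/to/environment.yaml 2.10.1 2.12.0
```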

So, I think the outcome is this: on my Mac with blast 2.11.0, the blast version in the Docker image used by the process 'corresponds' to the version used to create the blastdb, and on the Linux compute cluster, the blast version used in the process now also 'corresponds' to the version used to create the blastdb.

I guess the take home message is to match the blast version used to create the blast db with the blast version used in the process.

I hope this helps :-)

#################################################################
Error executing process > 'consensus_classification (21)'

Caused by:
  Process `consensus_classification (21)` terminated with an error exit status (255)

Command executed:

  export BLASTDB=
  echo "2"
  export BLASTDB=$BLASTDB:/athena/home/beatond/ncbi/blastdb/
  echo "BLASTDB:" $BLASTDB
  echo "taxdb: " /athena/home/beatond/ncbi/blastdb/
  which blastn
  echo "2a"
  blastn -query consensus.fasta -db /athena/home/beatond/ncbi/blastdb/16S_ribosomal_RNA -task blastn -dust no -outfmt "10 sscinames staxids evalue length pident" -evalue 11 -max_hsps 50 -max_target_seqs 5 | sed 's/,/;/g' > consensus_classification.csv
  #DECIDE FINAL CLASSIFFICATION
  echo "3"
  pwd
  cat 101_draft.log > 101_blast.log
  echo "4"
  echo -n ";" >> 101_blast.log
  echo "5"
  BLAST_OUT=$(cut -d";" -f1,2,4,5 consensus_classification.csv | head -n1)
  echo "6"
  echo $BLAST_OUT >> 101_blast.log
  echo "7"

Command exit status:
  255

Command output:
  2
  BLASTDB: :/athena/home/beatond/ncbi/blastdb/
  taxdb: /athena/home/beatond/ncbi/blastdb/
  /athena/home/beatond/Tools/work/conda/consensus_classification-8c200bed21bbad7a3d574f93a7a33902/bin/blastn
  2a

Command error:
  ps: /athena/opt/bioconda/2021.05/lib/libuuid.so.1: no version information available (required by /lib64/libblkid.so.1)
  ps: /athena/opt/bioconda/2021.05/lib/libuuid.so.1: no version information available (required by /lib64/libblkid.so.1)
  ps: /athena/opt/bioconda/2021.05/lib/libuuid.so.1: no version information available (required by /lib64/libblkid.so.1)
  ps: /athena/opt/bioconda/2021.05/lib/libuuid.so.1: no version information available (required by /lib64/libblkid.so.1)
  Error: NCBI C++ Exception:
      T0 "/opt/conda/conda-bld/blast_1607337341665/work/blast/c++/src/serial/objistrasnb.cpp", line 499: Error: (CSerialException::eOverflow) byte 92: overflow error ( at [].[].gi)
      T0 "/opt/conda/conda-bld/blast_1607337341665/work/blast/c++/src/serial/member.cpp", line 768: Error: (CSerialException::eOverflow) ncbi::CMemberInfoFunctions::ReadWithSetFlagMember() - error while reading seqid ( at Blast-def-line-set.[].[].seqid.[].[].gi)

poursalavati commented 1 year ago

New solution: https://github.com/genomicsITER/NanoCLUST/issues/70#issuecomment-1200501397