genomicsITER / NanoRTax

Real-time analysis pipeline for nanopore 16S rRNA data
MIT License

kraken2: database ("/tmp/db/krakendb/16S_RDP_k2db/") does not contain necessary file taxo.k2d #11

Open dorinrojas opened 8 months ago

Hi there!

I am trying to run NanoRTax using conda (my computational allocation is incompatible with Docker and Singularity). I used the '-profile conda' flag, but it didn't work and gave me this error:

(nanortax) [dorojas@dribe-06 2-nanortax]$ nextflow run main.nf --reads 'data/fetuccini1.fastq' -profile conda --outdir prueba/
N E X T F L O W  ~  version 22.10.6
Launching `main.nf` [elated_watson] DSL1 - revision: 5faf521cd0
WARN: Access to undefined parameter `multiqc_config` -- Initialise it to a default value eg. `params.multiqc_config = some_value`
WARN: Access to undefined parameter `reads_rt` -- Initialise it to a default value eg. `params.reads_rt = some_value`
WARN: Access to undefined parameter `kraken` -- Initialise it to a default value eg. `params.kraken = some_value`
WARN: Access to undefined parameter `centrifuge` -- Initialise it to a default value eg. `params.centrifuge = some_value`
WARN: Access to undefined parameter `blast` -- Initialise it to a default value eg. `params.blast = some_value`
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/rtnanopipeline v1.0dev
----------------------------------------------------

Run Name          : elated_watson
Reads             : data/fetuccini1.fastq
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Output dir        : prueba/
Launch dir        : /work/dorojas/6-semen/2-nanortax
Working dir       : /work/dorojas/6-semen/2-nanortax/work
Script dir        : /work/dorojas/6-semen/2-nanortax
User              : dorojas
Config Profile    : conda
----------------------------------------------------
WARN: Access to undefined parameter `hostnames` -- Initialise it to a default value eg. `params.hostnames = some_value`
executor >  local (2)
[0c/5c9c55] process > QC (1)               [100%] 1 of 1, failed: 1 ✘
[-        ] process > qc_reporting         -
[-        ] process > read_binning_kraken  -
[-        ] process > agg_kraken           -
[-        ] process > kraken_push          -
[-        ] process > agg_kraken_diversity -
[32/fa2163] process > output_documentation [100%] 1 of 1, failed: 1 ✘
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/rtnanopipeline] Pipeline completed with errors-
WARN: There's no process matching config selector: get_software_versions
Error executing process > 'QC (1)'

Caused by:
  Process `QC (1)` terminated with an error exit status (127)

Command executed:

  barcode=$(basename $(dirname /work/dorojas/6-semen/2-nanortax/data/fetuccini1.fastq))
  fastp -i /work/dorojas/6-semen/2-nanortax/data/fetuccini1.fastq -q 8 -l 1400 --length_limit 1700 -o $barcode\_qced_reads.fastq --json $barcode\_qc_report.txt
  head -n30 $barcode\_qc_report.txt | sed '30s/,/\n}/' > $barcode\_qc_report.json
  echo "}" >> $barcode\_qc_report.json

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 3: fastp: command not found

Work dir:
  /work/dorojas/6-semen/2-nanortax/work/0c/5c9c55cd01df5977db1e05b85fa605

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

It seemed to me that the conda environment didn't include 'fastp' but it is specified in the .yml file.
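Exit status 127 is the shell's code for "command not found", which matches the `fastp: command not found` message. A minimal sketch for checking which tools are visible on the `PATH` of the currently active environment follows. The `check_tools` helper is hypothetical, and the tool names are simply the ones that appear in the commands logged in this post:

```shell
# Report which of the given commands can be resolved on the current PATH.
# Prints one line per tool and returns non-zero if any of them is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found    %s -> %s\n' "$tool" "$(command -v "$tool")"
    else
      printf 'MISSING  %s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Tool names taken from the commands shown in the logs (not an exhaustive list).
check_tools fastp kraken2 taxonkit csvtk gawk || echo "at least one tool is not on PATH"
```

Running this inside the activated environment only shows what an interactive shell can see. It does not prove that the environment Nextflow builds for '-profile conda' contains the same tools.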

I couldn't find a solution for this, so I decided to create my own conda environment with the .yml file using the command

conda create -n nanortax --file=environment.yml

I dropped the '-profile' flag so that the code would call each of the tools from outside the working directory (which worked correctly). I know this is not recommended; it was my last resort to try and run this workflow.

That workaround worked, but the pipeline now outputs a new error about the kraken db:

(nanortax) [dorojas@dribe-06 2-nanortax]$ nextflow run main.nf --reads 'data/fetuccini1.fastq' --outdir prueba/
N E X T F L O W  ~  version 22.10.6
Launching `main.nf` [hungry_bartik] DSL1 - revision: 5faf521cd0
WARN: Access to undefined parameter `multiqc_config` -- Initialise it to a default value eg. `params.multiqc_config = some_value`
WARN: Access to undefined parameter `reads_rt` -- Initialise it to a default value eg. `params.reads_rt = some_value`
WARN: Access to undefined parameter `kraken` -- Initialise it to a default value eg. `params.kraken = some_value`
WARN: Access to undefined parameter `centrifuge` -- Initialise it to a default value eg. `params.centrifuge = some_value`
WARN: Access to undefined parameter `blast` -- Initialise it to a default value eg. `params.blast = some_value`
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/rtnanopipeline v1.0dev
----------------------------------------------------

Run Name          : hungry_bartik
Reads             : data/fetuccini1.fastq
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Output dir        : prueba/
Launch dir        : /work/dorojas/6-semen/2-nanortax
Working dir       : /work/dorojas/6-semen/2-nanortax/work
Script dir        : /work/dorojas/6-semen/2-nanortax
User              : dorojas
Config Profile    : standard
----------------------------------------------------
WARN: Access to undefined parameter `hostnames` -- Initialise it to a default value eg. `params.hostnames = some_value`
executor >  local (4)
[b3/043f94] process > QC (1)                  [100%] 1 of 1 ✔
[c2/256413] process > qc_reporting (1)        [100%] 1 of 1 ✔
[8b/337657] process > read_binning_kraken (1) [100%] 1 of 1, failed: 1 ✘
[-        ] process > agg_kraken              -
[-        ] process > kraken_push             -
[-        ] process > agg_kraken_diversity    -
[d9/08c47e] process > output_documentation    [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/rtnanopipeline] Pipeline completed with errors-
Error executing process > 'read_binning_kraken (1)'

Caused by:
  Process `read_binning_kraken (1)` terminated with an error exit status (2)

Command executed:

  sed '/^@/s/. ./_/g' data_qced_reads.fastq > krkinput.fastq
  kraken2 --db /tmp/db/krakendb/16S_RDP_k2db/ --use-names --threads 1 krkinput.fastq > krakenreport.txt
  echo "seq_id" > seq_ids.txt
  awk -F "\t" '{print $2}' krakenreport.txt >> seq_ids.txt
  gawk -F "\t" 'match($0, /\(taxid ([0-9]+)\)/, ary) {print ary[1]}' krakenreport.txt | taxonkit lineage --data-dir /tmp/db/ > lineage.txt
  cat lineage.txt | taxonkit reformat  --data-dir /tmp/db/ | csvtk -H -t cut -f 1,3 | csvtk -H -t sep -f 2 -s ';' -R > seq_tax.txt
  cat lineage.txt | taxonkit reformat -P  --data-dir /tmp/db/ | csvtk -H -t cut -f 1,3 > seq_tax_otu.txt
  paste seq_ids.txt seq_tax.txt > kraken_report_annotated.txt
  paste seq_ids.txt seq_tax_otu.txt > kraken_report_annotated_otu.txt

Command exit status:
  2

Command output:
  (empty)

Command error:
  kraken2: database ("/tmp/db/krakendb/16S_RDP_k2db/") does not contain necessary file taxo.k2d

Work dir:
  /work/dorojas/6-semen/2-nanortax/work/8b/337657f6151ad5f128d8b542457769

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
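The executed command above still points at `/tmp/db/...`. A minimal sketch for locating where that value is set, assuming the pipeline's settings live in `*.config` and `*.nf` files under the launch directory (the `find_setting` helper is hypothetical):

```shell
# Print every line in *.config and *.nf files under a directory that contains
# a literal string, prefixed with file name and line number.
# Returns non-zero when nothing matches.
find_setting() {
  pattern=$1
  root=${2:-.}
  # /dev/null is passed as an extra file so grep always prints file names.
  matches=$(find "$root" -type f \( -name '*.config' -o -name '*.nf' \) \
    -exec grep -nF -e "$pattern" /dev/null {} + 2>/dev/null) || true
  [ -n "$matches" ] && printf '%s\n' "$matches"
}

# Run from the launch directory to see which files still mention /tmp/db.
find_setting '/tmp/db' . || echo "no match for /tmp/db under $(pwd)"
```

If several files define the same parameter, the one that is loaded last may be the one that takes effect. The `nextflow config` command prints the configuration as Nextflow resolves it, which can help confirm this.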

My database directories are not in the default locations, so I changed the paths in the nextflow.config file. The modifications and the tree of the directory are below:

# config file modifications: 
  taxonkit_db = "db/"
  blast_db = "db/blastdb/"
  blast_taxdb = "db/blastdb/"
  kraken_db = "db/krakendb/16S_RDP_k2db/"
  centrifuge_db = "db/centrifugedb/"

# my db directory
(nanortax) [dorojas@login-1 2-nanortax]$ tree db/
db/
├── blastdb
│   ├── 16S_ribosomal_RNA.ndb
│   ├── 16S_ribosomal_RNA.nhr
│   ├── 16S_ribosomal_RNA.nin
│   ├── 16S_ribosomal_RNA.nnd
│   ├── 16S_ribosomal_RNA.nni
│   ├── 16S_ribosomal_RNA.nog
│   ├── 16S_ribosomal_RNA.nos
│   ├── 16S_ribosomal_RNA.not
│   ├── 16S_ribosomal_RNA.nsq
│   ├── 16S_ribosomal_RNA.ntf
│   ├── 16S_ribosomal_RNA.nto
│   ├── 16S_ribosomal_RNA.tar.gz
│   ├── taxdb.btd
│   ├── taxdb.bti
│   ├── taxdb.tar.gz
│   └── taxonomy4blast.sqlite3
├── centrifugedb
│   ├── p_compressed.1.cf
│   ├── p_compressed_2018_4_15.tar.gz
│   ├── p_compressed.2.cf
│   ├── p_compressed.3.cf
│   └── p_compressed.4.cf
├── citations.dmp
├── delnodes.dmp
├── division.dmp
├── gc.prt
├── gencode.dmp
├── images.dmp
├── krakendb
│   ├── 16S_RDP11.5_20200326.tgz
│   └── 16S_RDP_k2db
│       ├── 16S_RDP11.5_20200326.tgz
│       ├── database100mers.kmer_distrib
│       ├── database150mers.kmer_distrib
│       ├── database200mers.kmer_distrib
│       ├── database250mers.kmer_distrib
│       ├── database50mers.kmer_distrib
│       ├── database75mers.kmer_distrib
│       ├── hash.k2d
│       ├── opts.k2d
│       ├── README.md
│       ├── seqid2taxid.map
│       └── taxo.k2d
├── merged.dmp
├── names.dmp
├── nodes.dmp
├── readme.txt
└── taxdump.tar.gz

4 directories, 45 files

The error says that 'taxo.k2d' is missing from '/tmp/db/krakendb/16S_RDP_k2db/', but the file is present in my own path, 'db/krakendb/16S_RDP_k2db/'.
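To compare the two locations directly, here is a minimal sketch that checks a directory for the three `.k2d` files kraken2 looks for in a database (`hash.k2d`, `opts.k2d`, `taxo.k2d`). The `check_kraken_db` helper is hypothetical; the two paths are the one from the error message and the one from my config:

```shell
# Check that a directory holds the three .k2d files of a kraken2 database.
# Each file must exist and be non-empty; returns non-zero otherwise.
check_kraken_db() {
  db_dir=$1
  status=0
  for f in hash.k2d opts.k2d taxo.k2d; do
    if [ -s "$db_dir/$f" ]; then
      echo "ok       $db_dir/$f"
    else
      echo "missing  $db_dir/$f"
      status=1
    fi
  done
  return "$status"
}

# Path from the error message, then the relative path set in the config.
for dir in /tmp/db/krakendb/16S_RDP_k2db db/krakendb/16S_RDP_k2db; do
  check_kraken_db "$dir" || echo "-> $dir is not a usable kraken2 database from $(pwd)"
done
```

The relative `db/...` path only resolves when the check runs from the launch directory. Nextflow runs each task inside its own work directory, so an absolute path in the config may be the safer choice; that is an assumption worth testing rather than a confirmed fix.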

Does anybody have an idea of how to solve this issue? Or the first issue with the '-profile conda' flag?

I am new to bioinformatics, so help is very much appreciated!