nick-youngblut opened 2 years ago
This issue is specific to krakenuniq=0.7. If I use krakenuniq=0.6, the krakenuniq-build command (above) doesn't encounter the same issue, but it does hit another issue:
Found jellyfish v1.1.12
Kraken build set to minimize disk writes.
Found 10 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using jellyfish
Hash size not specified, using '32573424'
K-mer set created. [10.283s]
Skipping step 2, no database reduction requested.
Sorting k-mer set (step 3 of 6)...
db_sort: Getting database into memory ...Loaded database with 32716828 keys with k of 31 [val_len 4, key_len 8].
Loaded database with 32716828 keys with k of 31 [val_len 4, key_len 8].
db_sort: Sorting ...db_sort: Sorting complete - writing database to disk ...
K-mer set sorted. [4m18.787s]
Creating seqID to taxID map (step 4 of 6)..
686 sequences mapped to taxa. [0.056s]
Creating taxDB (step 5 of 6)...
Building taxonomy index from taxonomy//nodes.dmp and taxonomy//names.dmp. Done, got 240103 taxa
taxDB construction finished. [1.220s]
Building KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
Getting database0.kdb into memory (374.42 MB) ... Done
Loaded database with 32716828 keys with k of 31 [val_len 4, key_len 8].
Reading sequence ID to taxonomy ID mapping ... got 686 mappings.
Finished processing 686 sequences (skipping 0 empty sequences, and 0 sequences with no taxonomy mapping)
Writing kmer counts to database.kdb.counts...
Writing database from RAM back to database.kdb ...
LCA database created. [22.837s]
Creating database summary report database.report.tsv ...
/tmp/global2/nyoungblut/code/dev/ll_pipelines/llmgp/tmp/krakenuniq/libexec/classify -d ././database.kdb -i ././database.idx -t 12 -r database.report.tsv -a ././taxDB -p 12 /dev/fd/62
/tmp/global2/nyoungblut/code/dev/ll_pipelines/llmgp/tmp/krakenuniq/libexec/classify: invalid option -- 'd'
Usage: classify [options] <fasta/fastq file(s)>
Options: (*mandatory)
* -H filename Kraken 2 index filename
* -t filename Kraken 2 taxonomy filename
* -o filename Kraken 2 options filename
-q Quick mode
-M Use memory mapping to access hash & taxonomy
-T NUM Confidence score threshold (def. 0)
-p NUM Number of threads (def. 1)
-Q NUM Minimum quality score (FASTQ only, def. 0)
-P Process pairs of reads
-S Process pairs with mates in same file
-R filename Print report to filename
-m In comb. w/ -R, use mpa-style report
-z In comb. w/ -R, report taxa w/ 0 count
-n Print scientific name instead of taxid in Kraken output
-g NUM Minimum number of hit groups needed for call
-C filename Filename/format to have classified sequences
-U filename Filename/format to have unclassified sequences
-O filename Output file for normal Kraken output
-K In comb. w/ -R, provide minimizer information in report
xargs: cat: terminated by signal 13
Database construction complete. [Total: 4m53.379s]
You can delete all files but database.{kdb,idx} and taxDB now, if you want
It appears that you are calling subcommands in Perl (e.g., exec "build_db.sh") without checking whether their exit status is non-zero. As a result, a failure such as "xargs: cat: terminated by signal 13" can still be followed by "Database construction complete. [Total: 4m53.379s]".
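For illustration, here is a small bash sketch of the general failure mode: by default a pipeline's exit status is that of its last command, so an upstream failure (like the xargs/cat error above) can go unnoticed unless the caller checks for it. The function names are hypothetical and not part of KrakenUniq; the Perl wrapper would need the equivalent check on the return value of its own subcommand calls.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers illustrating masked vs. surfaced failures.

# Without pipefail, the pipeline reports the status of the last command (`true`),
# so the failure of `false` is hidden. `set +e` keeps the subshell running if the
# caller has errexit enabled.
masked_status() {
  ( set +e; set +o pipefail; false | true; echo "$?" )
}

# With pipefail, the pipeline fails if any command in it fails.
pipefail_status() {
  ( set +e; set -o pipefail; false | true; echo "$?" )
}

# Run a command; if it exits non-zero, print an error and propagate its status
# instead of carrying on to a "complete" message.
run_checked() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "error: '$*' exited with status $status" >&2
  fi
  return "$status"
}
```

In bash, a command killed by a signal is reported with status 128 + the signal number, so a process killed by signal 13 (SIGPIPE) shows up as 141 rather than 0.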
The libexec/classify: invalid option -- 'd' error also occurs when running krakenuniq=0.6 for classifying Illumina reads:
krakenuniq --threads 8 --db $DB --fastq-input --gzip-compressed $READS
...results in:
/tmp/global2/nyoungblut/code/dev/ll_pipelines/llmgp/tmp/krakenuniq/libexec/classify -d /tmp/global2/nyoungblut/code/dev/tmp/n100/krakenuniq//database.kdb -i /tmp/global2/nyoungblut/code/dev/tmp/n100/krakenuniq//database.idx -t 8 -f -a /tmp/global2/nyoungblut/code/dev/tmp/n100/krakenuniq//taxDB -p 12 /dev/fd/0
/tmp/global2/nyoungblut/code/dev/ll_pipelines/llmgp/tmp/krakenuniq/libexec/classify: invalid option -- 'd'
Usage: classify [options] <fasta/fastq file(s)>
Options: (*mandatory)
* -H filename Kraken 2 index filename
* -t filename Kraken 2 taxonomy filename
* -o filename Kraken 2 options filename
-q Quick mode
-M Use memory mapping to access hash & taxonomy
-T NUM Confidence score threshold (def. 0)
-p NUM Number of threads (def. 1)
-Q NUM Minimum quality score (FASTQ only, def. 0)
-P Process pairs of reads
-S Process pairs with mates in same file
-R filename Print report to filename
-m In comb. w/ -R, use mpa-style report
-z In comb. w/ -R, report taxa w/ 0 count
-n Print scientific name instead of taxid in Kraken output
-g NUM Minimum number of hit groups needed for call
-C filename Filename/format to have classified sequences
-U filename Filename/format to have unclassified sequences
-O filename Output file for normal Kraken output
-K In comb. w/ -R, provide minimizer information in report
If I then switch back to krakenuniq=0.7 for the actual classification, there is no libexec/classify: invalid option -- 'd' error. So it appears that, currently, krakenuniq=0.7 isn't working for the database build, while krakenuniq=0.6 isn't working for classification.
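Both usage messages above list Kraken 2 options (-H, -t and -o marked as mandatory), while the KrakenUniq wrapper passes -d, -i and -a. That suggests, although the thread does not confirm it, that the libexec/classify being executed may not be KrakenUniq's own binary. A non-invasive way to check is to search the installed file for the usage string from the log, assuming it is stored as a plain string in the binary (typical for compiled C/C++ programs, but not guaranteed). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: does the given `classify` file contain the Kraken 2 usage
# text seen in the logs? This only searches the file's bytes; it does not run it.
# Returns 0 if found, 1 if not, 2 if the file does not exist.
looks_like_kraken2_classify() {
  local bin=$1
  if [ ! -f "$bin" ]; then
    echo "no such file: $bin" >&2
    return 2
  fi
  # -a: treat binary data as text; -F: fixed string; -q: only set the exit status.
  grep -aqF 'Kraken 2 index filename' "$bin"
}
```

For example, with the environment activated: looks_like_kraken2_classify "$CONDA_PREFIX/libexec/classify" && echo "this classify prints Kraken 2 usage text".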
If I had to guess, the issue is due to the build.sh and/or patches in the bioconda recipe: probably either the ./install_krakenuniq.sh "$PREFIX/libexec" call in build.sh, or the following:
for bin in krakenuniq krakenuniq-build krakenuniq-download krakenuniq-extract-reads krakenuniq-filter krakenuniq-mpa-report krakenuniq-report krakenuniq-translate read_merger.pl; do
chmod +x "$PREFIX/libexec/$bin"
ln -s "$PREFIX/libexec/$bin" "$PREFIX/bin/$bin"
# Change from double quotes to single in case of special chars
sed -i.bak "s#my \$KRAKEN_DIR = \"$PREFIX/libexec\";#my \$KRAKEN_DIR = '$PREFIX/libexec';#g" "$PREFIX/libexec/${bin}"
rm -rf "$PREFIX/libexec/${bin}.bak"
done
...which has no chmod +x for count_unique.
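As a sketch of the kind of recipe change being suggested, the function below marks compiled helpers executable alongside the Perl scripts. It assumes, without confirmation from the recipe, that count_unique and set_lcas (the two executables the maintainer names) are installed directly under $PREFIX/libexec. The function name is hypothetical. The maintainer attributes the unusable executables to a Makefile bug, so permissions alone may not be sufficient.

```shell
#!/usr/bin/env bash
# Sketch: mark compiled helper binaries executable in a conda build prefix.
# ASSUMPTION: the helpers live directly in "$prefix/libexec"; adjust if the recipe differs.
# Returns 1 if any named helper was not found, 0 otherwise.
make_helpers_executable() {
  local prefix=$1
  shift
  local bin missing=0
  for bin in "$@"; do
    if [ -f "$prefix/libexec/$bin" ]; then
      chmod +x "$prefix/libexec/$bin"
    else
      echo "warning: $prefix/libexec/$bin not found" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In build.sh this would be called as make_helpers_executable "$PREFIX" count_unique set_lcas, after the install step.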
Thank you for reporting this issue. We identified a bug in the Makefile that resulted in unusable count_unique and set_lcas executables. We released a new version 0.7.1 that fixes the bug.
Thanks @alekseyzimin for the quick fix! That's great!
I'm still getting the following error with bioconda::krakenuniq=0.7.3:
Kraken build set to minimize disk writes.
Found 10 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using /tmp/global2/nyoungblut/code/dev/Struo2/.snakemake/conda/4d0412795223eb03c04aea7214739086/libexec/jellyfish-install/bin/jellyfish
Hash size not specified, using '32573424'
/tmp/global2/nyoungblut/code/dev/Struo2/.snakemake/conda/4d0412795223eb03c04aea7214739086/libexec/build_db.sh: line 46: /tmp/global2/nyoungblut/code/dev/Struo2/.snakemake/conda/4d0412795223eb03c04aea7214739086/libexec/jellyfish-install/bin/jellyfish: No such file or directory
Adding executable permissions to count_unique did not help.
conda env:
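The 0.7.3 error is reported for libexec/jellyfish-install/bin/jellyfish itself, which build_db.sh cannot find. A shell reports "No such file or directory" when the file is missing, but also when an executable's interpreter or loader is missing, so it helps to tell those cases apart. A hedged pre-flight check that could be run before krakenuniq-build might look like this. The function name is hypothetical, and the paths come from the log rather than from the recipe.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report paths that are missing or not executable.
# Returns the number of problems found (0 means every path looked fine; keep the
# list short, since shell return values are limited to 0-255).
check_executables() {
  local path problems=0
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path" >&2
      problems=$((problems + 1))
    elif [ ! -x "$path" ]; then
      echo "not executable: $path" >&2
      problems=$((problems + 1))
    fi
  done
  return "$problems"
}
```

For example, assuming the environment is activated so that $CONDA_PREFIX points at it: check_executables "$CONDA_PREFIX/libexec/jellyfish-install/bin/jellyfish". If that path exists and is executable but still fails to run, the cause is more likely a missing interpreter or shared library than permissions.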