fbreitwieser / krakenuniq

šŸ™ KrakenUniq: Metagenomics classifier with unique k-mer counting for more specific results
GNU General Public License v3.0

count_unique: Permission denied #95

Open nick-youngblut opened 2 years ago

nick-youngblut commented 2 years ago
$ krakenuniq-build --build --threads 12 --db $DB
Found jellyfish v1.1.12
Kraken build set to minimize disk writes.
Found 10 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using jellyfish
/tmp/global2/nyoungblut/code/dev/Struo2/.snakemake/conda/44f848fc5cdb9add9848e9cbcae49e5c/libexec/build_db.sh: line 127: /tmp/global2/nyoungblut/code/dev/Struo2/.snakemake/conda/44f848fc5cdb9add9848e9cbcae49e5c/libexec/count_unique: Permission denied

Adding executable permissions to count_unique did not help:

$ krakenuniq-build --build --threads 12 --db $DB
Found jellyfish v1.1.12
Kraken build set to minimize disk writes.
Found 10 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using jellyfish
/tmp/global2/nyoungblut/code/dev/Struo2/.snakemake/conda/44f848fc5cdb9add9848e9cbcae49e5c/libexec/build_db.sh: line 127: /tmp/global2/nyoungblut/code/dev/Struo2/.snakemake/conda/44f848fc5cdb9add9848e9cbcae49e5c/libexec/count_unique: cannot execute binary file: Exec format error
xargs: cat: terminated by signal 13
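
The two messages point at different problems: "Permission denied" means the execute bit is missing, while "Exec format error" means the kernel does not recognise the file as a program it can run on this machine. A minimal sketch for telling these cases apart, assuming a Linux host where compiled executables are ELF files; `diagnose_binary` is a hypothetical helper, not part of KrakenUniq:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report why a file might fail to execute.
diagnose_binary() {
    local path="$1"
    if [ ! -e "$path" ]; then
        echo "missing"
        return 1
    fi
    if [ ! -x "$path" ]; then
        echo "not-executable"
        return 1
    fi
    # ELF files start with the bytes 0x7f 'E' 'L' 'F'; scripts start with '#!'.
    local magic
    magic="$(head -c 4 "$path" | od -An -tx1 | tr -d ' \n')"
    case "$magic" in
        7f454c46) echo "elf" ;;
        2321*)    echo "script" ;;
        *)        echo "unknown-format"; return 1 ;;
    esac
}
```

A result of `elf` only means the header looks plausible. An object file or a binary built for another architecture also starts with the ELF magic bytes, so running `file` on the path is a useful follow-up.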

conda env:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
blast                     2.12.0               hf3cf87c_4    bioconda
bracken                   2.7              py39hc16433a_0    bioconda
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.5.18.1          ha878542_0    conda-forge
curl                      7.83.1               h7bff187_0    conda-forge
entrez-direct             16.2                 he881be0_0    bioconda
gettext                   0.19.8.1          h73d1719_1008    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kmer-jellyfish            1.1.12               h9f5acd7_2    bioconda
kraken2                   2.1.2           pl5321h9f5acd7_2    bioconda
krakenuniq                0.7             pl5321h19e8d03_0    bioconda
krb5                      1.19.3               h3790be6_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
libcurl                   7.83.1               h7bff187_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
libgomp                   12.1.0              h8d9b700_16    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
libidn2                   2.3.2                h7f98852_0    conda-forge
libnghttp2                1.47.0               h727a467_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libssh2                   1.10.0               ha56f1ee_2    conda-forge
libstdcxx-ng              12.1.0              ha89aaad_16    conda-forge
libunistring              0.9.10               h7f98852_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libzlib                   1.2.12               h166bdaf_0    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
openssl                   1.1.1o               h166bdaf_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
perl                      5.32.1          2_h7f98852_perl5    conda-forge
perl-archive-tar          2.40            pl5321hdfd78af_0    bioconda
perl-base                 2.23            pl5321hdfd78af_2    bioconda
perl-business-isbn        3.007           pl5321hdfd78af_0    bioconda
perl-business-isbn-data   20210112.006    pl5321hdfd78af_0    bioconda
perl-carp                 1.38            pl5321hdfd78af_4    bioconda
perl-common-sense         3.75            pl5321hdfd78af_0    bioconda
perl-compress-raw-bzip2   2.103           pl5321h87f3376_0    bioconda
perl-compress-raw-zlib    2.105           pl5321h87f3376_0    bioconda
perl-constant             1.33            pl5321hdfd78af_2    bioconda
perl-data-dumper          2.183           pl5321hec16e2b_1    bioconda
perl-digest-hmac          1.04            pl5321hdfd78af_0    bioconda
perl-digest-md5           2.58            pl5321hec16e2b_1    bioconda
perl-encode               3.17            pl5321hec16e2b_0    bioconda
perl-encode-locale        1.05            pl5321hdfd78af_7    bioconda
perl-exporter             5.72            pl5321hdfd78af_2    bioconda
perl-exporter-tiny        1.002002        pl5321hdfd78af_0    bioconda
perl-extutils-makemaker   6.66                          0    bioconda
perl-file-listing         6.15            pl5321hdfd78af_0    bioconda
perl-file-spec            3.48_01         pl5321hdfd78af_2    bioconda
perl-html-parser          3.78            pl5321h9f5acd7_0    bioconda
perl-html-tagset          3.20            pl5321hdfd78af_4    bioconda
perl-http-cookies         6.10            pl5321hdfd78af_0    bioconda
perl-http-daemon          6.14            pl5321hdfd78af_0    bioconda
perl-http-date            6.05            pl5321hdfd78af_0    bioconda
perl-http-message         6.36            pl5321hdfd78af_0    bioconda
perl-http-negotiate       6.01            pl5321hdfd78af_4    bioconda
perl-io-compress          2.106           pl5321h87f3376_0    bioconda
perl-io-html              1.004           pl5321hdfd78af_0    bioconda
perl-io-socket-ssl        2.074           pl5321hdfd78af_0    bioconda
perl-io-zlib              1.11            pl5321hdfd78af_0    bioconda
perl-json                 4.06            pl5321hdfd78af_0    bioconda
perl-json-xs              2.34            pl5321h9f5acd7_5    bioconda
perl-libwww-perl          6.66            pl5321hdfd78af_0    bioconda
perl-list-moreutils       0.430           pl5321hdfd78af_0    bioconda
perl-list-moreutils-xs    0.430           pl5321hec16e2b_1    bioconda
perl-lwp-mediatypes       6.04            pl5321hdfd78af_1    bioconda
perl-lwp-protocol-https   6.10            pl5321hdfd78af_0    bioconda
perl-mime-base64          3.16            pl5321hec16e2b_2    bioconda
perl-mozilla-ca           20211001        pl5321hdfd78af_0    bioconda
perl-net-http             6.22            pl5321hdfd78af_0    bioconda
perl-net-ssleay           1.92            pl5321h0e0aaa8_1    bioconda
perl-ntlm                 1.09            pl5321hdfd78af_5    bioconda
perl-parent               0.236           pl5321hdfd78af_2    bioconda
perl-pathtools            3.75            pl5321hec16e2b_3    bioconda
perl-scalar-list-utils    1.62            pl5321hec16e2b_0    bioconda
perl-socket               2.027           pl5321hec16e2b_3    bioconda
perl-test-requiresinternet 0.05            pl5321hdfd78af_1    bioconda
perl-threaded             5.32.1               hdfd78af_1    bioconda
perl-time-local           1.30            pl5321hdfd78af_0    bioconda
perl-timedate             2.33            pl5321hdfd78af_2    bioconda
perl-try-tiny             0.31            pl5321hdfd78af_0    bioconda
perl-types-serialiser     1.01            pl5321hdfd78af_0    bioconda
perl-uri                  5.10            pl5321hdfd78af_0    bioconda
perl-url-encode           0.03            pl5321h9ee0642_0    bioconda
perl-www-robotrules       6.02            pl5321hdfd78af_4    bioconda
pip                       22.1.2             pyhd8ed1ab_0    conda-forge
popt                      1.16                          1    bioconda
python                    3.9.13          h9a8a25e_0_cpython    conda-forge
python_abi                3.9                      2_cp39    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
rsync                     3.2.3                hfa40b15_4    conda-forge
setuptools                62.3.2           py39hf3d152e_0    conda-forge
sqlite                    3.38.5               h4ff8645_0    conda-forge
tar                       1.34                 ha1f6473_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tzdata                    2022a                h191b570_0    conda-forge
wget                      1.20.3               ha56f1ee_1    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xxhash                    0.8.0                h7f98852_3    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.12               h166bdaf_0    conda-forge
zstd                      1.5.2                h8a70e8d_1    conda-forge

nick-youngblut commented 2 years ago

This issue is specific to krakenuniq=0.7. If I use krakenuniq=0.6, the krakenuniq-build command (above) doesn't encounter the same issue, but it does hit another issue:

Found jellyfish v1.1.12
Kraken build set to minimize disk writes.
Found 10 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using jellyfish
Hash size not specified, using '32573424'
K-mer set created. [10.283s]
Skipping step 2, no database reduction requested.
Sorting k-mer set (step 3 of 6)...
db_sort: Getting database into memory ...Loaded database with 32716828 keys with k of 31 [val_len 4, key_len 8].
Loaded database with 32716828 keys with k of 31 [val_len 4, key_len 8].
db_sort: Sorting ...db_sort: Sorting complete - writing database to disk ...
K-mer set sorted. [4m18.787s]
Creating seqID to taxID map (step 4 of 6)..
686 sequences mapped to taxa. [0.056s]
Creating taxDB (step 5 of 6)...
Building taxonomy index from taxonomy//nodes.dmp and taxonomy//names.dmp. Done, got 240103 taxa
taxDB construction finished. [1.220s]
Building  KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
Getting database0.kdb into memory (374.42 MB) ... Done
Loaded database with 32716828 keys with k of 31 [val_len 4, key_len 8].
Reading sequence ID to taxonomy ID mapping ...  got 686 mappings.
Finished processing 686 sequences (skipping 0 empty sequences, and 0 sequences with no taxonomy mapping)
Writing kmer counts to database.kdb.counts...
Writing database from RAM back to database.kdb ...
LCA database created. [22.837s]
Creating database summary report database.report.tsv ...
/tmp/global2/nyoungblut/code/dev/ll_pipelines/llmgp/tmp/krakenuniq/libexec/classify -d ././database.kdb -i ././database.idx -t 12 -r database.report.tsv -a ././taxDB -p 12 /dev/fd/62
/tmp/global2/nyoungblut/code/dev/ll_pipelines/llmgp/tmp/krakenuniq/libexec/classify: invalid option -- 'd'
Usage: classify [options] <fasta/fastq file(s)>

Options: (*mandatory)
* -H filename      Kraken 2 index filename
* -t filename      Kraken 2 taxonomy filename
* -o filename      Kraken 2 options filename
  -q               Quick mode
  -M               Use memory mapping to access hash & taxonomy
  -T NUM           Confidence score threshold (def. 0)
  -p NUM           Number of threads (def. 1)
  -Q NUM           Minimum quality score (FASTQ only, def. 0)
  -P               Process pairs of reads
  -S               Process pairs with mates in same file
  -R filename      Print report to filename
  -m               In comb. w/ -R, use mpa-style report
  -z               In comb. w/ -R, report taxa w/ 0 count
  -n               Print scientific name instead of taxid in Kraken output
  -g NUM           Minimum number of hit groups needed for call
  -C filename      Filename/format to have classified sequences
  -U filename      Filename/format to have unclassified sequences
  -O filename      Output file for normal Kraken output
  -K               In comb. w/ -R, provide minimizer information in report
xargs: cat: terminated by signal 13
Database construction complete. [Total: 4m53.379s]
You can delete all files but database.{kdb,idx} and taxDB now, if you want

It appears that you are calling subcommands from Perl (e.g., exec "build_db.sh") without checking whether the exit status is non-zero. As a result, a failure such as xargs: cat: terminated by signal 13 can still be followed by Database construction complete. [Total: 4m53.379s]
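
One way such a failure can go unnoticed, sketched in plain bash rather than taken from KrakenUniq's scripts: by default a pipeline's exit status is that of its last command, so an earlier stage that fails is masked unless `pipefail` is set and the caller checks the status. `run_step` and `failing_pipeline` are hypothetical names used only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a build step and report a non-zero exit status.
set -o pipefail   # a pipeline fails if any stage fails, not just the last one

run_step() {
    local label="$1"; shift
    if "$@"; then
        echo "$label: ok"
    else
        local status=$?
        echo "$label: failed with exit status $status" >&2
        return "$status"
    fi
}

# Example pipeline whose first stage fails while the last one succeeds.
failing_pipeline() { false | cat; }
```

With this in place, `run_step "demo" failing_pipeline || exit 1` stops the script instead of continuing to a "construction complete" message.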

nick-youngblut commented 2 years ago

The libexec/classify: invalid option -- 'd' error also occurs when running krakenuniq=0.6 for classifying Illumina reads:

krakenuniq --threads 8 --db $DB --fastq-input --gzip-compressed $READS

...results in:

/tmp/global2/nyoungblut/code/dev/ll_pipelines/llmgp/tmp/krakenuniq/libexec/classify -d /tmp/global2/nyoungblut/code/dev/tmp/n100/krakenuniq//database.kdb -i /tmp/global2/nyoungblut/code/dev/tmp/n100/krakenuniq//database.idx -t 8 -f -a /tmp/global2/nyoungblut/code/dev/tmp/n100/krakenuniq//taxDB -p 12 /dev/fd/0
/tmp/global2/nyoungblut/code/dev/ll_pipelines/llmgp/tmp/krakenuniq/libexec/classify: invalid option -- 'd'
Usage: classify [options] <fasta/fastq file(s)>

Options: (*mandatory)
* -H filename      Kraken 2 index filename
* -t filename      Kraken 2 taxonomy filename
* -o filename      Kraken 2 options filename
  -q               Quick mode
  -M               Use memory mapping to access hash & taxonomy
  -T NUM           Confidence score threshold (def. 0)
  -p NUM           Number of threads (def. 1)
  -Q NUM           Minimum quality score (FASTQ only, def. 0)
  -P               Process pairs of reads
  -S               Process pairs with mates in same file
  -R filename      Print report to filename
  -m               In comb. w/ -R, use mpa-style report
  -z               In comb. w/ -R, report taxa w/ 0 count
  -n               Print scientific name instead of taxid in Kraken output
  -g NUM           Minimum number of hit groups needed for call
  -C filename      Filename/format to have classified sequences
  -U filename      Filename/format to have unclassified sequences
  -O filename      Output file for normal Kraken output
  -K               In comb. w/ -R, provide minimizer information in report
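
The usage text above describes Kraken 2 options (`-H`, `-t`, `-o`), not the `-d`, `-i`, and `-a` options that the wrapper passes. A small sketch for checking which flavor of `classify` a directory provides, based only on the usage text it prints; `usage_flavor` is a hypothetical helper that reads that text from standard input:

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a usage message read from stdin.
# Prints "kraken2" if the text mentions Kraken 2, otherwise "other".
usage_flavor() {
    if grep -q 'Kraken 2'; then
        echo "kraken2"
    else
        echo "other"
    fi
}

# Usage: pipe the captured error output of the failing command into it, e.g.
#   some_failing_command 2>&1 | usage_flavor
```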

nick-youngblut commented 2 years ago

If I then switch back to krakenuniq=0.7 for the actual classification, there is no libexec/classify: invalid option -- 'd' error. So it appears that krakenuniq=0.7 currently fails at the database build, while krakenuniq=0.6 fails at classification.

nick-youngblut commented 2 years ago

If I had to guess, the issue is due to the build.sh and/or the patches in the bioconda recipe.

Probably due to either the ./install_krakenuniq.sh "$PREFIX/libexec" in the build.sh, or due to:

for bin in krakenuniq krakenuniq-build krakenuniq-download krakenuniq-extract-reads krakenuniq-filter krakenuniq-mpa-report krakenuniq-report krakenuniq-translate read_merger.pl; do
    chmod +x "$PREFIX/libexec/$bin"
    ln -s "$PREFIX/libexec/$bin" "$PREFIX/bin/$bin"
    # Change from double quotes to single in case of special chars
    sed -i.bak "s#my \$KRAKEN_DIR = \"$PREFIX/libexec\";#my \$KRAKEN_DIR = '$PREFIX/libexec';#g" "$PREFIX/libexec/${bin}"
    rm -rf "$PREFIX/libexec/${bin}.bak"
done

...with no chmod +x for count_unique.
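
If the missing execute bit were the only problem, the recipe loop could be extended to the compiled helpers as well. A sketch under that assumption: `make_helpers_executable` is a hypothetical function, the helper names passed to it are illustrative, and `PREFIX` is the conda build prefix used in the loop above. Since adding the execute bit by hand did not help earlier in this thread, this would address only the first error.

```shell
#!/usr/bin/env bash
# Hypothetical addition to the recipe's build.sh: make compiled helpers executable.
# The helper names passed in are assumptions; adjust to what libexec actually contains.
make_helpers_executable() {
    local libexec="$1"; shift
    local bin
    for bin in "$@"; do
        if [ -f "$libexec/$bin" ]; then
            chmod +x "$libexec/$bin"
        else
            echo "warning: $libexec/$bin not found" >&2
        fi
    done
}

# Usage inside a conda build, where PREFIX is set by conda-build:
# make_helpers_executable "$PREFIX/libexec" count_unique classify
```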

alekseyzimin commented 2 years ago

Thank you for reporting this issue. We identified a bug in the Makefile that resulted in unusable count_unique and set_lcas executables. We released a new version 0.7.1 that fixes the bug.

nick-youngblut commented 2 years ago

Thanks @alekseyzimin for the quick fix! That's great! 🎉

nick-youngblut commented 1 year ago

I'm still getting the following error with bioconda::krakenuniq=0.7.3:

Kraken build set to minimize disk writes.
Found 10 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using /tmp/global2/nyoungblut/code/dev/Struo2/.snakemake/conda/4d0412795223eb03c04aea7214739086/libexec/jellyfish-install/bin/jellyfish
Hash size not specified, using '32573424'
/tmp/global2/nyoungblut/code/dev/Struo2/.snakemake/conda/4d0412795223eb03c04aea7214739086/libexec/build_db.sh: line 46: /tmp/global2/nyoungblut/code/dev/Struo2/.snakemake/conda/4d0412795223eb03c04aea7214739086/libexec/jellyfish-install/bin/jellyfish: No such file or directory
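
The log shows the build script using a bundled path under libexec/jellyfish-install/bin/ that does not exist in this environment. A hedged sketch of a fallback, not KrakenUniq's actual logic: prefer the bundled binary when it is executable, otherwise use whatever `jellyfish` is on `PATH`. `find_jellyfish` is a hypothetical name, and its argument is assumed to be the libexec directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: locate a usable jellyfish binary.
# $1 is assumed to be the libexec directory of the installation.
find_jellyfish() {
    local bundled="$1/jellyfish-install/bin/jellyfish"
    if [ -x "$bundled" ]; then
        echo "$bundled"
    elif command -v jellyfish >/dev/null 2>&1; then
        command -v jellyfish
    else
        echo "jellyfish not found" >&2
        return 1
    fi
}
```

The function only resolves a path; it does not check the jellyfish version that the build expects.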