genotoul-bioinfo / Binette

A fast and accurate binning refinement tool to construct high-quality MAGs from the output of multiple binning tools.
https://binette.readthedocs.io
MIT License

DIAMOND failing #22

Closed: beardymcjohnface closed this issue 4 months ago

beardymcjohnface commented 4 months ago

For review: openjournals/joss-reviews/issues/6782

Hi, this is probably related to #13. I'm unable to run Binette on the test dataset. I installed it using the Bioconda install instructions and ran the test command.

macOS Sonoma v14.5, conda 23.11.0, Python 3.8.18

$ binette -b binning_results/*.binning --contigs all_contigs.fna --checkm2_db checkm2_tiny_db/checkm2_tiny_db.dmnd -v -o test_results
[2024-07-08 17:48:56] INFO - Program started
[2024-07-08 17:48:56] INFO - command line: /Users/a1234202/miniconda3/envs/binette/bin/binette -b binning_results/A.binning binning_results/B.binning binning_results/C.binning --contigs all_contigs.fna --checkm2_db checkm2_tiny_db/checkm2_tiny_db.dmnd -v -o test_results
[2024-07-08 17:48:56] INFO - Parsing bin2contig files.
[2024-07-08 17:48:56] INFO - 3 bin sets processed:
[2024-07-08 17:48:56] INFO -  A - 6 bins
[2024-07-08 17:48:56] INFO -  B - 3 bins
[2024-07-08 17:48:56] INFO -  C - 4 bins
[2024-07-08 17:48:56] INFO - Parsing contig fasta file: all_contigs.fna
[2024-07-08 17:48:56] INFO - Predicting cds sequences with Pyrodigal using 1 threads.
[2024-07-08 17:48:58] INFO - Writing predicted protein sequences.
[2024-07-08 17:48:58] INFO - Running diamond
[2024-07-08 17:48:58] INFO - diamond blastp --outfmt 6 --max-target-seqs 1 --query test_results/temporary_files/assembly_proteins.faa -o test_results/temporary_files/diamond_result.tsv --threads 1 --db checkm2_tiny_db/checkm2_tiny_db.dmnd --query-cover 80 --subject-cover 80 --id 30 --evalue 1e-05 --block-size 2 2> test_results/temporary_files/diamond_result.log
/bin/sh: line 1: 20164 Illegal instruction: 4  diamond blastp --outfmt 6 --max-target-seqs 1 --query test_results/temporary_files/assembly_proteins.faa -o test_results/temporary_files/diamond_result.tsv --threads 1 --db checkm2_tiny_db/checkm2_tiny_db.dmnd --query-cover 80 --subject-cover 80 --id 30 --evalue 1e-05 --block-size 2 2> test_results/temporary_files/diamond_result.log
[2024-07-08 17:48:59] ERROR - An error occurred while running DIAMOND. Check log file: test_results/temporary_files/diamond_result.log
$ cat test_results/temporary_files/diamond_result.log
diamond v2.0.4.142 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

#CPU threads: 1
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: test_results/temporary_files
Opening the database...  [0s]
#Target sequences to report alignments for: 1
Reference = checkm2_tiny_db/checkm2_tiny_db.dmnd
Sequences = 1775
Letters = 592231
Block size = 2000000000
Opening the input file...  [0s]
Opening the output file...  [0s]
Loading query sequences...  [0.002s]
Masking queries...  [0.041s]
Building query seed set...  [0.003s]
Algorithm: Double-indexed
Building query histograms...  [0.008s]
Allocating buffers...  [0s]
Loading reference sequences...  [0s]
Masking reference...  [0.029s]
Initializing temporary storage...  [0.002s]
Building reference histograms...  [0.005s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 1/4.
Building reference seed array...  [0.006s]
Building query seed array...  [0.008s]
Computing hash join...  [0.004s]
Building seed filter...  [0.001s]
Searching alignments...  [0.106s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 2/4.
Building reference seed array...  [0.008s]
Building query seed array...  [0.012s]
Computing hash join...  [0.004s]
Building seed filter...  [0.001s]
Searching alignments...  [0.099s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 3/4.
Building reference seed array...  [0.01s]
Building query seed array...  [0.014s]
Computing hash join...  [0.004s]
Building seed filter...  [0.001s]
Searching alignments...  [0.096s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 4/4.
Building reference seed array...  [0.006s]
Building query seed array...  [0.008s]
Computing hash join...  [0.004s]
Building seed filter...  [0.001s]
Searching alignments...  [0.095s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 1/4.
Building reference seed array...  [0.006s]
Building query seed array...  [0.008s]
Computing hash join...  [0.004s]
Building seed filter...  [0.001s]
Searching alignments...  [0.1s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 2/4.
Building reference seed array...  [0.008s]
Building query seed array...  [0.012s]
Computing hash join...  [0.004s]
Building seed filter...  [0.001s]
Searching alignments...  [0.096s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 3/4.
Building reference seed array...  [0.01s]
Building query seed array...  [0.014s]
Computing hash join...  [0.004s]
Building seed filter...  [0.001s]
Searching alignments...  [0.095s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 4/4.
Building reference seed array...  [0.006s]
Building query seed array...  [0.008s]
Computing hash join...  [0.004s]
Building seed filter...  [0.001s]
Searching alignments...  [0.093s]
Deallocating buffers...  [0s]
Clearing query masking...  [0s]
Computing alignments... 
$ conda list
# packages in environment at /Users/a1234202/miniconda3/envs/binette:
#
# Name                    Version                   Build  Channel
abseil-cpp                20200923.3           h23ab428_0  
absl-py                   2.1.0            py38hecd8cb5_0  
aiohttp                   3.9.5            py38h6c40b1e_0  
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
archspec                  0.2.3              pyhd3eb1b0_0  
astor                     0.8.1            py38hecd8cb5_0  
astunparse                1.6.3                      py_0  
async-timeout             4.0.3            py38hecd8cb5_0  
attrs                     23.2.0             pyh71513ae_0    conda-forge
binette                   1.0.1              pyh7e72e81_0    bioconda
blas                      2.122                  openblas    conda-forge
blas-devel                3.9.0           22_osx64_openblas    conda-forge
blinker                   1.8.2              pyhd8ed1ab_0    conda-forge
boost-cpp                 1.70.0               hef959ae_3    conda-forge
brotli-python             1.1.0            py38h940360d_1    conda-forge
bzip2                     1.0.8                h6c40b1e_6  
c-ares                    1.28.1               h10d778d_0    conda-forge
ca-certificates           2024.7.4             h8857fd0_0    conda-forge
cachetools                4.2.4              pyhd8ed1ab_0    conda-forge
certifi                   2024.6.2         py38hecd8cb5_0  
cffi                      1.16.0           py38h6c40b1e_1  
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
checkm2                   1.0.2              pyh7cba7a3_0    bioconda
click                     8.1.7            py38hecd8cb5_0  
coverage                  7.5.4            py38hc718529_0    conda-forge
cryptography              41.0.3           py38ha2381d6_0  
cython                    3.0.10           py38h6c40b1e_0  
diamond                   2.0.4                h31d8819_0    bioconda
frozenlist                1.4.1            py38hae2e43d_0    conda-forge
gast                      0.3.3                      py_0  
giflib                    5.2.2                h10d778d_0    conda-forge
google-auth               1.35.0             pyh6c4a22f_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-pasta              0.2.0              pyhd3eb1b0_0  
grpc-cpp                  1.36.4               h33525da_1    conda-forge
grpcio                    1.36.1           py38h97de6d8_1  
h5py                      2.10.0           py38h0601b69_1  
hdf5                      1.10.6               h10fe05b_1  
icu                       67.1                 hb1e8313_0    conda-forge
idna                      3.7              py38hecd8cb5_0  
importlib-metadata        8.0.0              pyha770c72_0    conda-forge
joblib                    1.4.2            py38hecd8cb5_0  
jpeg                      9e                   h6c40b1e_1  
keras-preprocessing       1.1.2              pyhd3eb1b0_0  
krb5                      1.20.1               hdba6334_1  
libblas                   3.9.0           22_osx64_openblas    conda-forge
libcblas                  3.9.0           22_osx64_openblas    conda-forge
libcurl                   8.2.1                ha585b31_0  
libcxx                    17.0.6               heb59cac_3    conda-forge
libedit                   3.1.20230828         h6c40b1e_0  
libev                     4.33                 h9ed2024_1  
libffi                    3.4.4                hecd8cb5_1  
libgfortran               5.0.0           13_2_0_h97931a8_3    conda-forge
libgfortran5              13.2.0               h2873a65_3    conda-forge
liblapack                 3.9.0           22_osx64_openblas    conda-forge
liblapacke                3.9.0           22_osx64_openblas    conda-forge
libnghttp2                1.52.0               h1c88b7d_1  
libopenblas               0.3.27          openmp_hfef2a42_0    conda-forge
libpng                    1.6.43               h92b6c6a_0    conda-forge
libprotobuf               3.15.8               hcf210ce_1    conda-forge
libsqlite                 3.46.0               h1b8f9f3_0    conda-forge
libssh2                   1.10.0               hdb2fb19_2  
libzlib                   1.2.13               h87427d6_6    conda-forge
lightgbm                  3.2.1            py38h23ab428_0  
llvm-openmp               18.1.8               h15ab845_0    conda-forge
lz4-c                     1.9.4                hcec6c5f_1  
markdown                  3.6                pyhd8ed1ab_0    conda-forge
markupsafe                2.1.5            py38hae2e43d_0    conda-forge
multidict                 6.0.5            py38hef030d1_0    conda-forge
ncurses                   6.5                  h5846eda_0    conda-forge
networkx                  3.1              py38hecd8cb5_0  
numpy                     1.19.2           py38hb9de1e1_1  
numpy-base                1.19.2           py38hf048a4f_1  
oauthlib                  3.2.2            py38hecd8cb5_0  
openblas                  0.3.27          openmp_h6794695_0    conda-forge
openssl                   1.1.1w               hca72f7f_0  
opt_einsum                3.3.0              pyhd3eb1b0_1  
packaging                 24.1             py38hecd8cb5_0  
pandas                    1.4.0            py38ha53d530_0    conda-forge
pip                       24.0             py38hecd8cb5_0  
prodigal                  2.6.3                h8af04d4_9    bioconda
protobuf                  3.15.8           py38ha048514_0    conda-forge
pyasn1                    0.6.0              pyhd8ed1ab_0    conda-forge
pyasn1-modules            0.4.0              pyhd8ed1ab_0    conda-forge
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pyfastx                   2.1.0            py38h019ace8_2    bioconda
pyjwt                     2.8.0            py38hecd8cb5_0  
pyopenssl                 23.2.0           py38hecd8cb5_0  
pyrodigal                 3.4.1            py38h51c4a30_1    bioconda
pysocks                   1.7.1                    py38_1  
python                    3.8.18               h218abb5_0  
python-dateutil           2.9.0post0       py38hecd8cb5_2  
python-flatbuffers        1.12               pyhd3eb1b0_0  
python_abi                3.8                      2_cp38    conda-forge
pytz                      2024.1           py38hecd8cb5_0  
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
re2                       2021.04.01           he49afe7_0    conda-forge
readline                  8.2                  hca72f7f_0  
requests                  2.32.3             pyhd8ed1ab_0    conda-forge
requests-oauthlib         2.0.0              pyhd8ed1ab_0    conda-forge
rsa                       4.9                pyhd8ed1ab_0    conda-forge
scikit-learn              0.23.2           py38h959d312_0  
scipy                     1.9.3            py38h9034365_2  
setuptools                70.1.1             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyhd3eb1b0_1  
snappy                    1.1.10               hcec6c5f_1  
sqlite                    3.46.0               h28673e1_0    conda-forge
tensorboard               2.4.1              pyhd8ed1ab_1    conda-forge
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tensorflow                2.4.0            py38h50d1736_0    conda-forge
tensorflow-base           2.4.0            py38h428766a_0    conda-forge
tensorflow-estimator      2.4.0              pyh9656e83_0    conda-forge
termcolor                 2.4.0              pyhd8ed1ab_0    conda-forge
threadpoolctl             3.5.0            py38h20db666_0  
tk                        8.6.14               h4d00af3_0  
tomli                     2.0.1            py38hecd8cb5_0  
tqdm                      4.66.4           py38h20db666_0  
typing_extensions         4.12.2             pyha770c72_0    conda-forge
urllib3                   2.2.2            py38hecd8cb5_0  
werkzeug                  3.0.3            py38hecd8cb5_0  
wheel                     0.43.0           py38hecd8cb5_0  
wrapt                     1.16.0           py38hae2e43d_0    conda-forge
xz                        5.4.6                h6c40b1e_1  
yarl                      1.9.4            py38hae2e43d_0    conda-forge
zipp                      3.19.2             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h87427d6_6    conda-forge
zstd                      1.4.9                h322a384_0  
JeanMainguy commented 4 months ago

Hi,

Unfortunately, I'm not able to reproduce the error on my end.

I ran the CI on a macOS runner via GitHub to see whether the problem might be OS-related. The CI passes without any issues (check it out here: CI Run). However, the GitHub runner uses macOS 13, which is older than your version.

The error in your log seems to come from this line:

/bin/sh: line 1: 20164 Illegal instruction: 4  diamond blastp --outfmt 6 --max-target-seqs 1 --query test_results/temporary_files/assembly_proteins.faa -o test_results/temporary_files/diamond_result.tsv --threads 1 --db checkm2_tiny_db/checkm2_tiny_db.dmnd --query-cover 80 --subject-cover 80 --id 30 --evalue 1e-05 --block-size 2 2> test_results/temporary_files/diamond_result.log

I found this Stack Overflow question, "What is the illegal instruction 4 error and why does it happen?", and this GitHub issue: DIAMOND Issue #303.
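The `4` in `Illegal instruction: 4` is the signal number. SIGILL is signal 4, and it is delivered when a process tries to execute an instruction the CPU cannot run. In bash, a child killed by signal N leaves an exit status of 128 + N, so a crash of this kind appears as status 132. The helper below is a hypothetical sketch (not part of Binette or DIAMOND) that runs any command and flags that case:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Binette or DIAMOND): run a command and
# report when it was killed by SIGILL ("Illegal instruction", signal 4).
run_and_check_sigill() {
    local status=0
    # Run the command exactly as given and capture its exit status
    "$@" || status=$?
    # bash reports "killed by signal N" as exit status 128 + N; SIGILL is 4
    if [ "$status" -eq 132 ]; then
        echo "'$1' was killed by SIGILL: it tried to execute an instruction this CPU cannot run" >&2
    fi
    # Pass the original exit status through to the caller
    return "$status"
}
```

For example, if you prefix the `diamond blastp ...` command from the log with `run_and_check_sigill`, the function prints the extra message whenever the command crashes the same way.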

It seems the issue might be related to the DIAMOND conda package not being compatible with your version of macOS.
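To test the compatibility hypothesis, it helps to record the OS version, the machine architecture, and the architecture the `diamond` binary was built for. The function below is a hypothetical sketch built on standard tools (`uname`, `sw_vers`, `sysctl`, `file`). The `sysctl.proc_translated` key only exists on Apple Silicon Macs. There, it reports whether the shell running the function is itself translated by Rosetta 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print environment details relevant to a DIAMOND crash.
report_env() {
    echo "OS: $(uname -s) $(uname -r)"
    echo "Machine architecture: $(uname -m)"
    if [ "$(uname -s)" = "Darwin" ]; then
        echo "macOS version: $(sw_vers -productVersion)"
        # 1 if this shell runs under Rosetta 2, 0 if native; the key is
        # missing on Intel Macs, so fall back to "n/a"
        echo "Rosetta translation: $(sysctl -n sysctl.proc_translated 2>/dev/null || echo n/a)"
    fi
    if command -v diamond >/dev/null 2>&1; then
        echo "DIAMOND binary: $(command -v diamond)"
        # `file -L` follows symlinks and reports the binary's target architecture
        file -L "$(command -v diamond)"
    else
        echo "DIAMOND binary: not found on PATH"
    fi
}
```

Running `report_env` inside the activated environment and pasting its output into the issue would make the OS/architecture question easier to settle.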

JeanMainguy commented 4 months ago

You might want to try installing a Binette environment with a more recent version of DIAMOND. Here's how you can do it using pip within a conda environment:

First, create a conda environment with Python 3.8 and DIAMOND 2.1.9:

conda create -n binette_with_diamond2.1.9 python=3.8 diamond=2.1.9
conda activate binette_with_diamond2.1.9

Then, install Binette using pip with the main_deps extra so that all the Python dependencies are installed:

pip install "binette[main_deps]"

This installs all the necessary Python packages without touching DIAMOND, whose version is locked to 2.0.4 in the conda package, as mentioned in #13.
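pip does not manage DIAMOND, so you may want to confirm which DIAMOND the new environment provides before rerunning the test. This is a minimal sketch. It assumes that `diamond version` prints a line ending in the version number (for example `diamond version 2.1.9`). `version_ge` is a hypothetical helper, and 2.1.9 is simply the release suggested above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if dot-separated version $1 >= version $2.
version_ge() {
    local -a a b
    local i x y
    IFS=. read -r -a a <<< "$1"
    IFS=. read -r -a b <<< "$2"
    for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
        # Missing components count as 0, so "2.1" compares like "2.1.0"
        x=${a[i]:-0}
        y=${b[i]:-0}
        # 10# forces base 10 so components such as "08" are not read as octal
        if ((10#$x > 10#$y)); then return 0; fi
        if ((10#$x < 10#$y)); then return 1; fi
    done
    return 0
}

# Only query DIAMOND if it is on PATH in the active environment
if command -v diamond >/dev/null 2>&1; then
    installed=$(diamond version | awk '{print $NF}')
    if version_ge "$installed" 2.1.9; then
        echo "DIAMOND $installed found"
    else
        echo "DIAMOND $installed is older than 2.1.9" >&2
    fi
fi
```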

Once you have everything set up, you can retry the Binette test as you did before:

git clone https://github.com/genotoul-bioinfo/Binette_TestData.git

cd Binette_TestData
binette -b binning_results/*binning --contigs all_contigs.fna --checkm2_db checkm2_tiny_db/checkm2_tiny_db.dmnd -v -o test_results
beardymcjohnface commented 4 months ago

Hi Jean, that makes sense. I have not tried the instructions for installing a newer version of DIAMOND, but Binette works perfectly on my Linux system!