bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Update H. flu database to new sklearn version #276

Open conmeehan opened 11 months ago

conmeehan commented 11 months ago

Versions

poppunk 2.6.0
zsh: command not found: poppunk_sketch
poppunk_assign 2.6.0

Conda list:

# packages in environment at /Users/cmeehan/opt/miniconda3/envs/poppunk:
#
# Name Version Build Channel
aom 3.5.0 hf0c8a7f_0 conda-forge
atk-1.0 2.38.0 h1d18e73_1 conda-forge
biopython 1.81 py310h90acd4f_0 conda-forge
boost 1.78.0 py310h3e792ce_4 conda-forge
boost-cpp 1.78.0 hf5ba120_3 conda-forge
brotli 1.0.9 hb7f2c08_9 conda-forge
brotli-bin 1.0.9 hb7f2c08_9 conda-forge
brotli-python 1.0.9 py310h7a76584_9 conda-forge
bzip2 1.0.8 h0d85af4_4 conda-forge
c-ares 1.19.1 h0dc2134_0 conda-forge
ca-certificates 2023.5.7 h8857fd0_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cairo 1.16.0 h09dd18c_1016 conda-forge
cairomm-1.0 1.14.4 h5b44118_1 conda-forge
certifi 2023.5.7 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py310ha78151a_3 conda-forge
charset-normalizer 3.2.0 pyhd8ed1ab_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
contourpy 1.1.0 py310h88cfcbd_0 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
cython 3.0.0 py310h9e9d8ca_0 conda-forge
dav1d 1.2.1 h0dc2134_0 conda-forge
dendropy 4.6.1 pyhdfd78af_0 bioconda
docopt 0.6.2 py_1 conda-forge
epoxy 1.5.10 h5eb16cf_1 conda-forge
expat 2.5.0 hf0c8a7f_1 conda-forge
ffmpeg 6.0.0 gpl_h74aebd8_103 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.14.2 h5bb23bf_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.41.0 py310h6729b98_0 conda-forge
freetype 2.12.1 h3f81eb7_1 conda-forge
fribidi 1.0.10 hbcb3906_0 conda-forge
gdk-pixbuf 2.42.10 hff535ac_2 conda-forge
gettext 0.21.1 h8a4c099_0 conda-forge
gfortran_impl_osx-64 12.2.0 h158f68b_31 conda-forge
glib-tools 2.76.4 h7d26f99_0 conda-forge
gmp 6.2.1 h2e338ed_0 conda-forge
gnutls 3.7.8 h207c4f0_0 conda-forge
graph-tool 2.48 py310h6327fc9_0 conda-forge
graph-tool-base 2.48 py310h65f7fc8_0 conda-forge
graphite2 1.3.13 h2e338ed_1001 conda-forge
gtk3 3.24.38 h5a9695a_0 conda-forge
h5py 3.8.0 nompi_py310h5555e59_100 conda-forge
harfbuzz 7.3.0 h413ba03_0 conda-forge
hdbscan 0.8.29 py310h936d966_2 conda-forge
hdf5 1.12.2 nompi_h48135f9_101 conda-forge
hicolor-icon-theme 0.17 h694c41f_2 conda-forge
icu 72.1 h7336db1_0 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
isl 0.25 hb486fe8_0 conda-forge
joblib 1.3.0 pyhd8ed1ab_1 conda-forge
kiwisolver 1.4.4 py310ha23aa8a_1 conda-forge
krb5 1.21.1 hb884880_0 conda-forge
lame 3.100 hb7f2c08_1003 conda-forge
lcms2 2.15 h2dcdeff_1 conda-forge
lerc 4.0.0 hb486fe8_0 conda-forge
libaec 1.0.6 hf0c8a7f_1 conda-forge
libass 0.17.1 h66d2fa1_0 conda-forge
libblas 3.9.0 17_osx64_openblas conda-forge
libbrotlicommon 1.0.9 hb7f2c08_9 conda-forge
libbrotlidec 1.0.9 hb7f2c08_9 conda-forge
libbrotlienc 1.0.9 hb7f2c08_9 conda-forge
libcblas 3.9.0 17_osx64_openblas conda-forge
libcurl 8.1.2 h5f667d7_1 conda-forge
libcxx 16.0.6 hd57cbcb_0 conda-forge
libdeflate 1.18 hac1461d_0 conda-forge
libedit 3.1.20191231 h0678c8f_2 conda-forge
libev 4.33 haf1e3a3_1 conda-forge
libexpat 2.5.0 hf0c8a7f_1 conda-forge
libffi 3.4.2 h0d85af4_5 conda-forge
libgfortran 5.0.0 11_3_0_h97931a8_31 conda-forge
libgfortran-devel_osx-64 12.2.0 hf0fd499_31 conda-forge
libgfortran5 12.2.0 he409387_31 conda-forge
libgirepository 1.76.1 he30e17e_0 conda-forge
libglib 2.76.4 hc62aa5d_0 conda-forge
libiconv 1.17 hac89ed1_0 conda-forge
libidn2 2.3.4 hb7f2c08_0 conda-forge
libjpeg-turbo 2.1.5.1 hb7f2c08_0 conda-forge
liblapack 3.9.0 17_osx64_openblas conda-forge
libnghttp2 1.52.0 he2ab024_0 conda-forge
libopenblas 0.3.23 openmp_h429af6e_0 conda-forge
libopus 1.3.1 hc929b4f_1 conda-forge
libpng 1.6.39 ha978bb4_0 conda-forge
librsvg 2.56.1 hec3db73_0 conda-forge
libsqlite 3.42.0 h58db7d2_0 conda-forge
libssh2 1.11.0 hd019ec5_0 conda-forge
libtasn1 4.19.0 hb7f2c08_0 conda-forge
libtiff 4.5.1 hf955e92_0 conda-forge
libunistring 0.9.10 h0d85af4_0 conda-forge
libvpx 1.13.0 hf0c8a7f_0 conda-forge
libwebp-base 1.3.1 h0dc2134_0 conda-forge
libxcb 1.15 hb7f2c08_0 conda-forge
libxml2 2.11.4 hd95e348_0 conda-forge
libzlib 1.2.13 h8a1eda9_5 conda-forge
llvm-openmp 16.0.6 hff08bdf_0 conda-forge
mandrake 1.2.2 py310heea2105_2 conda-forge
matplotlib-base 3.7.2 py310h475a17b_0 conda-forge
mpc 1.3.1 h81bd1dd_0 conda-forge
mpfr 4.2.0 h4f9bd69_0 conda-forge
munkres 1.0.7 py_1 bioconda
ncurses 6.4 hf0c8a7f_0 conda-forge
nettle 3.8.1 h96f3785_1 conda-forge
networkx 3.1 pyhd8ed1ab_0 conda-forge
numpy 1.25.1 py310h7451ae0_0 conda-forge
openblas 0.3.23 openmp_hbefa662_0 conda-forge
openh264 2.3.1 hf0c8a7f_2 conda-forge
openjpeg 2.5.0 h13ac156_2 conda-forge
openssl 3.1.1 h8a1eda9_1 conda-forge
p11-kit 0.24.1 h65f8906_0 conda-forge
packaging 23.1 pyhd8ed1ab_0 conda-forge
pandas 2.0.3 py310h5e4fcda_1 conda-forge
pango 1.50.14 hbce5e75_1 conda-forge
pcre2 10.40 h1c4e4bc_0 conda-forge
pillow 10.0.0 py310hd63a8c7_0 conda-forge
pip 23.2 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 hbcb3906_0 conda-forge
platformdirs 3.9.1 pyhd8ed1ab_0 conda-forge
plotly 5.15.0 pyhd8ed1ab_0 conda-forge
pooch 1.7.0 pyha770c72_3 conda-forge
poppunk 2.6.0 py310h4862987_1 bioconda
pp-sketchlib 2.1.1 py310hda06942_1 conda-forge
pthread-stubs 0.4 hc929b4f_1001 conda-forge
pycairo 1.24.0 py310h0b97775_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pygobject 3.44.1 py310ha8dcd3d_0 conda-forge
pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.10.12 had23ca6_0_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-tzdata 2023.3 pyhd8ed1ab_0 conda-forge
python_abi 3.10 3_cp310 conda-forge
pytz 2023.3 pyhd8ed1ab_0 conda-forge
rapidnj 2.3.2 h85dcccf_4 bioconda
readline 8.2 h9e318b2_1 conda-forge
requests 2.31.0 pyhd8ed1ab_0 conda-forge
scikit-learn 1.3.0 py310hd2c063c_0 conda-forge
scipy 1.11.1 py310h3900cf1_0 conda-forge
setuptools 68.0.0 pyhd8ed1ab_0 conda-forge
sigcpp-2.0 2.10.8 hf0c8a7f_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sparsehash 2.0.4 hf0c8a7f_1 conda-forge
svt-av1 1.6.0 he965462_0 conda-forge
tenacity 8.2.2 pyhd8ed1ab_0 conda-forge
threadpoolctl 3.2.0 pyha21a80b_0 conda-forge
tk 8.6.12 h5dbffcc_0 conda-forge
tqdm 4.65.0 pyhd8ed1ab_1 conda-forge
treeswift 1.1.37 pyh7cba7a3_0 bioconda
typing-extensions 4.7.1 hd8ed1ab_0 conda-forge
typing_extensions 4.7.1 pyha770c72_0 conda-forge
tzdata 2023c h71feb2d_0 conda-forge
unicodedata2 15.0.0 py310h90acd4f_0 conda-forge
urllib3 2.0.3 pyhd8ed1ab_1 conda-forge
wheel 0.40.0 pyhd8ed1ab_1 conda-forge
x264 1!164.3095 h775f41a_2 conda-forge
x265 3.5 hbb4e6a2_3 conda-forge
xorg-compositeproto 0.4.2 h0d85af4_1001 conda-forge
xorg-damageproto 1.2.1 h0d85af4_1002 conda-forge
xorg-fixesproto 5.0 h0d85af4_1002 conda-forge
xorg-inputproto 2.3.2 h35c211d_1002 conda-forge
xorg-kbproto 1.0.7 h35c211d_1002 conda-forge
xorg-libice 1.0.10 h0d85af4_0 conda-forge
xorg-libsm 1.2.3 h0d85af4_1000 conda-forge
xorg-libx11 1.8.6 hbd0b022_0 conda-forge
xorg-libxau 1.0.11 h0dc2134_0 conda-forge
xorg-libxaw 1.0.14 h0d85af4_1 conda-forge
xorg-libxcomposite 0.4.6 hb7f2c08_1 conda-forge
xorg-libxcursor 1.2.0 hb7f2c08_1 conda-forge
xorg-libxdamage 1.1.5 h0d85af4_1 conda-forge
xorg-libxdmcp 1.1.3 h35c211d_0 conda-forge
xorg-libxext 1.3.4 hb7f2c08_2 conda-forge
xorg-libxfixes 5.0.3 h0d85af4_1004 conda-forge
xorg-libxi 1.7.10 h0d85af4_0 conda-forge
xorg-libxinerama 1.1.5 hf0c8a7f_0 conda-forge
xorg-libxmu 1.1.3 h0d85af4_0 conda-forge
xorg-libxpm 3.5.16 h0dc2134_0 conda-forge
xorg-libxrandr 1.5.2 h0d85af4_1 conda-forge
xorg-libxrender 0.9.11 h0dc2134_0 conda-forge
xorg-libxt 1.3.0 h0dc2134_0 conda-forge
xorg-randrproto 1.5.0 h0d85af4_1001 conda-forge
xorg-renderproto 0.11.1 h0d85af4_1002 conda-forge
xorg-util-macros 1.19.3 h35c211d_0 conda-forge
xorg-xextproto 7.3.0 hb7f2c08_1003 conda-forge
xorg-xproto 7.0.31 h35c211d_1007 conda-forge
xz 5.2.6 h775f41a_0 conda-forge
zlib 1.2.13 h8a1eda9_5 conda-forge
zstandard 0.19.0 py310h151724a_2 conda-forge
zstd 1.5.2 h829000d_7 conda-forge

Command used and output returned

poppunk_assign --db Haemophilus_influenzae_v1_refs --query input.txt --output poppunk_clusters --threads 7

Input.txt:
AP022846 AP022846.1.fa
SRR11108932 SRR11108932_1.fastq.gz SRR11108932_1.fastq.gz

Describe the bug

I get the following error when running on an Apple M1 (macOS 13.4.1, 16 GB memory):

PopPUNK: assign
	(with backend: sketchlib v2.1.1
	 sketchlib: /Users/cmeehan/opt/miniconda3/envs/poppunk/lib/python3.10/site-packages/pp_sketchlib.cpython-310-darwin.so)
Mode: Assigning clusters of query sequences

Graph-tools OpenMP parallelisation enabled: with 7 threads
Sketching 1 genomes using 1 thread(s)
Progress (CPU): 1 / 1
Writing sketches to file
Traceback (most recent call last):
  File "/Users/cmeehan/opt/miniconda3/envs/poppunk/bin/poppunk_assign", line 11, in <module>
    sys.exit(main())
  File "/Users/cmeehan/opt/miniconda3/envs/poppunk/lib/python3.10/site-packages/PopPUNK/assign.py", line 211, in main
    assign_query(dbFuncs,
  File "/Users/cmeehan/opt/miniconda3/envs/poppunk/lib/python3.10/site-packages/PopPUNK/assign.py", line 307, in assign_query
    isolateClustering = assign_query_hdf5(dbFuncs,
  File "/Users/cmeehan/opt/miniconda3/envs/poppunk/lib/python3.10/site-packages/PopPUNK/assign.py", line 357, in assign_query_hdf5
    from .models import loadClusterFit
  File "/Users/cmeehan/opt/miniconda3/envs/poppunk/lib/python3.10/site-packages/PopPUNK/models.py", line 19, in <module>
    import hdbscan
  File "/Users/cmeehan/opt/miniconda3/envs/poppunk/lib/python3.10/site-packages/hdbscan/__init__.py", line 1, in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
  File "/Users/cmeehan/opt/miniconda3/envs/poppunk/lib/python3.10/site-packages/hdbscan/hdbscan_.py", line 40, in <module>
    FAST_METRICS = KDTree.valid_metrics + BallTree.valid_metrics + ["cosine", "arccos"]
TypeError: unsupported operand type(s) for +: 'builtin_function_or_method' and 'builtin_function_or_method'

Note: I ran the same command on an Ubuntu server and did not get this error.
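The `TypeError` in the traceback says that both operands of `+` were callables, so in this environment `valid_metrics` did not resolve to a list. The sketch below reproduces that failure mode with two hypothetical stand-in classes (`ListStyleTree` and `MethodStyleTree` are invented for illustration and are not scikit-learn classes) and shows one defensive way to read the metric names in either case. It is not the fix used by hdbscan or PopPUNK, only an illustration of the mechanism.

```python
# Hypothetical stand-ins for tree classes; these are NOT scikit-learn classes.
class ListStyleTree:
    # valid_metrics exposed as a plain list attribute
    valid_metrics = ["euclidean", "manhattan"]


class MethodStyleTree:
    # valid_metrics exposed as a callable that returns the list
    @classmethod
    def valid_metrics(cls):
        return ["euclidean", "chebyshev"]


def as_metric_list(valid_metrics):
    """Return metric names from either a list-like attribute or a callable."""
    if callable(valid_metrics):
        return list(valid_metrics())
    return list(valid_metrics)


def combined_metrics(*trees):
    """Concatenate the metric lists of several tree classes, tolerating both styles."""
    names = []
    for tree in trees:
        names.extend(as_metric_list(tree.valid_metrics))
    return names


# Adding two callables raises TypeError, the same kind of failure as in the traceback.
try:
    MethodStyleTree.valid_metrics + MethodStyleTree.valid_metrics
    addition_failed = False
except TypeError:
    addition_failed = True

# The tolerant accessor works for both styles.
fast_metrics = combined_metrics(ListStyleTree, MethodStyleTree) + ["cosine", "arccos"]
```

Because the failure happens while `hdbscan` is being imported, it depends on which `hdbscan` and `scikit-learn` builds are installed together, which may explain why a different machine does not show it.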

johnlees commented 11 months ago

Sorry about this, I think this looks like it's due to scikit-learn changing their API, which I couldn't make backwards compatible, see: https://github.com/bacpop/PopPUNK#2022-08-04

The change in scikit-learn's API in v1.0.0 and above means that HDBSCAN models fitted with sklearn <=v0.24 will give an error when loaded. If you run into this, the solution is one of:

  • Downgrade sklearn to v0.24.
  • Run model refinement to turn your model into a boundary model instead (this will change clusters).
  • Refit your model in an environment with sklearn >=v1.0.

If this is a common problem let us know, as we could write a script to 'upgrade' HDBSCAN models. See issue https://github.com/bacpop/PopPUNK/issues/213 for more details.
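Before choosing one of the options above, it can help to confirm which versions are actually installed in the active environment. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the `fitted_with` argument is an assumption: the thread does not say which scikit-learn version the database model was fitted with, so that value would have to come from whoever built the model.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string):
    """Leading integer of a version string such as '0.24.2' or '1.3.0'."""
    return int(version_string.split(".")[0])


def hdbscan_model_needs_attention(fitted_with, running):
    """True when a model fitted with sklearn <1.0 is loaded under sklearn >=1.0.

    This encodes the README wording quoted above (fitted with <=0.24, loaded
    with >=1.0); 0.24 was the last release series before 1.0.
    """
    return major_version(fitted_with) < 1 and major_version(running) >= 1


# Report what the current environment provides.
running = installed_version("scikit-learn")
print("scikit-learn:", running or "not installed")
print("hdbscan:", installed_version("hdbscan") or "not installed")
```

For example, `hdbscan_model_needs_attention("0.24.2", "1.3.0")` returns `True`, which corresponds to the case where downgrading, refining, or refitting is needed.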

Was the Haemophilus_influenzae_v1_refs database from our website? If so, I should update it to fix this.

conmeehan commented 11 months ago

Ah sorry, I didn't see that bit in the README. I'm surprised that it worked on the Ubuntu box; it must have used a different scikit-learn version and I just didn't notice.

The database was from your website, yes. I didn't try any other ones, just that one.

C