bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Problem in assigning lineages - poppunk_assign crashing #176

Closed: andreaniml closed this issue 2 years ago

andreaniml commented 3 years ago

Versions ("conda list" output):

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
_openmp_mutex             4.5                       1_gnu
apscheduler               3.7.0            py39h06a4308_0
at-spi2-atk               2.38.0               h0630a04_3    conda-forge
at-spi2-core              2.40.3               h0630a04_0    conda-forge
atk-1.0                   2.36.0               h28cd5cc_0
boost                     1.74.0           py39h5472131_3    conda-forge
boost-cpp                 1.74.0               hc6e9bd1_3    conda-forge
brotlipy                  0.7.0           py39h27cfd23_1003
bzip2                     1.0.8                h7b6447c_0
c-ares                    1.17.1               h27cfd23_0
ca-certificates           2021.7.5             h06a4308_1
cached-property           1.5.2                      py_0
cairo                     1.16.0               hf32fb01_1
cairomm                   1.12.2               ha770c72_3    conda-forge
cairomm-1.0               1.12.2               h56b4340_3    conda-forge
certifi                   2021.5.30        py39h06a4308_0
cffi                      1.14.6           py39h400218f_0
chardet                   4.0.0           py39h06a4308_1003
click                     8.0.1              pyhd3eb1b0_0
cryptography              3.4.7            py39hd23ed53_0
cycler                    0.10.0           py39h06a4308_0
dbus                      1.13.18              hb2f20db_0
decorator                 4.4.2              pyhd3eb1b0_0
dendropy                  4.5.2              pyh3252c3a_0    bioconda
epoxy                     1.5.8                h7f98852_0    conda-forge
expat                     2.4.1                h2531618_2
flask                     1.1.2              pyhd3eb1b0_0
flask-apscheduler         1.12.2             pyhd3eb1b0_0
flask-cors                3.0.10             pyhd3eb1b0_0
font-ttf-dejavu-sans-mono 2.37                 h6964260_0
font-ttf-inconsolata      2.001                hcb22688_0
font-ttf-source-code-pro  2.030                h7457263_0
font-ttf-ubuntu           0.83                 h8b1ccd4_0
fontconfig                2.13.1            hba837de_1005    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
freetype                  2.10.4               h5ab3b9f_0
fribidi                   1.0.10               h7b6447c_0
gdk-pixbuf                2.42.6               h04a7f16_0    conda-forge
gettext                   0.21.0               hf68c758_0
glib                      2.68.3               h9c3ff4c_0    conda-forge
glib-tools                2.68.3               h9c3ff4c_0    conda-forge
gmp                       6.2.1                h2531618_2
gobject-introspection     1.68.0           py39h2109141_1
graph-tool                2.43             py39hc4320a7_0    conda-forge
graph-tool-base           2.43             py39h8160539_0    conda-forge
graphite2                 1.3.14               h23475e2_0
gtk3                      3.24.29              h8879c87_1    conda-forge
gunicorn                  20.1.0           py39h06a4308_0
h5py                      3.2.1            py39h6c542dc_0
harfbuzz                  2.8.2                h83ec7ef_0    conda-forge
hdbscan                   0.8.27           py39hce5d2b2_0    conda-forge
hdf5                      1.10.6          nompi_h6a2412b_1114    conda-forge
hicolor-icon-theme        0.17                 ha770c72_2    conda-forge
icu                       68.1                 h2531618_0
idna                      2.10               pyhd3eb1b0_0
importlib-metadata        3.10.0           py39h06a4308_0
itsdangerous              2.0.1              pyhd3eb1b0_0
jinja2                    3.0.1              pyhd3eb1b0_0
joblib                    1.0.1              pyhd3eb1b0_0
jpeg                      9d                   h36c2ea0_0    conda-forge
kiwisolver                1.3.1            py39h2531618_0
krb5                      1.19.1               h3535a68_0
lcms2                     2.12                 h3be6417_0
ld_impl_linux-64          2.35.1               h7274673_9
libblas                   3.9.0                9_openblas    conda-forge
libcblas                  3.9.0                9_openblas    conda-forge
libcups                   2.3.3                hf5a7f15_0    conda-forge
libcurl                   7.77.0               h2574ce0_0    conda-forge
libedit                   3.1.20210216         h27cfd23_1
libev                     4.33                 h7b6447c_0
libffi                    3.3                  he6710b0_2
libgcc-ng                 9.3.0               h5101ec6_17
libgfortran-ng            9.3.0               ha5ec8a7_17
libgfortran5              9.3.0               ha5ec8a7_17
libglib                   2.68.3               h3e27bee_0    conda-forge
libgomp                   9.3.0               h5101ec6_17
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0                9_openblas    conda-forge
libnghttp2                1.43.0               h812cca2_0    conda-forge
libopenblas               0.3.15          pthreads_h8fe5266_1    conda-forge
libpng                    1.6.37               hbc83047_0
librsvg                   2.50.7               hc3c00ef_0    conda-forge
libssh2                   1.9.0                h1ba5d50_1
libstdcxx-ng              9.3.0               hd4cf53a_17
libtiff                   4.2.0                h85742a9_0
libuuid                   2.32.1            h7f98852_1000    conda-forge
libwebp-base              1.2.0                h27cfd23_0
libxcb                    1.14                 h7b6447c_0
libxml2                   2.9.12               h72842e0_0    conda-forge
lz4-c                     1.9.3                h2531618_0
markupsafe                2.0.1            py39h27cfd23_0
matplotlib-base           3.3.4            py39h62a2d02_0
ncurses                   6.2                  he6710b0_1
networkx                  2.5.1              pyhd3eb1b0_0
ninja                     1.10.2               hff7bd54_1
numpy                     1.21.0           py39hdbf815f_0    conda-forge
olefile                   0.46                       py_0
openblas                  0.3.15          pthreads_h4748800_1    conda-forge
openjpeg                  2.3.0                h05c96fa_1
openssl                   1.1.1k               h27cfd23_0
pandas                    1.2.4            py39h2531618_0
pango                     1.48.7               hb8ff022_0    conda-forge
pcre                      8.45                 h295c915_0
pillow                    8.3.1            py39h2c7a002_0
pip                       21.1.3           py39h06a4308_0
pixman                    0.40.0               h7b6447c_0
poppunk                   2.4.0            py39h7f0572b_0    bioconda
pp-sketchlib              1.7.3            py39h85fd282_0    conda-forge
pycairo                   1.19.1           py39h708ec4a_0
pycparser                 2.20                       py_2
pygobject                 3.40.1           py39he5105b2_1    conda-forge
pyopenssl                 20.0.1             pyhd3eb1b0_1
pyparsing                 2.4.7              pyhd3eb1b0_0
pysocks                   1.7.1            py39h06a4308_0
python                    3.9.5                h12debd9_4
python-dateutil           2.8.1              pyhd3eb1b0_0
python_abi                3.9                      2_cp39    conda-forge
pytz                      2021.1             pyhd3eb1b0_0
rapidnj                   2.3.2                h7d875b9_1    bioconda
readline                  8.1                  h27cfd23_0
requests                  2.25.1             pyhd3eb1b0_0
scikit-learn              0.24.2           py39ha9443f7_0
scipy                     1.7.0            py39hee8e79c_0    conda-forge
setuptools                52.0.0           py39h06a4308_0
sigcpp-2.0                2.10.7               h9c3ff4c_0    conda-forge
six                       1.16.0             pyhd3eb1b0_0
sparsehash                2.0.2                         0    bioconda
sqlite                    3.36.0               hc218d9a_0
threadpoolctl             2.1.0              pyh5ca1d4c_0
tk                        8.6.10               hbc83047_0
tornado                   6.1              py39h27cfd23_0
tqdm                      4.61.2             pyhd3eb1b0_1
tzdata                    2021a                h52ac0ba_0
tzlocal                   2.0.0            py39h06a4308_0
urllib3                   1.26.6             pyhd3eb1b0_1
werkzeug                  1.0.1              pyhd3eb1b0_0
wheel                     0.36.2             pyhd3eb1b0_0
xorg-compositeproto       0.4.2             h7f98852_1001    conda-forge
xorg-damageproto          1.2.1             h7f98852_1002    conda-forge
xorg-fixesproto           5.0               h7f98852_1002    conda-forge
xorg-inputproto           2.3.2             h7f98852_1002    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.6.12               h36c2ea0_0    conda-forge
xorg-libxaw               1.0.14               h7f98852_0    conda-forge
xorg-libxcomposite        0.4.5                h7f98852_0    conda-forge
xorg-libxcursor           1.2.0                h516909a_0    conda-forge
xorg-libxdamage           1.1.5                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h516909a_0    conda-forge
xorg-libxfixes            5.0.3             h516909a_1004    conda-forge
xorg-libxi                1.7.10               h516909a_0    conda-forge
xorg-libxinerama          1.1.4             h9c3ff4c_1001    conda-forge
xorg-libxmu               1.1.3                h516909a_0    conda-forge
xorg-libxpm               3.5.13               h516909a_0    conda-forge
xorg-libxrandr            1.5.2                h516909a_1    conda-forge
xorg-libxrender           0.9.10            h516909a_1002    conda-forge
xorg-libxt                1.1.5             h516909a_1003    conda-forge
xorg-libxtst              1.2.3             h516909a_1002    conda-forge
xorg-randrproto           1.5.0             h7f98852_1001    conda-forge
xorg-recordproto          1.14.2            h7f98852_1002    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-util-macros          1.19.0               h27cfd23_2
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h27cfd23_1007
xz                        5.2.5                h7b6447c_0
zipp                      3.5.0              pyhd3eb1b0_0
zlib                      1.2.11               h7b6447c_3
zstandard                 0.15.2           py39h27cfd23_0
zstd                      1.4.9                haebb681_0

Command used and output returned

I'm trying to run PopPUNK on a set of Salmonella Typhimurium reference genomes downloaded from NCBI. I downloaded and extracted the Salmonella enterica database and ran poppunk_assign with the following command line:

nohup poppunk_assign --db salmonella_poppunk --query teste.txt --output teste_ncbi_poppunk_clusters --threads 8 &

where teste.txt lists the .fna files, one per line, as the genome name, a tab, and then the path to the file (genome_name<TAB>path/to/file).
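A minimal Python sketch of building a query list in the format described above (genome name, a tab, then the file path). The helper name write_query_list, the example genomes/ folder, and using the file name without its extension as the genome name are illustrative assumptions, not part of PopPUNK. Writing absolute paths keeps the list valid whichever directory poppunk_assign is launched from.

```python
from pathlib import Path


def write_query_list(genome_dir, out_file, suffix=".fna"):
    """Write one "<genome name>\t<absolute path>" line per genome file.

    Hypothetical helper: the genome name is the file name without its
    extension, and paths are resolved to absolute paths so the list does
    not depend on the current working directory.
    """
    genomes = sorted(Path(genome_dir).glob(f"*{suffix}"))
    with open(out_file, "w") as handle:
        for path in genomes:
            handle.write(f"{path.stem}\t{path.resolve()}\n")
    # Return the number of entries written, as a quick sanity check
    return len(genomes)
```

For example, write_query_list("genomes", "teste.txt") would write one line for each .fna file in a hypothetical genomes/ folder and ignore any other files there.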

I've tried pointing --db at the database both from the folder that contains it and from elsewhere: for example, running with --db salmonella_poppunk from "Somefolder/mydata", where both my data and the database are stored, and running with --db Somefolder/mydata/salmonella_poppunk from another folder. Both give the same output.

The program seems to crash very early on; here's the output:

PopPUNK: assign
        (with backend: sketchlib v1.7.3
         sketchlib: /home/malu/anaconda3/envs/PopPunk/lib/python3.9/site-packages/pp_sketchlib.cpython-39-x86_64-linux-gnu.so)

Graph-tools OpenMP parallelisation enabled: with 8 threads
Mode: Assigning clusters of query sequences

Traceback (most recent call last):
  File "/home/malu/anaconda3/envs/PopPunk/bin/poppunk_assign", line 11, in <module>
    sys.exit(main())
  File "/home/malu/anaconda3/envs/PopPunk/lib/python3.9/site-packages/PopPUNK/assign.py", line 519, in main
    assign_query(dbFuncs,
  File "/home/malu/anaconda3/envs/PopPunk/lib/python3.9/site-packages/PopPUNK/assign.py", line 106, in assign_query
    model = loadClusterFit(model_file + '.pkl',
  File "/home/malu/anaconda3/envs/PopPunk/lib/python3.9/site-packages/PopPUNK/models.py", line 92, in loadClusterFit
    fit_object, fit_type = pickle.load(pickle_obj)
ModuleNotFoundError: No module named 'sklearn.mixture.bayesian_mixture'
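One way to check what the last line of the traceback is complaining about is to test whether that module path can be imported in the active environment. This is a minimal sketch using only the standard library; the helper name module_available is hypothetical. Run it inside the same conda environment: if sklearn.mixture is found but sklearn.mixture.bayesian_mixture is missing, the installed scikit-learn does not provide the module path that the saved model refers to.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located in this environment.

    find_spec imports parent packages first, so a missing parent package
    raises ModuleNotFoundError; that case is reported as False as well.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    for name in ("sklearn.mixture", "sklearn.mixture.bayesian_mixture"):
        status = "found" if module_available(name) else "missing"
        print(f"{name}: {status}")
```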

Describe the bug

It's my first time running PopPUNK, so I'm not sure whether my installation (via conda) is broken or the error is on my side. I would expect a folder with the output to be created; however, the program crashes before getting that far.

johnlees commented 3 years ago

Apologies for this. Most likely the Salmonella database you downloaded is from version 1 of PopPUNK and is no longer compatible with version 2, which you have installed. You could downgrade, but to be honest I wouldn't recommend it, partly because the original Salmonella fit is unlikely to give the resolution you'd want.
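To illustrate why an older database fails in this particular way: the traceback shows PopPUNK loading its model with pickle.load, and pickle stores classes as a reference (module path plus class name) rather than by value. If the library that defined the class has since moved or removed that module (here presumably sklearn.mixture.bayesian_mixture, which the installed scikit-learn 0.24.2 appears not to provide), loading fails with ModuleNotFoundError before any model code runs. The sketch below reproduces that mechanism with a hypothetical stand-in module, legacy_models, and a placeholder class and "bgmm" label; it is not PopPUNK's or scikit-learn's code.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library module that existed when the model was
# saved (e.g. an older scikit-learn) but no longer exists after an upgrade.
OLD_MODULE = "legacy_models"


def make_legacy_module():
    """Register a fake module containing a placeholder model class."""
    module = types.ModuleType(OLD_MODULE)
    # type() sets __qualname__ to "MixtureFit", so pickle can look it up
    module.MixtureFit = type("MixtureFit", (), {"__module__": OLD_MODULE})
    sys.modules[OLD_MODULE] = module
    return module


def pickle_then_upgrade():
    """Pickle a (fit_object, fit_type) pair, then remove the class's module."""
    module = make_legacy_module()
    fit = module.MixtureFit()
    fit.n_components = 2
    # The pickle stores a reference to legacy_models.MixtureFit, not the class
    data = pickle.dumps((fit, "bgmm"))
    # Simulate the upgrade: the old module path can no longer be imported
    del sys.modules[OLD_MODULE]
    return data


def try_load(data):
    """Unpickle, returning the error text if a referenced module is missing."""
    try:
        return pickle.loads(data)
    except ModuleNotFoundError as err:
        return f"ModuleNotFoundError: {err}"


if __name__ == "__main__":
    print(try_load(pickle_then_upgrade()))
```

Making the old module path importable again lets the same bytes load, which is why downgrading is an option; re-fitting the model with the current version avoids the mismatch altogether.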

We will release an updated Salmonella fit, compatible with the new version and at a higher resolution, in the coming weeks/months. I can post here when it is available. Alternatively, you could fit your own model to the Salmonella data you have downloaded.

andreaniml commented 3 years ago

I would like an update when the new database is available! Thanks for the answer. On the "fit my own model" option: is it OK to generate a model from some data and then use the same data as the query?

johnlees commented 2 years ago

Addressed in #183; new databases will be available soon.