bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Microreact crashes #225

Closed andreaniml closed 1 year ago

andreaniml commented 1 year ago

Hello! I'm trying to generate a Microreact visualization for a large dataset. It runs, but crashes at the end.

Command used and output returned: After creating a model and refining it:

poppunk_visualise --ref-db LT2_sistr_filter --previous-clustering Refine_K4_SistrQced/Refine_K4_SistrQced_clusters.csv --model-dir Refine_K4_SistrQced --output Visu_K4QcedRef1 --tree both --microreact --external-clustering External_clusters.tsv --threads 4

Describe the bug: The prompt output:

Graph-tools OpenMP parallelisation enabled: with 4 threads
PopPUNK: visualise
Loading previously refined model
Completed model loading
Generating MST from dense distances (may be slow)
Starting calculation of minimum-spanning tree
Completed calculation of minimum-spanning tree
Drawing MST
Building phylogeny
Writing microreact output
Parsed data, now writing to CSV
Running mandrake
Running on CPU
Preprocessing 12669 samples with perplexity = 20 took 16547ms
Optimizing       Progress: 99.9%, eta=0.0010, Eq=0.9748438521, clashes=0.0%
Optimizing done in 1s
Traceback (most recent call last):
  File "/home/malu/anaconda3/envs/PopPunk_2022/bin/poppunk_visualise", line 11, in <module>
    sys.exit(main())
  File "/home/malu/anaconda3/envs/PopPunk_2022/lib/python3.10/site-packages/PopPUNK/visualise.py", line 624, in main
    generate_visualisations(args.query_db,
  File "/home/malu/anaconda3/envs/PopPunk_2022/lib/python3.10/site-packages/PopPUNK/visualise.py", line 561, in generate_visualisations
    url = createMicroreact(output, microreact_files, api_key)
  File "/home/malu/anaconda3/envs/PopPunk_2022/lib/python3.10/site-packages/PopPUNK/plot.py", line 779, in createMicroreact
    with pkg_resources.resource_stream(__name__, 'data/microreact_example.pkl') as example_pickle:
  File "/home/malu/anaconda3/envs/PopPunk_2022/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1160, in resource_stream
    return get_provider(package_or_requirement).get_resource_stream(
  File "/home/malu/anaconda3/envs/PopPunk_2022/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1632, in get_resource_stream
    return open(self._fn(self.module_path, resource_name), 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/home/malu/anaconda3/envs/PopPunk_2022/lib/python3.10/site-packages/PopPUNK/data/microreact_example.pkl'

It creates the .nwk, .csv, .png and .dot files. I think the bug happens when it tries to create the .microreact file. The files upload normally to Microreact.

johnlees commented 1 year ago

The problem is due to missing the reference pickle file microreact_example.pkl, as in #219

Can I ask how you installed PopPUNK? Was it through conda or pip? With conda this should be included in the install.

Anyway, to fix, see here https://github.com/bacpop/PopPUNK/issues/219#issuecomment-1248985837
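As a quick check before and after applying the fix, the traceback above shows that the file is looked up under the installed package's data/ directory. A minimal sketch, assuming PopPUNK is importable in the active environment, that prints where the file is expected and whether it is there (the helper name expected_resource_path is made up for this example and is not part of PopPUNK):

```python
import importlib.util
from pathlib import Path


def expected_resource_path(package, relative):
    """Return where `relative` should live inside an installed package.

    Hypothetical helper for illustration; it only inspects the import
    system and does not import the package itself.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"package {package!r} is not importable")
    # For a regular package, spec.origin points at its __init__.py
    return Path(spec.origin).parent / relative


if __name__ == "__main__":
    try:
        path = expected_resource_path("PopPUNK", "data/microreact_example.pkl")
    except ModuleNotFoundError as err:
        print(err)
    else:
        print(path, "exists" if path.exists() else "is MISSING")
```

Run it with the same Python interpreter as poppunk_visualise (for example, inside the activated conda environment), otherwise it may report a different site-packages directory.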

andreaniml commented 1 year ago

Hi! Now it outputs a different error message:

poppunk_visualise --ref-db LT2_sistr_filter --previous-clustering Refine_K4_SistrQced/Refine_K4_SistrQced_clusters.csv  --model-dir Refine_K4_SistrQced --output Visu_K4QcedRef1_fix --tree both --microreact --external-clustering External_clusters.tsv --threads 20
Graph-tools OpenMP parallelisation enabled: with 20 threads
PopPUNK: visualise
Loading previously refined model
Completed model loading
Generating MST from dense distances (may be slow)
Starting calculation of minimum-spanning tree
Completed calculation of minimum-spanning tree
Drawing MST
Building phylogeny
Writing microreact output
Parsed data, now writing to CSV
Running mandrake
Running on CPU
Preprocessing 12669 samples with perplexity = 20 took 4392ms
Optimizing       Progress: 99.9%, eta=0.0010, Eq=0.9748670693, clashes=0.2%
Optimizing done in 1s
Traceback (most recent call last):
  File "/home/malu/anaconda3/envs/PopPunk_2022/bin/poppunk_visualise", line 11, in <module>
    sys.exit(main())
  File "/home/malu/anaconda3/envs/PopPunk_2022/lib/python3.10/site-packages/PopPUNK/visualise.py", line 624, in main
    generate_visualisations(args.query_db,
  File "/home/malu/anaconda3/envs/PopPunk_2022/lib/python3.10/site-packages/PopPUNK/visualise.py", line 561, in generate_visualisations
    url = createMicroreact(output, microreact_files, api_key)
  File "/home/malu/anaconda3/envs/PopPunk_2022/lib/python3.10/site-packages/PopPUNK/plot.py", line 780, in createMicroreact
    json_pickle = pickle.load(example_pickle)
_pickle.UnpicklingError: invalid load key, '\x0a'.

Could it be the size? This dataset has ~12k samples, however some are really close in the distance plot and I'm getting not-so-good network scores (~0.74) even after using -refine. I was trying to see if the clusters made sense on the tree.

andreaniml commented 1 year ago

Maybe the conda environment list could be helpful:

# packages in environment at /home/malu/anaconda3/envs/PopPunk_2022:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
aom                       3.4.0                h27087fc_1    conda-forge
at-spi2-atk               2.38.0               h0630a04_3    conda-forge
at-spi2-core              2.40.3               h0630a04_0    conda-forge
atk-1.0                   2.36.0               h3371d22_4    conda-forge
biopython                 1.79            py310h5764c6d_2    conda-forge
boost                     1.74.0          py310h7c3ba0c_5    conda-forge
boost-cpp                 1.74.0               h75c5d50_8    conda-forge
brotli                    1.0.9                h166bdaf_7    conda-forge
brotli-bin                1.0.9                h166bdaf_7    conda-forge
brotlipy                  0.7.0           py310h5764c6d_1004    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.9.14            ha878542_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cairo                     1.16.0            ha61ee94_1014    conda-forge
cairomm                   1.14.3               ha770c72_0    conda-forge
cairomm-1.0               1.14.3               h924138e_0    conda-forge
certifi                   2022.9.14          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1          py310h255011f_0    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
colorama                  0.4.5              pyhd8ed1ab_0    conda-forge
contourpy                 1.0.5           py310hbf28c38_0    conda-forge
cryptography              37.0.4          py310h597c629_0    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
dbus                      1.13.6               h5008d03_3    conda-forge
dendropy                  4.5.2              pyh3252c3a_0    bioconda
docopt                    0.6.2                      py_1    conda-forge
epoxy                     1.5.10               h166bdaf_1    conda-forge
expat                     2.4.9                h27087fc_0    conda-forge
ffmpeg                    5.1.1           gpl_hfe78399_101    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.0               hc2a2eb6_1    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.37.3          py310h5764c6d_0    conda-forge
freetype                  2.12.1               hca18f0e_0    conda-forge
fribidi                   1.0.10               h36c2ea0_0    conda-forge
gdk-pixbuf                2.42.8               hff1cb4f_1    conda-forge
gettext                   0.19.8.1          h73d1719_1008    conda-forge
glib-tools                2.72.1               h6239696_0    conda-forge
gmp                       6.2.1                h58526e2_0    conda-forge
gnutls                    3.7.7                hf3e180e_0    conda-forge
graph-tool                2.45            py310haee70ea_2    conda-forge
graph-tool-base           2.45            py310hd8094d8_2    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
gtk3                      3.24.34              h4d20fae_1    conda-forge
h5py                      3.7.0           nompi_py310h416281c_101    conda-forge
harfbuzz                  5.2.0                hf9f4e7c_0    conda-forge
hdbscan                   0.8.28          py310h96516ba_1    conda-forge
hdf5                      1.12.2          nompi_h2386368_100    conda-forge
hicolor-icon-theme        0.17                 ha770c72_2    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
joblib                    1.1.0              pyhd8ed1ab_0    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4           py310hbf28c38_0    conda-forge
krb5                      1.19.3               h3790be6_0    conda-forge
lame                      3.100             h7f98852_1001    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_7    conda-forge
libbrotlidec              1.0.9                h166bdaf_7    conda-forge
libbrotlienc              1.0.9                h166bdaf_7    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcups                   2.3.3                h3e49a29_2    conda-forge
libcurl                   7.83.1               h7bff187_0    conda-forge
libdeflate                1.14                 h166bdaf_0    conda-forge
libdrm                    2.4.113              h166bdaf_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
libgfortran-ng            12.1.0              h69a702a_16    conda-forge
libgfortran5              12.1.0              hdcd56e2_16    conda-forge
libgirepository           1.72.0               h26ff761_1    conda-forge
libglib                   2.72.1               h2d90d5f_0    conda-forge
libgomp                   12.1.0              h8d9b700_16    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
libidn2                   2.3.3                h166bdaf_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libnghttp2                1.47.0               hdcd2b5c_1    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libpciaccess              0.16                 h516909a_0    conda-forge
libpng                    1.6.38               h753d276_0    conda-forge
librsvg                   2.54.4               h7abd40a_0    conda-forge
libsqlite                 3.39.3               h753d276_0    conda-forge
libssh2                   1.10.0               haa6b8db_3    conda-forge
libstdcxx-ng              12.1.0              ha89aaad_16    conda-forge
libtasn1                  4.19.0               h166bdaf_0    conda-forge
libtiff                   4.4.0                h55922b4_4    conda-forge
libunistring              0.9.10               h7f98852_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libva                     2.15.0               h166bdaf_0    conda-forge
libvpx                    1.11.0               h9c3ff4c_3    conda-forge
libwebp-base              1.2.4                h166bdaf_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxml2                   2.9.14               h22db469_4    conda-forge
libzlib                   1.2.12               h166bdaf_3    conda-forge
mandrake                  1.2.2           py310h7dbff7e_1    conda-forge
matplotlib-base           3.6.0           py310h8d5ebf3_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
nettle                    3.8.1                hc379101_1    conda-forge
networkx                  2.8.6              pyhd8ed1ab_0    conda-forge
numpy                     1.23.3          py310h53a5b5f_0    conda-forge
openblas                  0.3.21          pthreads_h320a7e8_3    conda-forge
openh264                  2.3.0                h27087fc_0    conda-forge
openjpeg                  2.5.0                h7d73246_1    conda-forge
openssl                   1.1.1q               h166bdaf_0    conda-forge
p11-kit                   0.24.1               hc5aa10d_0    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.5.0           py310h769672d_0    conda-forge
pango                     1.50.10              hc4f8a73_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pillow                    9.2.0           py310hbd86126_2    conda-forge
pip                       22.2.2             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
plotly                    5.10.0             pyhd8ed1ab_0    conda-forge
poppunk                   2.5.0           py310h2579afa_0    bioconda
pp-sketchlib              2.0.0           py310h5a37817_2    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pycairo                   1.21.0          py310h96fc21a_1    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pygobject                 3.42.2          py310h964465f_0    conda-forge
pyopenssl                 22.0.0             pyhd8ed1ab_1    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.10.6          h582c2e5_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.10                    2_cp310    conda-forge
pytz                      2022.2.1           pyhd8ed1ab_0    conda-forge
rapidnj                   2.3.2                h9f5acd7_2    bioconda
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_1    conda-forge
scikit-learn              1.1.2           py310h0c3af53_0    conda-forge
scipy                     1.9.1           py310hdfbd76f_0    conda-forge
setuptools                65.3.0             pyhd8ed1ab_1    conda-forge
sigcpp-2.0                2.10.8               h27087fc_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sparsehash                2.0.4                h9c3ff4c_0    conda-forge
svt-av1                   1.2.1                h27087fc_0    conda-forge
tenacity                  8.1.0              pyhd8ed1ab_0    conda-forge
threadpoolctl             3.1.0              pyh8a188c0_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tqdm                      4.64.1             pyhd8ed1ab_0    conda-forge
treeswift                 1.1.28             pyh5e36f6f_0    bioconda
tzdata                    2022c                h191b570_0    conda-forge
unicodedata2              14.0.0          py310h5764c6d_1    conda-forge
urllib3                   1.26.11            pyhd8ed1ab_0    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
x264                      1!164.3095           h166bdaf_2    conda-forge
x265                      3.5                  h924138e_3    conda-forge
xorg-compositeproto       0.4.2             h7f98852_1001    conda-forge
xorg-damageproto          1.2.1             h7f98852_1002    conda-forge
xorg-fixesproto           5.0               h7f98852_1002    conda-forge
xorg-inputproto           2.3.2             h7f98852_1002    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.6.12               h36c2ea0_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxaw               1.0.14               h7f98852_0    conda-forge
xorg-libxcomposite        0.4.5                h7f98852_0    conda-forge
xorg-libxcursor           1.2.0                h516909a_0    conda-forge
xorg-libxdamage           1.1.5                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h516909a_0    conda-forge
xorg-libxfixes            5.0.3             h516909a_1004    conda-forge
xorg-libxi                1.7.10               h516909a_0    conda-forge
xorg-libxinerama          1.1.4             h9c3ff4c_1001    conda-forge
xorg-libxmu               1.1.3                h516909a_0    conda-forge
xorg-libxpm               3.5.13               h516909a_0    conda-forge
xorg-libxrandr            1.5.2                h516909a_1    conda-forge
xorg-libxrender           0.9.10            h516909a_1002    conda-forge
xorg-libxt                1.1.5             h516909a_1003    conda-forge
xorg-libxtst              1.2.3             h516909a_1002    conda-forge
xorg-randrproto           1.5.0             h7f98852_1001    conda-forge
xorg-recordproto          1.14.2            h7f98852_1002    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-util-macros          1.19.3               h7f98852_0    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.2.12               h166bdaf_3    conda-forge
zstandard                 0.18.0          py310h5764c6d_0    conda-forge
zstd                      1.5.2                h6239696_4    conda-forge

johnlees commented 1 year ago

This just looks like an issue with loading the file, not specifically to do with your data.

I am wondering whether the pickle you downloaded is correct. Can you try downloading the raw file from here, then starting Python and running the following from the Python command line:

import pickle

# Load the reference pickle and print its contents to confirm it is readable
with open('microreact_example.pkl', 'rb') as file:
    data = pickle.load(file)
print(data)
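The "invalid load key, '\x0a'" error earlier in the thread means the first byte of the file was a newline, which suggests the file is text of some kind (for example, a saved web page) rather than a binary pickle. A rough sketch for peeking at the header without unpickling anything. It assumes the reference file was written with pickle protocol 2 or later, in which case it starts with the byte 0x80 followed by the protocol number; looks_like_pickle is a hypothetical helper, not something PopPUNK provides:

```python
import os


def looks_like_pickle(path):
    """Heuristic header check, assuming pickle protocol 2 to 5."""
    with open(path, "rb") as handle:
        head = handle.read(2)
    # Protocol >= 2 pickles begin with the PROTO opcode (0x80) and a version byte
    return len(head) == 2 and head[0] == 0x80 and 2 <= head[1] <= 5


if __name__ == "__main__":
    name = "microreact_example.pkl"
    if os.path.exists(name):
        with open(name, "rb") as handle:
            print("first bytes:", handle.read(16))
        print("looks like a pickle:", looks_like_pickle(name))
    else:
        print(f"{name} not found in the current directory")
```

This is only a heuristic: a file that passes can still be truncated or corrupted, so comparing checksums remains the more reliable test.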

andreaniml commented 1 year ago

Pickle seems to be the problem: [image]

What should I do?

johnlees commented 1 year ago

I can't replicate this issue; the pickle loads OK for me. Can I first check that the file was downloaded correctly by making sure we have the same checksum (by running the openssl command on the first line below):

openssl sha256 microreact_example.pkl
SHA256(microreact_example.pkl)= 46aad24bb501f90ba9140e4a76e2479004427106cfe7a02248593cb4085dc59e
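If openssl is not available, the same comparison can be done from Python with the standard hashlib module. A minimal sketch, assuming the file sits in the current directory; the expected value is the checksum quoted above, and sha256_of is just an illustrative helper name:

```python
import hashlib
from pathlib import Path

# Checksum quoted in this thread for microreact_example.pkl
EXPECTED_SHA256 = "46aad24bb501f90ba9140e4a76e2479004427106cfe7a02248593cb4085dc59e"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    target = Path("microreact_example.pkl")
    if target.exists():
        actual = sha256_of(target)
        print("match" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
    else:
        print(f"{target} not found in the current directory")
```

A mismatch means the local copy differs from the one checked here, and re-downloading the raw file is the next thing to try.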

andreaniml commented 1 year ago

My file had a different sha256. I re-downloaded it from GitHub (previously I used wget) and now it seems to work. My guess is that the one I downloaded from the other issue had a similar problem. I'm going to test it now and see if the problem is solved. [image] Many thanks!

andreaniml commented 1 year ago

Solved! Thank you very much