bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Help with issues running poppunk in HPC environment #193

Open CarmenSheppard opened 2 years ago

CarmenSheppard commented 2 years ago

@nickjcroucher asked me to raise a formal issue when I mentioned to him

Versions

# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
apscheduler 3.8.1 py39hf3d152e_0 conda-forge
at-spi2-atk 2.38.0 h0630a04_3 conda-forge
at-spi2-core 2.40.3 h0630a04_0 conda-forge
atk-1.0 2.36.0 h3371d22_4 conda-forge
boost 1.74.0 py39h5472131_3 conda-forge
boost-cpp 1.74.0 h312852a_4 conda-forge
brotlipy 0.7.0 py39h3811e60_1001 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
ca-certificates 2021.10.8 ha878542_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cairo 1.16.0 h6cf1ce9_1008 conda-forge
cairomm 1.12.2 ha770c72_3 conda-forge
cairomm-1.0 1.12.2 h56b4340_3 conda-forge
certifi 2021.10.8 py39hf3d152e_1 conda-forge
cffi 1.14.6 py39h4bc2ebd_1 conda-forge
chardet 4.0.0 py39hf3d152e_1 conda-forge
charset-normalizer 2.0.0 pyhd8ed1ab_0 conda-forge
click 8.0.3 py39hf3d152e_0 conda-forge
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
cryptography 35.0.0 py39h95dcef6_1 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
dataclasses 0.8 pyhc8e2a94_3 conda-forge
dbus 1.13.6 h48d8840_2 conda-forge
dendropy 4.5.2 pyh3252c3a_0 bioconda
epoxy 1.5.9 h7f98852_0 conda-forge
expat 2.4.1 h9c3ff4c_0 conda-forge
flask 2.0.2 pyhd8ed1ab_0 conda-forge
flask-apscheduler 1.12.2 pyhd8ed1ab_1 conda-forge
flask-cors 3.0.10 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.13.1 hba837de_1005 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
fribidi 1.0.10 h36c2ea0_0 conda-forge
gdk-pixbuf 2.42.6 h04a7f16_0 conda-forge
gettext 0.19.8.1 h73d1719_1008 conda-forge
glib 2.70.0 h780b84a_1 conda-forge
glib-tools 2.70.0 h780b84a_1 conda-forge
gmp 6.2.1 h58526e2_0 conda-forge
graph-tool 2.43 py39hc4320a7_0 conda-forge
graph-tool-base 2.43 py39h8160539_0 conda-forge
graphite2 1.3.13 h58526e2_1001 conda-forge
gtk3 3.24.29 h8c9bf5d_3 conda-forge
gunicorn 20.1.0 py39hf3d152e_0 conda-forge
h5py 3.2.1 nompi_py39h98ba4bc_100 conda-forge
harfbuzz 3.0.0 h83ec7ef_1 conda-forge
hdbscan 0.8.27 py39hce5d2b2_0 conda-forge
hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge
hicolor-icon-theme 0.17 ha770c72_2 conda-forge
icu 68.2 h9c3ff4c_0 conda-forge
idna 3.1 pyhd3deb0d_0 conda-forge
itsdangerous 2.0.1 pyhd8ed1ab_0 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jinja2 3.0.2 pyhd8ed1ab_0 conda-forge
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
kiwisolver 1.3.2 py39h1a9c180_0 conda-forge
krb5 1.19.2 hcc1bbae_2 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
lerc 3.0 h9c3ff4c_0 conda-forge
libblas 3.9.0 12_linux64_openblas conda-forge
libcblas 3.9.0 12_linux64_openblas conda-forge
libcups 2.3.3 hf5a7f15_0 conda-forge
libcurl 7.79.1 h2574ce0_1 conda-forge
libdeflate 1.8 h7f98852_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.4.2 h9c3ff4c_4 conda-forge
libgcc-ng 11.2.0 h1d223b6_11 conda-forge
libgfortran-ng 11.2.0 h69a702a_11 conda-forge
libgfortran5 11.2.0 h5c6108e_11 conda-forge
libgirepository 1.70.0 hb520f89_0 conda-forge
libglib 2.70.0 h174f98d_1 conda-forge
libgomp 11.2.0 h1d223b6_11 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 12_linux64_openblas conda-forge
libnghttp2 1.43.0 h812cca2_1 conda-forge
libopenblas 0.3.18 pthreads_h8fe5266_0 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
librsvg 2.52.3 hc3c00ef_0 conda-forge
libssh2 1.10.0 ha56f1ee_2 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_11 conda-forge
libtiff 4.3.0 h6f004c6_2 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libwebp-base 1.2.1 h7f98852_0 conda-forge
libxcb 1.13 h7f98852_1003 conda-forge
libxml2 2.9.12 h72842e0_0 conda-forge
libzlib 1.2.11 h36c2ea0_1013 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
markupsafe 2.0.1 py39h3811e60_0 conda-forge
matplotlib-base 3.4.3 py39h2fa2bec_1 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
networkx 2.6.3 pyhd8ed1ab_1 conda-forge
numpy 1.21.3 py39hdbf815f_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openblas 0.3.18 pthreads_h4748800_0 conda-forge
openjpeg 2.4.0 hb52868f_1 conda-forge
openssl 1.1.1l h7f98852_0 conda-forge
pandas 1.3.4 py39hde0f152_0 conda-forge
pango 1.48.10 h54213e6_2 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pillow 8.3.2 py39ha612740_0 conda-forge
pip 21.3.1 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 h36c2ea0_0 conda-forge
poppunk 2.4.0 py39h7f0572b_0 bioconda
pp-sketchlib 1.7.4 py39hdefe18a_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pycairo 1.20.1 py39hedcb9fc_0 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pygobject 3.42.0 py39ha6f447c_0 conda-forge
pyopenssl 21.0.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.0.4 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 py39hf3d152e_3 conda-forge
python 3.9.7 hb7a2778_3_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-tzdata 2021.5 pyhd8ed1ab_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytz 2021.3 pyhd8ed1ab_0 conda-forge
pytz-deprecation-shim 0.1.0.post0 py39hf3d152e_0 conda-forge
rapidnj 2.3.2 h7d875b9_1 bioconda
readline 8.1 h46c0cb4_0 conda-forge
requests 2.26.0 pyhd8ed1ab_0 conda-forge
scikit-learn 1.0.1 py39h7c5d8c9_1 conda-forge
scipy 1.7.1 py39hee8e79c_0 conda-forge
setuptools 58.4.0 py39hf3d152e_1 conda-forge
sigcpp-2.0 2.10.7 h9c3ff4c_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sparsehash 2.0.4 h9c3ff4c_0 conda-forge
sqlite 3.36.0 h9cd32fc_2 conda-forge
threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge
tk 8.6.11 h27826a3_1 conda-forge
tornado 6.1 py39h3811e60_1 conda-forge
tqdm 4.62.3 pyhd8ed1ab_0 conda-forge
tzdata 2021e he74cb21_0 conda-forge
tzlocal 2.1 pyh9f0ad1d_0 conda-forge
urllib3 1.26.7 pyhd8ed1ab_0 conda-forge
werkzeug 2.0.1 pyhd8ed1ab_0 conda-forge
wheel 0.37.0 pyhd8ed1ab_1 conda-forge
xorg-compositeproto 0.4.2 h7f98852_1001 conda-forge
xorg-damageproto 1.2.1 h7f98852_1002 conda-forge
xorg-fixesproto 5.0 h7f98852_1002 conda-forge
xorg-inputproto 2.3.2 h7f98852_1002 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.3 hd9c2040_1000 conda-forge
xorg-libx11 1.6.12 h36c2ea0_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxaw 1.0.14 h7f98852_0 conda-forge
xorg-libxcomposite 0.4.5 h7f98852_0 conda-forge
xorg-libxcursor 1.2.0 h516909a_0 conda-forge
xorg-libxdamage 1.1.5 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h516909a_0 conda-forge
xorg-libxfixes 5.0.3 h516909a_1004 conda-forge
xorg-libxi 1.7.10 h516909a_0 conda-forge
xorg-libxinerama 1.1.4 h9c3ff4c_1001 conda-forge
xorg-libxmu 1.1.3 h516909a_0 conda-forge
xorg-libxpm 3.5.13 h516909a_0 conda-forge
xorg-libxrandr 1.5.2 h516909a_1 conda-forge
xorg-libxrender 0.9.10 h516909a_1002 conda-forge
xorg-libxt 1.1.5 h516909a_1003 conda-forge
xorg-libxtst 1.2.3 h516909a_1002 conda-forge
xorg-randrproto 1.5.0 h7f98852_1001 conda-forge
xorg-recordproto 1.14.2 h7f98852_1002 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-util-macros 1.19.3 h7f98852_0 conda-forge
xorg-xextproto 7.3.0 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.11 h36c2ea0_1013 conda-forge
zstandard 0.16.0 py39h3811e60_0 conda-forge
zstd 1.5.0 ha95c52a_0 conda-forge

Command used and output returned

poppunk --create-db --output poppunk/poppunk_db --r-files files.txt --threads 1
Segmentation fault (core dumped)

Describe the bug

I cannot run poppunk --create-db on the HPC environment. After a few seconds it reports a segmentation fault (core dumped) and gives no other output. Other commands run successfully. I have deleted and recreated the poppunk environment on the cluster and tried starting afresh. I can run poppunk fine on my laptop. I have checked the versions of the software on the HPC against the same software on my laptop (from conda env list) and manually installed any versions that were different, but this didn't change the error. Everything in the HPC poppunk env is now the same as in my laptop poppunk env.

The HPC is running an older version of conda, 4.9.2 (my laptop has 4.10.3), which I cannot update. I'm wondering if there's something that --create-db calls that causes the issue on the HPC?

johnlees commented 2 years ago

This is always a tricky one. Potentially we've got a memory-related bug; these sometimes cause a crash and sometimes don't. First, it would be helpful to get a minimal example.

  1. Can you try running with two files and then ten files in files.txt and see if you get the same issue?
  2. Does running poppunk_sketch to a) create the sketch database and b) create the distance matrix work? Both with the full set, and the smaller set of ten.

Depending on those results, I will hopefully be able to rerun with the same settings, and look at a memory debugger.
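The two-file and ten-file lists for step 1 can be cut from the full list with a few lines of Python. This is only a sketch: `write_subset` is a hypothetical helper, and it assumes the --r-files list has one sample per line, so taking the first n lines gives a valid smaller list.

```python
from pathlib import Path


def write_subset(r_files: str, out_path: str, n: int) -> int:
    """Copy the first n non-empty lines of an --r-files list to out_path.

    Returns the number of lines actually written, which is smaller
    than n when the source list is shorter than n.
    """
    lines = [ln for ln in Path(r_files).read_text().splitlines() if ln.strip()]
    subset = lines[:n]
    Path(out_path).write_text("".join(ln + "\n" for ln in subset))
    return len(subset)
```

For example, `write_subset("files.txt", "files_2.txt", 2)` and `write_subset("files.txt", "files_10.txt", 10)` would produce the two lists (the output names are placeholders), each of which can then be passed to --r-files in place of files.txt.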

CarmenSheppard commented 2 years ago

Hi John,

Yes, same issue with --create-db running 10 or 2 files.

However, --sketch seemed to work fine with 3000ish files (I stopped it manually as I was limited to 1 thread and it was going to take a while!).

It also worked fine to completion with 10 files, as did --query DB with 10 files. I think it would have been fine with the larger dataset too, but I am currently unable to get more than a 1-thread node due to congestion on the HPC!

So it seems specific to the --create-db command.

Carmen




johnlees commented 2 years ago

Ok that's useful. Would it be possible to share those two files? If not, I'll see if I can reproduce in a debugger with two test files of my own.

CarmenSheppard commented 2 years ago

Ok, I'll email you the assemblies directly. It doesn't seem to matter which files I try though; I've attempted to run different datasets with the same issues.

johnlees commented 2 years ago

Hi Carmen,

I've tried to reproduce this bug, but I am unable to. I've also run through with valgrind (which lets you check for segfaults) and that doesn't turn anything up during the call to sketching made by --create-db.

You should at least see:

PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
        (with backend: sketchlib v1.7.4
         sketchlib: /home/jlees/miniconda3/envs/pp-clean/lib/python3.9/site-packages/pp_sketchlib.cpython-39-x86_64-linux-gnu.so)

or similar when the code runs. I think any error before that is due to one of the dependencies.

I have a vague memory of there being some possible issues with the graph-tool package. Could you try:

1) Cloning this repository
2) Within that clone, run the following, which will comment out all the graph_tool imports:
   pushd PopPUNK && for i in $(ls *.py); do sed -i -e 's/import graph_tool/#import graph_tool/' $i; done && popd
3) Run your command from before, but using python poppunk-runner.py in place of poppunk

If that prints some output, but then eventually fails on graph_tool being missing, then that's the culprit.
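A complementary check that leaves the source untouched is to import each suspect dependency in its own Python process and see which one dies. The sketch below is an assumption-laden illustration rather than part of PopPUNK: `run_snippet`, `probe_import` and `probe_all` are hypothetical helper names, and it relies on the POSIX convention that a child killed by a signal reports the negated signal number as its return code.

```python
import signal
import subprocess
import sys


def run_snippet(code: str, timeout: float = 120.0) -> str:
    """Run code in a fresh interpreter and classify how it exited."""
    proc = subprocess.run(
        # -X faulthandler makes Python dump its stack to stderr on a fatal signal
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode == -signal.SIGSEGV:
        # POSIX: a negative return code means the child was killed by that signal
        return "segfault"
    return f"failed (exit code {proc.returncode})"


def probe_import(module: str) -> str:
    """Try to import a single module in a separate process."""
    return run_snippet(f"import {module}")


def probe_all(modules):
    """Map each module name to the outcome of importing it in isolation."""
    return {name: probe_import(name) for name in modules}
```

Calling `probe_all(["numpy", "h5py", "pp_sketchlib", "graph_tool"])` inside the poppunk environment would then be expected to report "segfault" only for the module that crashes on import, if an import is indeed where the crash happens.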

CarmenSheppard commented 2 years ago

Hi John,

Yes, it looks like graph-tool is the problem:

PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
(with backend: sketchlib v1.7.4
sketchlib: /home/carmen.sheppard/.conda/envs/poppunk/lib/python3.9/site-packages/pp_sketchlib.cpython-39-x86_64-linux-gnu.so)
Traceback (most recent call last):
  File "/home/carmen.sheppard/software/PopPUNK/poppunk-runner.py", line 10, in <module>
    main()
  File "/hpscol02/tenant2/hpc_storage/home/carmen.sheppard/software/PopPUNK/PopPUNK/__main__.py", line 293, in main
    setGtThreads(args.threads)
  File "/hpscol02/tenant2/hpc_storage/home/carmen.sheppard/software/PopPUNK/PopPUNK/utils.py", line 37, in setGtThreads
    if gt.openmp_enabled():
NameError: name 'gt' is not defined

Is there a previous fix for this issue?

johnlees commented 2 years ago

Ok, glad that we seem to have nailed it down. Unfortunately I don't write or support graph-tool, so the help I can provide is more limited, but here are some suggestions:

1) Raise an issue on the graph-tool GitLab: https://git.skewed.de/count0/graph-tool/-/issues. I would recommend, if you can, running the failing command under valgrind to help them understand. For example:

valgrind --leak-check=full --track-origins=yes --error-limit=no poppunk --create-db --output poppunk/poppunk_db --r-files rfile.txt --threads 1 2> pp_valgrind.txt

and then uploading pp_valgrind.txt with the issue report and your conda environment list. (You'll of course want to do this with the code where gt is still being imported.)

2) Try installing a different/older version of graph-tool from conda. You can specify the version, for example with conda install graph-tool==2.4.2. If this works, it would be helpful to post an issue on the conda recipe noting the problem.

3) Follow the instructions to build from source: https://git.skewed.de/count0/graph-tool/-/wikis/installation-instructions#manual-compilation. Again, if this works, it would be helpful to post an issue on the conda recipe page.
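For suggestion 2, trying several graph-tool versions by hand gets tedious, so the commands can be generated per version. The following is a rough sketch under stated assumptions: the helper names and the `gt-test-<version>` environment names are made up for illustration, each version gets its own throwaway environment from conda-forge, and `conda run` is assumed to be available in the installed conda. By default it only builds the command lines so they can be reviewed first.

```python
import shlex
import subprocess


def version_check_commands(version, channel="conda-forge"):
    """Build the conda commands that install one graph-tool version and import it."""
    env = f"gt-test-{version}"  # hypothetical throwaway environment name
    create = ["conda", "create", "--yes", "--name", env,
              "--channel", channel, f"graph-tool={version}"]
    check = ["conda", "run", "--name", env, "python", "-c", "import graph_tool"]
    return create, check


def check_versions(versions, dry_run=True):
    """Return the commands per version (dry run) or the import's exit code."""
    results = {}
    for version in versions:
        create, check = version_check_commands(version)
        if dry_run:
            # Only show what would be executed
            results[version] = [shlex.join(create), shlex.join(check)]
            continue
        subprocess.run(create, check=True)
        # A non-zero code here means the import failed or crashed
        results[version] = subprocess.run(check).returncode
    return results
```

`check_versions(["2.43", "2.35"])` lists the commands for the two versions mentioned in this thread; passing `dry_run=False` would actually create the environments and report the exit code of each import.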

CarmenSheppard commented 2 years ago

Ok, I tried versions of graph-tool in conda back to 2.35, with the same fault. The earliest version in conda-forge, 2.29, then caused other incompatibilities, so I didn't try that.

We'll think about the other suggestions you made. Thanks very much for your help!