dsurujon closed this issue 3 years ago
What are your sample names? Are they all numbers? I wonder if that might be causing the second error. If not, could you send me your .h5 file so I can try to replicate it?
DBSCAN doesn't always work. You can try changing the parameters as described in the docs (https://poppunk.readthedocs.io/en/latest/model_fitting.html#dbscan), but another model may be better. If you post the plots of your distance distribution and GMM fit here I can probably comment on that. What species are you looking at?
The samples are from Acinetobacter baumannii, and their names are alphanumeric, not just numbers; most of them are SRA accessions of the form SRRNNNNNNN.
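In case it helps anyone else who hits this, here is a minimal sketch for confirming that no sample name is purely numeric. The helper name `purely_numeric` and the example names are made up for illustration; in practice the list would come from your own sample list, however you store it.

```python
import re

def purely_numeric(names):
    """Return the names that consist only of the digits 0-9."""
    return [name for name in names if re.fullmatch(r"[0-9]+", name)]

# Made-up example names, not taken from the real dataset.
names = ["SRR1234567", "1234567", "sample_A"]
print(purely_numeric(names))
```

An empty list back means every name contains at least one non-digit character.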
Here's the distance plot with the clusters identified (I tried a few different values for K; 3 seemed to work best).
I'll try changing those parameters first. Also, I had to downgrade joblib from 1.0.0 to 0.17.0. The documentation lists the dependencies, and I had more up-to-date versions of some of those packages. I wasn't able to downgrade some of them (e.g. pp-sketch) due to conflicts.
That fit looks pretty good to me!
Would you mind posting the output of your conda list here so I can see if there's anything obvious in terms of packages?
If you are able to share your h5 file somehow (it's anonymised and doesn't contain any sequence), I'd like to try to replicate your graph-tool error.
Here's the h5 file: https://drive.google.com/file/d/1AVPjbC6aFxV6YH6H6t6Sl2SMRR3yubdp/view?usp=sharing
And here's the packages list
# packages in environment at /home/defne/miniconda2/envs/poppunk_env:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
boost 1.72.0 py38h1e42940_1 conda-forge
boost-cpp 1.72.0 h9359b55_3 conda-forge
brotlipy 0.7.0 py38h8df0ef7_1001 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.11.0 h470a237_1 bioconda
ca-certificates 2020.12.5 ha878542_0 conda-forge
cached-property 1.5.1 py_0 conda-forge
cairo 1.16.0 h9f066cc_1006 conda-forge
cairomm 1.12.2 2 conda-forge
cairomm-1.0 1.12.2 h0069156_2 conda-forge
certifi 2020.12.5 py38h578d9bd_0 conda-forge
cffi 1.14.4 py38ha312104_0 conda-forge
chardet 4.0.0 py38h578d9bd_0 conda-forge
click 7.1.2 pyh9f0ad1d_0 conda-forge
cryptography 3.3.1 py38h2b97feb_0 conda-forge
cycler 0.10.0 py_2 conda-forge
decorator 4.4.2 py_0 conda-forge
dendropy 4.5.1 pyh3252c3a_0 bioconda
expat 2.2.9 he1b5a44_2 conda-forge
flask 1.1.2 pyh9f0ad1d_0 conda-forge
flask-cors 3.0.8 py_0 conda-forge
fontconfig 2.13.1 h7e3eb15_1002 conda-forge
freetype 2.10.4 h7ca028e_0 conda-forge
gettext 0.19.8.1 hf34092f_1004 conda-forge
gmp 6.2.1 h58526e2_0 conda-forge
graph-tool 2.29 py38hcba731a_1 conda-forge
h5py 3.1.0 nompi_py38hafa665b_100 conda-forge
hdbscan 0.8.26 py38h0b5ebd8_3 conda-forge
hdf5 1.10.6 nompi_h6a2412b_1113 conda-forge
icu 67.1 he1b5a44_0 conda-forge
idna 2.10 pyh9f0ad1d_0 conda-forge
itsdangerous 1.1.0 py_0 conda-forge
jinja2 2.11.2 pyh9f0ad1d_0 conda-forge
joblib 0.17.0 py_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
kiwisolver 1.3.1 py38h82cb98a_0 conda-forge
krb5 1.17.2 h926e7f8_0 conda-forge
lcms2 2.11 hcbb858e_1 conda-forge
ld_impl_linux-64 2.35.1 hea4e1c9_1 conda-forge
libblas 3.9.0 6_openblas conda-forge
libcblas 3.9.0 6_openblas conda-forge
libcurl 7.71.1 hcdd3856_8 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.2.1 he1b5a44_1007 conda-forge
libgcc-ng 9.3.0 h5dbcf3e_17 conda-forge
libgfortran-ng 9.3.0 he4bcb1c_17 conda-forge
libgfortran5 9.3.0 he4bcb1c_17 conda-forge
libglib 2.66.3 hbe7bbb4_0 conda-forge
libgomp 9.3.0 h5dbcf3e_17 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 6_openblas conda-forge
libnghttp2 1.41.0 hab1572f_1 conda-forge
libopenblas 0.3.12 pthreads_h4812303_1 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libssh2 1.9.0 hab1572f_5 conda-forge
libstdcxx-ng 9.3.0 h2ae2ef3_17 conda-forge
libtiff 4.2.0 hdc55705_0 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libwebp-base 1.1.0 h36c2ea0_3 conda-forge
libxcb 1.13 h14c3975_1002 conda-forge
libxml2 2.9.10 h68273f3_2 conda-forge
lz4-c 1.9.3 h9c3ff4c_0 conda-forge
markupsafe 1.1.1 py38h8df0ef7_2 conda-forge
matplotlib-base 3.3.3 py38h5c7f4ab_0 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
networkx 2.5 py_0 conda-forge
numpy 1.19.5 py38h18fd61f_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openblas 0.3.12 pthreads_h04b7a96_1 conda-forge
openssl 1.1.1i h7f98852_0 conda-forge
pandas 1.2.0 py38h51da96c_0 conda-forge
pcre 8.44 he1b5a44_0 conda-forge
pillow 8.1.0 py38h357d4e7_0 conda-forge
pip 20.3.3 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 h36c2ea0_0 conda-forge
poppunk 2.3.0 py_0 bioconda
pp-sketchlib 1.6.0 py38h3ac2cac_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pycairo 1.20.0 py38h323dad1_1 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pysocks 1.7.1 py38h924ce5b_2 conda-forge
python 3.8.2 he5300dc_7_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.8 1_cp38 conda-forge
pytz 2020.5 pyhd8ed1ab_0 conda-forge
rapidnj 2.3.2 hc9558a2_0 bioconda
readline 8.0 he28a2e2_2 conda-forge
requests 2.25.1 pyhd3deb0d_0 conda-forge
scikit-learn 0.24.0 py38h658cfdd_0 conda-forge
scipy 1.6.0 py38hb2138dd_0 conda-forge
setuptools 49.6.0 py38h924ce5b_2 conda-forge
sharedmem 0.3.6 py_0 bioconda
sigcpp-2.0 2.10.3 h58526e2_0 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
sparsehash 2.0.2 0 bioconda
sqlite 3.34.0 h74cdb3f_0 conda-forge
threadpoolctl 2.1.0 pyh5ca1d4c_0 conda-forge
tk 8.6.10 h21135ba_1 conda-forge
tornado 6.1 py38h25fe258_0 conda-forge
urllib3 1.26.2 pyhd8ed1ab_0 conda-forge
werkzeug 1.0.1 pyh9f0ad1d_0 conda-forge
wheel 0.36.2 pyhd3deb0d_0 conda-forge
xorg-compositeproto 0.4.2 0 conda-forge
xorg-damageproto 1.2.1 h516909a_1002 conda-forge
xorg-fixesproto 5.0 h14c3975_1002 conda-forge
xorg-inputproto 2.3.2 h14c3975_1002 conda-forge
xorg-kbproto 1.0.7 h14c3975_1002 conda-forge
xorg-libice 1.0.10 h516909a_0 conda-forge
xorg-libsm 1.2.3 h84519dc_1000 conda-forge
xorg-libx11 1.6.12 h516909a_0 conda-forge
xorg-libxau 1.0.9 h14c3975_0 conda-forge
xorg-libxaw 1.0.13 h516909a_1002 conda-forge
xorg-libxcomposite 0.4.5 h516909a_0 conda-forge
xorg-libxcursor 1.2.0 h516909a_0 conda-forge
xorg-libxdamage 1.1.5 h516909a_0 conda-forge
xorg-libxdmcp 1.1.3 h516909a_0 conda-forge
xorg-libxext 1.3.4 h516909a_0 conda-forge
xorg-libxfixes 5.0.3 h516909a_1004 conda-forge
xorg-libxi 1.7.10 h516909a_0 conda-forge
xorg-libxinerama 1.1.4 hf484d3e_1000 conda-forge
xorg-libxmu 1.1.3 h516909a_0 conda-forge
xorg-libxpm 3.5.13 h516909a_0 conda-forge
xorg-libxrandr 1.5.2 h516909a_1 conda-forge
xorg-libxrender 0.9.10 h516909a_1002 conda-forge
xorg-libxt 1.1.5 h516909a_1003 conda-forge
xorg-randrproto 1.5.0 h516909a_1001 conda-forge
xorg-renderproto 0.11.1 h14c3975_1002 conda-forge
xorg-util-macros 1.19.2 h36c2ea0_1001 conda-forge
xorg-xextproto 7.3.0 h14c3975_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
zstd 1.4.8 ha95c52a_1 conda-forge
I can't access the file on drive (but have sent a request for access to you)
Thinking a bit more about the fit, I think it would be worth trying fit refinement from your K = 3 fit, as that might optimise it a little further: https://poppunk.readthedocs.io/en/latest/model_fitting.html#refine
Thanks for sharing the file. Oddly this does work for me:
python ~/Documents/PopPUNK/poppunk-runner.py --fit-model bgmm --ref-db Ab_test --output Ab_test_fit --distances Ab_test/Ab_test.dists --qc-filter prune --max-a-dist 0.85 --K 3 --min-cluster-prop 0.001
PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
(with backend: sketchlib v1.6.0
sketchlib: /Users/jlees/miniconda3/envs/pp-py38/lib/python3.8/site-packages/pp_sketchlib.cpython-38-darwin.so)
Graph-tools OpenMP parallelisation enabled: with 1 threads
Mode: Fitting bgmm model to reference database
Fit summary:
Avg. entropy of assignment 0.0017
Number of components used 3
Scaled component means:
[0.26120475 0.42521224]
[0.75959672 0.76000775]
[0.02938571 0.18756266]
Network summary:
Components 84
Density 0.1885
Transitivity 0.9997
Score 0.8113
Removing 1086 sequences
Done
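For readers interpreting the "Scaled component means" in the fit summary: as I understand the PopPUNK documentation, the component nearest the origin is the one treated as within-strain. A minimal sketch of that check, using the three means printed above (the function name `closest_to_origin` is just for illustration and is not part of PopPUNK):

```python
import math

# Scaled component means copied from the fit summary above.
means = [
    (0.26120475, 0.42521224),
    (0.75959672, 0.76000775),
    (0.02938571, 0.18756266),
]

def closest_to_origin(points):
    """Return the index of the point with the smallest Euclidean norm."""
    return min(range(len(points)), key=lambda i: math.hypot(*points[i]))

within = closest_to_origin(means)
print(within, means[within])
```

Here that picks the third component (index 2), which is the tight cluster near zero distance.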
I am using graph-tool 2.35, whereas you have 2.29. Maybe you could try upgrading with conda install "graph-tool>=2.35" (quoted so the shell doesn't treat > as a redirect), as that is where the error appears to be coming from?
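A rough sketch of how the version gap mentioned above could be spotted automatically from pasted conda list output. The helper names are hypothetical, and the version comparison assumes purely numeric dotted versions such as 2.29 (it would not handle something like 1.1.1i).

```python
def parse_conda_list(text):
    """Map package name -> version string from `conda list` style output."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions

def version_tuple(version):
    """Turn '2.29' into (2, 29); assumes numeric dotted versions only."""
    return tuple(int(part) for part in version.split("."))

def meets_minimum(versions, package, minimum):
    """True if the package is present and at least the minimum version."""
    installed = versions.get(package)
    if installed is None:
        return False
    return version_tuple(installed) >= version_tuple(minimum)

# Two rows copied from the package list earlier in this thread.
listing = """\
# Name Version Build Channel
graph-tool 2.29 py38hcba731a_1 conda-forge
joblib 0.17.0 py_0 conda-forge
"""
versions = parse_conda_list(listing)
print(meets_minimum(versions, "graph-tool", "2.35"))
```

With the environment posted above this reports that graph-tool is below 2.35, which matches the difference between the two setups.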
That did the trick! Thank you very much for the quick response, I really appreciate it!
Versions
poppunk 2.3.0.
poppunk_sketch 1.6.0.
Command used and output returned
I'm working with ~1200 bacterial genomes, and have been trying multiple parameters for the model fitting. When I use dbscan it fails to find distinct clusters. I have also tried bgmm, and there I can get clusters, but I get a different error (below). I've pruned the samples that didn't pass QC during DB creation, so I'm not sure if this has to do with my samples or something else.
Describe the bug