bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

OSError when running poppunk_visualise with cytoscape even in emulated osx-64 #210

Closed acpaulo closed 2 years ago

acpaulo commented 2 years ago

Versions
poppunk 2.4.0
pp-sketchlib v2.0.0
Python 3.8.13
MacOS Monterey 12.4

Command used and output returned
poppunk_visualise
OSError: error reading from file '':ios_base::clear: unspecified iostream_category error

Describe the bug
I was running poppunk_visualise with cytoscape (first without updating the DB):

    poppunk_visualise --ref-db GPS_v4_references --query-db poppunk_clusters_2018 --output visualization_cytoscape__2018 --cytoscape

but the run stopped with the following error:

    OSError: error reading from file '':ios_base::clear: unspecified iostream_category error

I don't know if this is a problem with my system or something in the code. I had already run it with the --microreact flag with no error.

Then I updated my DB by running:

    poppunk_assign --db GPS_v4_references --query query_2018.txt \
        --output poppunk_clusters_2018 --qc-filter continue --threads 8 --update-db

and re-ran microreact:

    poppunk_visualise --ref-db poppunk_clusters_2018 --output visualization_microreact_2018 --microreact

Again it ran. But with the --cytoscape flag it gave the same error again:

    OSError: error reading from file '':ios_base::clear: unspecified iostream_category error

Could you help me, please? Thank you.

johnlees commented 2 years ago

Did you install via conda? Could you please provide the result of conda list if so?

Also, are you on an M1/arm64 Mac? Or is it Intel/x86_64?

acpaulo commented 2 years ago

Hi John,

I did install via conda. I'm on an M1/arm64 Mac

conda list

apscheduler 3.9.1 py38h50d1736_0 conda-forge
atk-1.0 2.36.0 he69c4ee_4 conda-forge
boost 1.74.0 py38hb0f0857_5 conda-forge
boost-cpp 1.74.0 h8b082ac_8 conda-forge
brotli 1.0.9 h5eb16cf_7 conda-forge
brotli-bin 1.0.9 h5eb16cf_7 conda-forge
brotlipy 0.7.0 py38h9ed2024_1003
bzip2 1.0.8 h0d85af4_4 conda-forge
c-ares 1.18.1 h0d85af4_0 conda-forge
ca-certificates 2022.6.15 h033912b_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cairo 1.16.0 h1680b09_1011 conda-forge
cairomm 1.12.2 h694c41f_4 conda-forge
cairomm-1.0 1.12.2 h941ccef_4 conda-forge
certifi 2022.6.15 py38h50d1736_0 conda-forge
cffi 1.15.0 py38hc55c11b_1
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.1.3 py38h50d1736_0 conda-forge
colorama 0.4.4 pyhd3eb1b0_0
conda 4.13.0 py38h50d1736_1 conda-forge
conda-content-trust 0.1.1 pyhd3eb1b0_0
conda-package-handling 1.8.1 py38hca72f7f_0
cryptography 36.0.0 py38hf6deb26_0
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
dendropy 4.5.2 pyh3252c3a_0 bioconda
docopt 0.6.2 py_1 conda-forge
epoxy 1.5.10 h5eb16cf_1 conda-forge
expat 2.4.8 h96cf925_0 conda-forge
flask 2.1.2 pyhd8ed1ab_1 conda-forge
flask-apscheduler 1.12.3 pyhd8ed1ab_1 conda-forge
flask-cors 3.0.10 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.14.0 h676cef8_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.34.4 py38h0dd4459_0 conda-forge
freetype 2.10.4 h4cff582_1 conda-forge
fribidi 1.0.10 hbcb3906_0 conda-forge
gdk-pixbuf 2.42.8 hb161b9c_0 conda-forge
gettext 0.19.8.1 hd1a6beb_1008 conda-forge
gfortran_impl_osx-64 9.3.0 h9cc0e5e_23 conda-forge
giflib 5.2.1 hbcb3906_2 conda-forge
glib-tools 2.72.1 h2292cb8_0 conda-forge
gmp 6.2.1 h2e338ed_0 conda-forge
graph-tool 2.45 py38hc8485ad_2 conda-forge
graph-tool-base 2.45 py38h1a327f5_2 conda-forge
graphite2 1.3.13 h2e338ed_1001 conda-forge
gtk3 3.24.34 h99c4447_0 conda-forge
gunicorn 20.1.0 py38h50d1736_2 conda-forge
h5py 3.7.0 nompi_py38ha2d691f_100 conda-forge
harfbuzz 4.4.1 h00bb2c2_0 conda-forge
hdbscan 0.8.28 py38hbe852b5_1 conda-forge
hdf5 1.12.1 nompi_h0aa1fa2_104 conda-forge
hicolor-icon-theme 0.17 h694c41f_2 conda-forge
icu 70.1 h96cf925_0 conda-forge
idna 3.3 pyhd3eb1b0_0
importlib-metadata 4.11.4 py38h50d1736_0 conda-forge
isl 0.22.1 hb1e8313_2 conda-forge
itsdangerous 2.1.2 pyhd8ed1ab_0 conda-forge
jinja2 3.1.2 pyhd8ed1ab_1 conda-forge
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9e hac89ed1_2 conda-forge
kiwisolver 1.4.3 py38hf58141a_0 conda-forge
krb5 1.19.3 hb98e516_0 conda-forge
lcms2 2.12 h577c468_0 conda-forge
lerc 3.0 he49afe7_0 conda-forge
libblas 3.9.0 15_osx64_openblas conda-forge
libbrotlicommon 1.0.9 h5eb16cf_7 conda-forge
libbrotlidec 1.0.9 h5eb16cf_7 conda-forge
libbrotlienc 1.0.9 h5eb16cf_7 conda-forge
libcblas 3.9.0 15_osx64_openblas conda-forge
libcurl 7.83.1 h23f1065_0 conda-forge
libcxx 14.0.6 hce7ea42_0 conda-forge
libdeflate 1.12 hac89ed1_0 conda-forge
libedit 3.1.20191231 h0678c8f_2 conda-forge
libev 4.33 haf1e3a3_1 conda-forge
libffi 3.4.2 h0d85af4_5 conda-forge
libgfortran 5.0.0 9_3_0_h6c81a4c_23 conda-forge
libgfortran-devel_osx-64 9.3.0 h6c81a4c_23 conda-forge
libgfortran5 9.3.0 h6c81a4c_23 conda-forge
libgirepository 1.72.0 h0bde3a9_1 conda-forge
libglib 2.72.1 hfbcb929_0 conda-forge
libiconv 1.16 haf1e3a3_0 conda-forge
liblapack 3.9.0 15_osx64_openblas conda-forge
libnghttp2 1.47.0 hca56917_0 conda-forge
libopenblas 0.3.20 openmp_hb3cd9ec_0 conda-forge
libpng 1.6.37 h5a3d3bf_3 conda-forge
librsvg 2.54.4 h3d48ba6_0 conda-forge
libssh2 1.10.0 hd3787cc_2 conda-forge
libtiff 4.4.0 h9847915_1 conda-forge
libwebp 1.2.2 h28dabe5_0 conda-forge
libwebp-base 1.2.2 h0d85af4_1 conda-forge
libxcb 1.13 h0d85af4_1004 conda-forge
libxml2 2.9.14 h08a9926_3 conda-forge
libzlib 1.2.12 hfe4f2af_1 conda-forge
llvm-openmp 14.0.4 ha654fa7_0 conda-forge
lz4-c 1.9.3 he49afe7_1 conda-forge
markupsafe 2.1.1 py38hed1de0f_1 conda-forge
matplotlib-base 3.5.2 py38h1b6b9d1_0 conda-forge
mpc 1.2.1 hbb51d92_0 conda-forge
mpfr 4.1.0 h0f52abe_1 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
ncurses 6.3 hca72f7f_2
networkx 2.8.4 pyhd8ed1ab_0 conda-forge
numpy 1.23.1 py38h604f2a5_0 conda-forge

johnlees commented 2 years ago

Can I also check, did you install in an Intel conda environment (as described here: https://conda-forge.org/docs/user/tipsandtricks.html#installing-apple-intel-packages-on-apple-silicon)?

If not, could you give that a go?

acpaulo commented 2 years ago

Hi John,

I'm having trouble. I ran the commands you pointed me to, but in an environment named miniconda3; I don't know if that was the problem.

    python -c "import platform;print(platform.machine())"
    x86_64
    echo "CONDA_SUBDIR: $CONDA_SUBDIR"
    CONDA_SUBDIR

At the same time I cannot run poppunk_assign, because it fails with

    FileNotFoundError: [Errno 2] No such file or directory: 'GPS_v6/GPS_v6_fit.pkl'

The directory GPS6_V6 is in the directory where I'm running poppunk. I'm not sure what is happening, because I could run poppunk_assign before. Any ideas? Thank you

acpaulo commented 2 years ago

Hi John,

Sorry, I was running in the wrong path. Everything goes well except cytoscape. And in fact, if I call echo "CONDA_SUBDIR: $CONDA_SUBDIR" it outputs just CONDA_SUBDIR

acpaulo commented 2 years ago

Another issue is that we want to write a paper, and my boss was very confused by the GPSC and PopPUNK clusters, which she says, from reading papers, are treated in the same way. The truth is that the numbering is not the same. So in your experience, what should we use? GPSC from pathogen.watch, PopPUNK, or both? Thank you, and I'm really sorry for all the issues.

acpaulo commented 2 years ago

I will leave all the commands here. I installed miniconda3 in /opt/miniconda3/ and then ran:

    CONDA_SUBDIR=osx-64 conda create -n miniconda3 python
    conda activate miniconda3
    python -c "import platform;print(platform.machine())"
    x86_64
    echo "CONDA_SUBDIR: $CONDA_SUBDIR"
    CONDA_SUBDIR:

acpaulo commented 2 years ago

I finally solved the problem:

    python -c "import platform;print(platform.machine())"
    x86_64
    "CONDA_SUBDIR: $CONDA_SUBDIR"
    zsh: command not found: CONDA_SUBDIR: osx-64

If I run poppunk_assign it gives:

    /bin/sh: poppunk_sketch: command not found
    Traceback (most recent call last):
      File "/opt/miniconda3/envs/x86/bin/poppunk_assign", line 10, in <module>
        sys.exit(main())
      File "/opt/miniconda3/envs/x86/lib/python3.10/site-packages/PopPUNK/assign.py", line 389, in main
        dbFuncs = setupDBFuncs(args, args.min_kmer_count, qc_dict)
      File "/opt/miniconda3/envs/x86/lib/python3.10/site-packages/PopPUNK/utils.py", line 58, in setupDBFuncs
        version = checkSketchlibVersion()
      File "/opt/miniconda3/envs/x86/lib/python3.10/site-packages/PopPUNK/sketchlib.py", line 49, in checkSketchlibVersion
        version = line.rstrip().decode().split(" ")[1]
    IndexError: list index out of range

johnlees commented 2 years ago

> Another issue is that we want to write a paper, and my boss was very confused by the GPSC and PopPUNK clusters, which she says, from reading papers, are treated in the same way. The truth is that the numbering is not the same. So in your experience, what should we use? GPSC from pathogen.watch, PopPUNK, or both? Thank you, and I'm really sorry for all the issues.

For pneumo, I'd suggest using the GPSC, as these are consistently defined and used in many other publications. The clusters mostly match between the two, but for historical reasons the numbering is in a different order. You can still get GPSC from PopPUNK if you provide external clusters, see GPSC assignment here.

> Everything goes well except cytoscape.

So, you are sure that you're on the intel install? That's ok then. Could you provide me with some small example input files where you get the error, and I'll try and look into it.

> File "/opt/miniconda3/envs/x86/lib/python3.10/site-packages/PopPUNK/sketchlib.py", line 49, in checkSketchlibVersion
>     version = line.rstrip().decode().split(" ")[1]
> IndexError: list index out of range

I am not sure from your comments above when/why this occurs. What sketchlib version do you have installed here?
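For context on the IndexError quoted above: the failing expression reads the second space-separated field of a line of output. One possible reading, which this thread does not confirm, is that the preceding `poppunk_sketch: command not found` message meant the line was empty, so there was no second field to read. The Python sketch below reproduces only that expression on an empty line and shows a guarded variant; `parse_version_field` is a hypothetical helper, not PopPUNK code, and the example line is made up.

```python
from typing import Optional


def parse_version_field(line: bytes) -> Optional[str]:
    """Return the second space-separated field of `line`, or None if absent.

    Hypothetical helper for illustration; it is not part of PopPUNK.
    """
    fields = line.rstrip().decode().split(" ")
    # An empty line splits to [''], so there is no index 1 to read.
    if len(fields) < 2:
        return None
    return fields[1]


# The unguarded expression from the traceback fails on an empty line.
try:
    b"".rstrip().decode().split(" ")[1]
except IndexError as err:
    print("unguarded parse failed:", err)

print(parse_version_field(b""))              # no version field available
print(parse_version_field(b"name 1.2.3\n"))  # second field of a made-up line
```

If this reading is right, the underlying problem would be the missing `poppunk_sketch` executable rather than the parsing itself.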

acpaulo commented 2 years ago

Hi John,

I followed the instructions here: https://conda-forge.org/docs/user/tipsandtricks.html#installing-apple-intel-packages-on-apple-silicon

I found that the first line should probably be split, or else I won't have subdir osx-64.

After that I had:

    python -c "import platform;print(platform.machine())"
    x86_64
    echo "CONDA_SUBDIR: $CONDA_SUBDIR"
    CONDA_SUBDIR: osx-64
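The two checks used in this thread (the interpreter's machine type and the CONDA_SUBDIR variable) can be combined into one Python sketch. This is only an illustration: `describe_conda_arch` and the `intel_env` key are hypothetical names, and treating "x86_64 plus osx-64" as the sign of an Intel environment is an assumption drawn from the outputs posted here.

```python
import os
import platform


def describe_conda_arch(machine=None, environ=None):
    """Summarise the interpreter architecture and the CONDA_SUBDIR setting.

    Hypothetical helper; arguments default to the live interpreter and
    environment, and can be overridden for testing.
    """
    machine = platform.machine() if machine is None else machine
    environ = os.environ if environ is None else environ
    subdir = environ.get("CONDA_SUBDIR", "")
    return {
        "machine": machine,
        "conda_subdir": subdir,
        # Assumption: both values must match the outputs shown in the thread.
        "intel_env": machine == "x86_64" and subdir == "osx-64",
    }


print(describe_conda_arch())
```

With an empty CONDA_SUBDIR, as in the earlier comments, `intel_env` comes out false even though the machine type reads x86_64.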

Nonetheless, when I run with --cytoscape the problem continues (with microreact, phandango etc. it goes well). Here is the output:

(your_environment_name) cristinapaulo@acpaulo-novo-adapt Experiment % poppunk_visualise --ref-db GPS_v4_references --query-db experiment_clusters --output experiment_cytoscape --cytoscape

(poppunk_visualise:56248): Gtk-WARNING **: 15:54:33.481: Locale not supported by C library. Using the fallback 'C' locale.

Graph-tools OpenMP parallelisation enabled: with 1 threads
PopPUNK: visualise
Note: Distances in experiment_clusters/experiment_clusters.dists are from assign mode
Note: Distance will be extended to full all-vs-all distances
Note: Re-run poppunk_assign with --update-db to avoid this
Using existing random match chances in DB
Calculating distances using 1 thread(s)
Progress (CPU): 100.0%
Loading previously refined model
Completed model loading
Building phylogeny
Writing cytoscape output
Traceback (most recent call last):
  File "/Users/cristinapaulo/opt/miniconda3/envs/your_environment_name/bin/poppunk_visualise", line 11, in <module>
    sys.exit(main())
  File "/Users/cristinapaulo/opt/miniconda3/envs/your_environment_name/lib/python3.8/site-packages/PopPUNK/visualise.py", line 452, in main
    generate_visualisations(args.query_db,
  File "/Users/cristinapaulo/opt/miniconda3/envs/your_environment_name/lib/python3.8/site-packages/PopPUNK/visualise.py", line 440, in generate_visualisations
    genomeNetwork = load_network_file(network_file, use_gpu = gpu_graph)
  File "/Users/cristinapaulo/opt/miniconda3/envs/your_environment_name/lib/python3.8/site-packages/PopPUNK/network.py", line 146, in load_network_file
    genomeNetwork = gt.load_graph(fn)
  File "/Users/cristinapaulo/opt/miniconda3/envs/your_environment_name/lib/python3.8/site-packages/graph_tool/__init__.py", line 3378, in load_graph
    g.load(file_name, fmt, ignore_vp, ignore_ep, ignore_gp)
  File "/Users/cristinapaulo/opt/miniconda3/envs/your_environment_name/lib/python3.8/site-packages/graph_tool/__init__.py", line 2924, in load
    props = self.__graph.read_from_file("", file_name, fmt,
OSError: error reading from file '':ios_base::clear: unspecified iostream_category error

Thank you for your help.

johnlees commented 2 years ago

Ok, I will try and replicate this myself and if so see if I can fix it. Have added to the next release checklist #204

One final thing, can you list the contents of experiment_clusters and GPS_v4_references? Particularly, what .gt files do you have?

acpaulo commented 2 years ago

Hi John,

The experiment_clusters folder does not contain any .gt file. Its contents are:

    experiment_clusters.dists.npy
    experiment_clusters.dists.pkl
    experiment_clusters.h5
    experiment_clusters_clusters.csv
    experiment_clusters_external_clusters.csv
    experiment_clusters_qcreport.txt
    experiment_clusters_unword_clusters.csv

In the folder GPS_v4_references I have a .gt file, GPS_v4_references_graph.gt, downloaded from https://poppunk.net/pages/databases.html

Thank you. If you need any more information, please let me know.

johnlees commented 2 years ago

There should be a new .gt file output from your assign query run (as long as you use --update-db). I think this is why you are getting a file not found error. Can you double-check your query run, making sure it was run with update db and that you get a .gt file in the output directory?
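A quick way to see which .gt files each directory holds is sketched below in Python. The directory names are the ones used in this thread and are assumed to sit in the current working directory; `list_gt_files` is a hypothetical helper.

```python
from pathlib import Path


def list_gt_files(db_dir):
    """Return the sorted names of .gt files directly inside db_dir.

    Hypothetical helper; a missing directory yields an empty list.
    """
    root = Path(db_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.gt"))


# Directory names taken from the thread; adjust to your own layout.
for db in ("GPS_v4_references", "experiment_clusters"):
    names = list_gt_files(db)
    print(db, "->", names if names else "no .gt files found")
```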

acpaulo commented 2 years ago

Hi,

You're right: after I used --update-db I do have two .gt files in my experiment_clusters folder, experiment_clusters_graph.gt and experiment_clustersrefs_graph.gt (do you want me to look at the contents? Can I do that with vim?). Nonetheless, the error is the same when I run with the --cytoscape flag (but it did run with microreact and phandango). That is:

(myenv_x86) cristinapaulo@acpaulo-novo-adapt Experiment % poppunk_visualise --ref-db GPS_v4_references --query-db experiment_clusters --external-cluster experiment_clusters/experiment_clusters_external_clusters.csv --output cytoscape_experiment_clusters --cytoscape --threads 8

(poppunk_visualise:1191): Gtk-WARNING **: 12:35:39.781: Locale not supported by C library. Using the fallback 'C' locale.

Graph-tools OpenMP parallelisation enabled: with 8 threads
PopPUNK: visualise
Loading previously refined model
Completed model loading
Building phylogeny
Writing cytoscape output
Traceback (most recent call last):
  File "/Users/cristinapaulo/opt/miniconda3/envs/myenv_x86/bin/poppunk_visualise", line 11, in <module>
    sys.exit(main())
  File "/Users/cristinapaulo/opt/miniconda3/envs/myenv_x86/lib/python3.8/site-packages/PopPUNK/visualise.py", line 452, in main
    generate_visualisations(args.query_db,
  File "/Users/cristinapaulo/opt/miniconda3/envs/myenv_x86/lib/python3.8/site-packages/PopPUNK/visualise.py", line 440, in generate_visualisations
    genomeNetwork = load_network_file(network_file, use_gpu = gpu_graph)
  File "/Users/cristinapaulo/opt/miniconda3/envs/myenv_x86/lib/python3.8/site-packages/PopPUNK/network.py", line 146, in load_network_file
    genomeNetwork = gt.load_graph(fn)
  File "/Users/cristinapaulo/opt/miniconda3/envs/myenv_x86/lib/python3.8/site-packages/graph_tool/__init__.py", line 3380, in load_graph
    g.load(file_name, fmt, ignore_vp, ignore_ep, ignore_gp)
  File "/Users/cristinapaulo/opt/miniconda3/envs/myenv_x86/lib/python3.8/site-packages/graph_tool/__init__.py", line 2926, in load
    props = self.__graph.read_from_file("", file_name, fmt,
OSError: error reading from file '':ios_base::clear: unspecified iostream_category error

johnlees commented 2 years ago

I apologise for not spotting this sooner, but I think this is a duplicate of https://github.com/bacpop/PopPUNK/issues/184

Can you try adding --network-file to point to your .gt file?

acpaulo commented 2 years ago

Hi,

Yes it worked. Thank you.
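For anyone landing here with the same error, the fix amounts to passing the graph file explicitly. The Python sketch below only assembles and prints such a command from the flags seen in this thread; it does not run PopPUNK. `cytoscape_command` is a hypothetical helper, and the .gt path is an assumption, since the thread does not say which of the two .gt files was used.

```python
import shlex


def cytoscape_command(ref_db, query_db, output, network_file):
    """Build a poppunk_visualise argument list with an explicit network file.

    Hypothetical helper; every flag used here appears in the thread above.
    """
    return [
        "poppunk_visualise",
        "--ref-db", ref_db,
        "--query-db", query_db,
        "--output", output,
        "--cytoscape",
        "--network-file", network_file,
    ]


cmd = cytoscape_command(
    "GPS_v4_references",
    "experiment_clusters",
    "experiment_cytoscape",
    # Assumed path: the thread does not state which .gt file was passed.
    "experiment_clusters/experiment_clusters_graph.gt",
)
print(shlex.join(cmd))
```

The printed line can be pasted into the shell of the conda environment where PopPUNK is installed.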