bacpop / ggCaller

Bifrost graph gene caller.
MIT License

Error handling: terminate called after throwing an instance of 'std::out_of_range' #15

Closed aababc1 closed 7 months ago

aababc1 commented 9 months ago

Hi, thank you for this nice tool. I've run into a problem while using ggcaller v1.3.4 (the error log is below the description). I created a conda environment and installed ggcaller following the official instructions. A test run with a small number of genomes (10) was successful.

The machine has 2 TB of memory, so I doubt this is caused by a memory shortage. The OS is CentOS 7.9, and all input files are raw genomic FASTA files without Ns (1380 files).

Do I have to set any specific system variable? Could you provide any help with this situation? Thank you.

time ggcaller --refs 1380list --out Bacteroids_uniformis --clean-mode moderate --alignment core --core-threshold 0.9 --threads 80
Building coloured compacted DBG...
Generating graph stop codon index...
Mapping contigs to graph...
Loading gene models...
Traversing graph to identify ORFs...
|██████████████████████████████████████████████████| 100%
Generating clusters of high-scoring ORFs...
Scoring ORF clusters...
|████████████ | 25%terminate called after throwing an instance of 'std::out_of_range'
  what(): key not found
Aborted

real 346m12.060s
user 7820m31.364s
sys 178m3.927s

Two files had been created by the time of the error:

572M 2023-11-24 07:25 1380list.color.bfg
440M 2023-11-24 07:25 1380list.gfa

Here is the package list for the conda environment.

miniconda3/envs/ggcaller134:

Name  Version  Build  Channel
_libgcc_mutex  0.1  conda_forge  conda-forge
_openmp_mutex  4.5  2_kmp_llvm  conda-forge
alsa-lib  1.2.10  hd590300_0  conda-forge
argcomplete  3.1.6  pyhd8ed1ab_0  conda-forge
argh  0.30.4  pyhd8ed1ab_0  conda-forge
attr  2.5.1  h166bdaf_1  conda-forge
bcbio-gff  0.7.0  pyh7cba7a3_0  bioconda
bifrost  1.3.1  h43eeafb_0  bioconda
binutils  2.40  hdd6e379_0  conda-forge
binutils_impl_linux-64  2.40  hf600244_0  conda-forge
binutils_linux-64  2.40  hbdbef99_2  conda-forge
biopython  1.80  py39hb9d737c_0  conda-forge
blast  2.15.0  pl5321h6f7f691_1  bioconda
boost-cpp  1.82.0  h44aadfe_6  conda-forge
brotli  1.1.0  hd590300_1  conda-forge
brotli-bin  1.1.0  hd590300_1  conda-forge
bx-python  0.10.0  py39h31164c1_0  bioconda
bzip2  1.0.8  hd590300_5  conda-forge
c-ares  1.22.1  hd590300_0  conda-forge
c-compiler  1.6.0  hd590300_0  conda-forge
ca-certificates  2023.11.17  hbcca054_0  conda-forge
cairo  1.18.0  h3faef2a_0  conda-forge
cd-hit  4.8.1  h43eeafb_9  bioconda
certifi  2023.11.17  pyhd8ed1ab_0  conda-forge
cffi  1.16.0  py39h7a31438_0  conda-forge
cmake  3.27.8  hcfe8598_0  conda-forge
colorama  0.4.6  pyhd8ed1ab_0  conda-forge
contourpy  1.2.0  py39h7633fee_0  conda-forge
curl  8.4.0  hca28451_0  conda-forge
cxx-compiler  1.6.0  h00ab1b0_0  conda-forge
cycler  0.12.1  pyhd8ed1ab_0  conda-forge
dbus  1.13.6  h5008d03_3  conda-forge
diamond  2.1.8  h43eeafb_0  bioconda
eigen  3.3.9  h4bd325d_1  conda-forge
entrez-direct  16.2  he881be0_1  bioconda
expat  2.5.0  hcb278e6_1  conda-forge
font-ttf-dejavu-sans-mono  2.37  hab24e00_0  conda-forge
font-ttf-inconsolata  3.000  h77eed37_0  conda-forge
font-ttf-source-code-pro  2.038  h77eed37_0  conda-forge
font-ttf-ubuntu  0.83  hab24e00_0  conda-forge
fontconfig  2.14.2  h14ed4e7_0  conda-forge
fonts-conda-ecosystem  1  0  conda-forge
fonts-conda-forge  1  0  conda-forge
fonttools  4.45.1  py39hd1e30aa_0  conda-forge
freetype  2.12.1  h267a509_2  conda-forge
future  0.18.3  pyhd8ed1ab_0  conda-forge
gawk  5.3.0  ha916aea_0  conda-forge
gcc  12.3.0  h8d2909c_2  conda-forge
gcc_impl_linux-64  12.3.0  he2b93b0_3  conda-forge
gcc_linux-64  12.3.0  h76fc315_2  conda-forge
gettext  0.21.1  h27087fc_0  conda-forge
gffutils  0.12  pyh7cba7a3_0  bioconda
ggcaller  1.3.4  pypi_0  pypi
glib  2.78.1  hfc55251_1  conda-forge
glib-tools  2.78.1  hfc55251_1  conda-forge
gmp  6.3.0  h59595ed_0  conda-forge
graphite2  1.3.13  h58526e2_1001  conda-forge
gst-plugins-base  1.22.7  h8e1006c_0  conda-forge
gstreamer  1.22.7  h98fc4e7_0  conda-forge
gxx  12.3.0  h8d2909c_2  conda-forge
gxx_impl_linux-64  12.3.0  he2b93b0_3  conda-forge
gxx_linux-64  12.3.0  h8a814eb_2  conda-forge
harfbuzz  8.3.0  h3d44ed6_0  conda-forge
hmmer  3.4  hdbdd923_0  bioconda
icu  73.2  h59595ed_0  conda-forge
importlib-metadata  6.8.0  pyha770c72_0  conda-forge
importlib-resources  6.1.1  pyhd8ed1ab_0  conda-forge
importlib_resources  6.1.1  pyhd8ed1ab_0  conda-forge
intbitset  3.0.2  py39hd1e30aa_1  conda-forge
joblib  1.3.2  pyhd8ed1ab_0  conda-forge
kernel-headers_linux-64  2.6.32  he073ed8_16  conda-forge
keyutils  1.6.1  h166bdaf_0  conda-forge
kiwisolver  1.4.5  py39h7633fee_1  conda-forge
krb5  1.21.2  h659d440_0  conda-forge
lame  3.100  h166bdaf_1003  conda-forge
lcms2  2.15  hb7c19ff_3  conda-forge
ld_impl_linux-64  2.40  h41732ed_0  conda-forge
lerc  4.0.0  h27087fc_0  conda-forge
libblas  3.9.0  16_linux64_mkl  conda-forge
libboost  1.82.0  h6fcfa73_6  conda-forge
libboost-devel  1.82.0  h00ab1b0_6  conda-forge
libboost-headers  1.82.0  ha770c72_6  conda-forge
libbrotlicommon  1.1.0  hd590300_1  conda-forge
libbrotlidec  1.1.0  hd590300_1  conda-forge
libbrotlienc  1.1.0  hd590300_1  conda-forge
libcap  2.69  h0f662aa_0  conda-forge
libcblas  3.9.0  16_linux64_mkl  conda-forge
libclang  15.0.7  default_h7634d5b_3  conda-forge
libclang13  15.0.7  default_h9986a30_3  conda-forge
libcups  2.3.3  h4637d8d_4  conda-forge
libcurl  8.4.0  hca28451_0  conda-forge
libdeflate  1.19  hd590300_0  conda-forge
libedit  3.1.20191231  he28a2e2_2  conda-forge
libev  4.33  h516909a_1  conda-forge
libevent  2.1.12  hf998b51_1  conda-forge
libexpat  2.5.0  hcb278e6_1  conda-forge
libffi  3.4.2  h7f98852_5  conda-forge
libflac  1.4.3  h59595ed_0  conda-forge
libgcc  7.2.0  h69d50b8_2  conda-forge
libgcc-devel_linux-64  12.3.0  h8bca6fd_103  conda-forge
libgcc-ng  13.2.0  h807b86a_3  conda-forge
libgcrypt  1.10.2  hd590300_0  conda-forge
libgfortran-ng  13.2.0  h69a702a_3  conda-forge
libgfortran5  13.2.0  ha4646dd_3  conda-forge
libglib  2.78.1  h783c2da_1  conda-forge
libgomp  13.2.0  h807b86a_3  conda-forge
libgpg-error  1.47  h71f35ed_0  conda-forge
libhwloc  2.9.3  default_h554bfaf_1009  conda-forge
libiconv  1.17  h166bdaf_0  conda-forge
libidn2  2.3.4  h166bdaf_0  conda-forge
libjpeg-turbo  3.0.0  hd590300_1  conda-forge
liblapack  3.9.0  16_linux64_mkl  conda-forge
libllvm15  15.0.7  h5cf9203_3  conda-forge
libnghttp2  1.58.0  h47da74e_0  conda-forge
libnsl  2.0.1  hd590300_0  conda-forge
libogg  1.3.4  h7f98852_1  conda-forge
libopenblas  0.3.25  pthreads_h413a1c8_0  conda-forge
libopus  1.3.1  h7f98852_1  conda-forge
libpng  1.6.39  h753d276_0  conda-forge
libpq  16.1  hfc447b1_0  conda-forge
libprotobuf  3.19.6  h3eb15da_0  conda-forge
libsanitizer  12.3.0  h0f45ef3_3  conda-forge
libsndfile  1.2.2  hc60ed4a_1  conda-forge
libsqlite  3.44.1  h2797004_0  conda-forge
libssh2  1.11.0  h0841786_0  conda-forge
libstdcxx-devel_linux-64  12.3.0  h8bca6fd_103  conda-forge
libstdcxx-ng  13.2.0  h7e041cc_3  conda-forge
libsystemd0  254  h3516f8a_0  conda-forge
libtiff  4.6.0  ha9c0a0a_2  conda-forge
libunistring  0.9.10  h7f98852_0  conda-forge
libuuid  2.38.1  h0b41bf4_0  conda-forge
libuv  1.46.0  hd590300_0  conda-forge
libvorbis  1.3.7  h9c3ff4c_0  conda-forge
libwebp-base  1.3.2  hd590300_0  conda-forge
libxcb  1.15  h0b41bf4_0  conda-forge
libxkbcommon  1.6.0  h5d7e998_0  conda-forge
libxml2  2.11.6  h232c23b_0  conda-forge
libzlib  1.2.13  hd590300_5  conda-forge
llvm-openmp  17.0.5  h4dfa4b3_0  conda-forge
lz4-c  1.9.4  hcb278e6_0  conda-forge
mafft  7.520  h031d066_3  bioconda
make  4.3  hd18ef5c_1  conda-forge
matplotlib  3.8.2  py39hf3d152e_0  conda-forge
matplotlib-base  3.8.2  py39he9076e7_0  conda-forge
mkl  2022.2.1  h84fe81f_16997  conda-forge
mkl-devel  2022.2.1  ha770c72_16998  conda-forge
mkl-include  2022.2.1  h84fe81f_16997  conda-forge
mpfr  4.2.1  h9458935_0  conda-forge
mpg123  1.32.3  h59595ed_0  conda-forge
munkres  1.1.4  pyh9f0ad1d_0  conda-forge
mysql-common  8.0.33  hf1915f5_6  conda-forge
mysql-libs  8.0.33  hca2cd23_6  conda-forge
ncbi-vdb  3.0.8  hdbdd923_0  bioconda
ncurses  6.4  h59595ed_2  conda-forge
networkx  3.2.1  pyhd8ed1ab_0  conda-forge
ninja  1.11.1  h924138e_0  conda-forge
nspr  4.35  h27087fc_0  conda-forge
nss  3.94  h1d7d5a4_0  conda-forge
numpy  1.26.0  py39h474f0d3_0  conda-forge
openblas  0.3.25  pthreads_h7a3da1a_0  conda-forge
openjpeg  2.5.0  h488ebb8_3  conda-forge
openssl  3.1.4  hd590300_0  conda-forge
ossuuid  1.6.2  hf484d3e_1000  conda-forge
packaging  23.2  pyhd8ed1ab_0  conda-forge
pandas  2.1.3  py39hddac248_0  conda-forge
patsy  0.5.3  pyhd8ed1ab_0  conda-forge
pcre  8.45  h9c3ff4c_0  conda-forge
pcre2  10.42  hcad00b1_0  conda-forge
perl  5.22.0.1  0  conda-forge
perl-app-cpanminus  1.7043  pl5.22.0_0  bioconda
perl-archive-tar  2.18  pl5.22.0_2  bioconda
perl-carp  1.38  pl5.22.0_0  bioconda
perl-common-sense  3.74  0  bioconda
perl-compress-raw-bzip2  2.069  1  bioconda
perl-compress-raw-zlib  2.069  3  bioconda
perl-data-dumper  2.161  pl5.22.0_0  bioconda
perl-exporter  5.72  pl5.22.0_0  bioconda
perl-exporter-tiny  0.042  1  bioconda
perl-extutils-makemaker  7.24  pl5.22.0_1  bioconda
perl-io-compress  2.069  pl5.22.0_2  bioconda
perl-io-zlib  1.10  1  bioconda
perl-json  2.90  1  bioconda
perl-json-xs  2.34  0  bioconda
perl-list-moreutils  0.428  pl5.22.0_0  bioconda
perl-pathtools  3.73  h470a237_2  bioconda
perl-scalar-list-utils  1.45  2  bioconda
perl-test-more  1.001002  pl5.22.0_0  bioconda
perl-threaded  5.32.1  hdfd78af_1  bioconda
perl-uri  1.71  pl5.22.0_1  bioconda
perl-xml-libxml  2.0124  0  bioconda
perl-xml-namespacesupport  1.11  0  bioconda
perl-xml-sax  0.99  0  bioconda
perl-xml-sax-base  1.08  0  bioconda
pillow  10.1.0  py39had0adad_0  conda-forge
pip  23.3.1  pyhd8ed1ab_0  conda-forge
pixman  0.42.2  h59595ed_0  conda-forge
ply  3.11  py_1  conda-forge
protobuf  3.19.6  py39h227be39_0  conda-forge
pthread-stubs  0.4  h36c2ea0_1001  conda-forge
pulseaudio-client  16.1  hb77b528_5  conda-forge
pybind11  2.11.1  py39h7633fee_2  conda-forge
pybind11-global  2.11.1  py39h7633fee_2  conda-forge
pycparser  2.21  pyhd8ed1ab_0  conda-forge
pyfaidx  0.7.2.2  pyhdfd78af_0  bioconda
pyparsing  3.1.1  pyhd8ed1ab_0  conda-forge
pyqt  5.15.9  py39h52134e7_5  conda-forge
pyqt5-sip  12.12.2  py39h3d6467e_5  conda-forge
python  3.9.18  h0755675_0_cpython  conda-forge
python-dateutil  2.8.2  pyhd8ed1ab_0  conda-forge
python-edlib  1.3.9  py39h1f90b4d_4  bioconda
python-tzdata  2023.3  pyhd8ed1ab_0  conda-forge
python-wget  3.2  py_0  conda-forge
python_abi  3.9  4_cp39  conda-forge
pytorch  1.10.2  cpu_py39h5e9ed0b_1  conda-forge
pytorch-cpu  1.10.2  cpu_py39h718b53a_1  conda-forge
pytz  2023.3.post1  pyhd8ed1ab_0  conda-forge
pyvcf3  1.0.3  pyhdfd78af_0  bioconda
qt-main  5.15.8  h82b777d_17  conda-forge
rapidnj  2.3.2  h4ac6f70_4  bioconda
readline  8.2  h8228510_1  conda-forge
rhash  1.4.4  hd590300_0  conda-forge
scipy  1.11.3  py39h474f0d3_1  conda-forge
seaborn  0.13.0  hd8ed1ab_0  conda-forge
seaborn-base  0.13.0  pyhd8ed1ab_0  conda-forge
setuptools  59.5.0  py39hf3d152e_0  conda-forge
simplejson  3.19.2  py39hd1e30aa_0  conda-forge
sip  6.7.12  py39h3d6467e_0  conda-forge
six  1.16.0  pyh6c4a22f_0  conda-forge
sleef  3.5.1  h9b69904_2  conda-forge
snp-sites  2.5.1  he4a0461_4  bioconda
statsmodels  0.14.0  py39h44dd56e_2  conda-forge
sysroot_linux-64  2.12  he073ed8_16  conda-forge
tbb  2021.10.0  h00ab1b0_2  conda-forge
tbb-devel  2021.10.0  h00ab1b0_2  conda-forge
tk  8.6.13  noxft_h4845f30_101  conda-forge
toml  0.10.2  pyhd8ed1ab_0  conda-forge
tomli  2.0.1  pyhd8ed1ab_0  conda-forge
tornado  6.3.3  py39hd1e30aa_1  conda-forge
tqdm  4.66.1  pyhd8ed1ab_0  conda-forge
typing_extensions  4.8.0  pyha770c72_0  conda-forge
tzdata  2023c  h71feb2d_0  conda-forge
uncertainties  3.1.7  pyhd8ed1ab_0  conda-forge
unicodedata2  15.1.0  py39hd1e30aa_0  conda-forge
wget  1.20.3  ha35d2d1_1  conda-forge
wheel  0.41.3  pyhd8ed1ab_0  conda-forge
xcb-util  0.4.0  hd590300_1  conda-forge
xcb-util-image  0.4.0  h8ee46fc_1  conda-forge
xcb-util-keysyms  0.4.0  h8ee46fc_1  conda-forge
xcb-util-renderutil  0.3.9  hd590300_1  conda-forge
xcb-util-wm  0.4.1  h8ee46fc_1  conda-forge
xkeyboard-config  2.40  hd590300_0  conda-forge
xorg-kbproto  1.0.7  h7f98852_1002  conda-forge
xorg-libice  1.1.1  hd590300_0  conda-forge
xorg-libsm  1.2.4  h7391055_0  conda-forge
xorg-libx11  1.8.7  h8ee46fc_0  conda-forge
xorg-libxau  1.0.11  hd590300_0  conda-forge
xorg-libxdmcp  1.1.3  h7f98852_0  conda-forge
xorg-libxext  1.3.4  h0b41bf4_2  conda-forge
xorg-libxrender  0.9.11  hd590300_0  conda-forge
xorg-renderproto  0.11.1  h7f98852_1002  conda-forge
xorg-xextproto  7.3.0  h0b41bf4_1003  conda-forge
xorg-xf86vidmodeproto  2.3.1  h7f98852_1002  conda-forge
xorg-xproto  7.0.31  h7f98852_1007  conda-forge
xz  5.2.6  h166bdaf_0  conda-forge
zipp  3.17.0  pyhd8ed1ab_0  conda-forge
zlib  1.2.13  hd590300_5  conda-forge
zstd  1.5.5  hfc55251_0  conda-forge

samhorsfield96 commented 9 months ago

Hi, sorry to hear you had this issue. I have done some troubleshooting and have changed what appears to be the likely issue. However, as I have not reproduced your error, I'm not 100% sure if it will work. If possible, please use the oor_error branch, commit 62e35aa and let me know if this rectifies the problem.

aababc1 commented 9 months ago

Thank you for your prompt reply.

You mentioned that you could not reproduce the error. Can you tell me the details of your ggcaller execution environment (tools and their versions)?

I followed your instructions by swapping in the graph.cpp file. As you expected, the issue in the Scoring ORF clusters step seems to be resolved.

However, an error occurred in the core genome alignment step. I allocated 1 TB of memory to the ggcaller job.

Is this error caused by a memory shortage, or do I have to adjust the code or the execution environment? Thank you very much.

$ time ggcaller --refs 1380list --clean-mode moderate --alignment core --core-threshold 0.9 --threads 80 --out ggcallertest&
[1] 63929
Building coloured compacted DBG...
Generating graph stop codon index...
Mapping contigs to graph...
Loading gene models...
Traversing graph to identify ORFs...
|██████████████████████████████████████████████████| 100%
Generating clusters of high-scoring ORFs...
Scoring ORF clusters...
|██████████████████████████████████████████████████| 100%
Identifying high-scoring ORFs...
|██████████████████████████████████████████████████| 100%
Generating initial network...
Processing paralogs...
100%|██████████| 365/365 [01:17<00:00, 4.71it/s]
collapse mistranslations...
Processing depth: 1
Iteration: 1 100%|██████████| 219596/219596 [00:52<00:00, 4202.88it/s]
Iteration: 2 100%|██████████| 13941/13941 [00:07<00:00, 1826.03it/s]
Iteration: 3 100%|██████████| 8828/8828 [00:02<00:00, 3041.83it/s]
Iteration: 4 100%|██████████| 1801/1801 [00:00<00:00, 2944.91it/s]
Iteration: 5 100%|██████████| 196/196 [00:00<00:00, 2469.51it/s]
Iteration: 6 100%|██████████| 8/8 [00:00<00:00, 1694.07it/s]
Iteration: 7 100%|██████████| 1/1 [00:00<00:00, 44150.57it/s]
Processing depth: 2
Iteration: 1 100%|██████████| 151200/151200 [00:31<00:00, 4735.54it/s]
Iteration: 2 100%|██████████| 2952/2952 [00:02<00:00, 1281.83it/s]
Iteration: 3 100%|██████████| 65/65 [00:00<00:00, 1112.43it/s]
Iteration: 4 100%|██████████| 4/4 [00:00<00:00, 1045.90it/s]
Iteration: 5 100%|██████████| 1/1 [00:00<00:00, 1272.16it/s]
Processing depth: 3
Iteration: 1 100%|██████████| 147706/147706 [01:40<00:00, 1466.66it/s]
Iteration: 2 100%|██████████| 508/508 [00:01<00:00, 322.17it/s]
Iteration: 3 100%|██████████| 11/11 [00:00<00:00, 1053.00it/s]
Iteration: 4 100%|██████████| 3/3 [00:00<00:00, 398.27it/s]
Iteration: 5 100%|██████████| 1/1 [00:00<00:00, 182.08it/s]
annotating gene families...
collapse gene families...
Processing depth: 1
Iteration: 1 100%|██████████| 147151/147151 [00:58<00:00, 2528.31it/s]
Iteration: 2 100%|██████████| 12511/12511 [00:15<00:00, 805.53it/s]
Iteration: 3 100%|██████████| 7169/7169 [00:10<00:00, 689.31it/s]
Iteration: 4 100%|██████████| 3739/3739 [00:05<00:00, 669.09it/s]
Iteration: 5 100%|██████████| 1584/1584 [00:02<00:00, 692.07it/s]
Iteration: 6 100%|██████████| 696/696 [00:00<00:00, 803.60it/s]
Iteration: 7 100%|██████████| 417/417 [00:00<00:00, 904.32it/s]
Iteration: 8 100%|██████████| 248/248 [00:00<00:00, 961.59it/s]
Iteration: 9 100%|██████████| 169/169 [00:00<00:00, 1092.54it/s]
Iteration: 10 100%|██████████| 95/95 [00:00<00:00, 976.12it/s]
Iteration: 11 100%|██████████| 60/60 [00:00<00:00, 1039.85it/s]
Iteration: 12 100%|██████████| 34/34 [00:00<00:00, 1042.58it/s]
Iteration: 13 100%|██████████| 19/19 [00:00<00:00, 1077.94it/s]
Iteration: 14 100%|██████████| 13/13 [00:00<00:00, 716.77it/s]
Iteration: 15 100%|██████████| 8/8 [00:00<00:00, 866.35it/s]
Iteration: 16 100%|██████████| 6/6 [00:00<00:00, 836.60it/s]
Iteration: 17 100%|██████████| 4/4 [00:00<00:00, 1093.83it/s]
Iteration: 18 100%|██████████| 2/2 [00:00<00:00, 1357.16it/s]
Iteration: 19 100%|██████████| 2/2 [00:00<00:00, 1376.76it/s]
Iteration: 20 100%|██████████| 2/2 [00:00<00:00, 2798.07it/s]
Iteration: 21 100%|██████████| 1/1 [00:00<00:00, 53092.46it/s]
Processing depth: 2
Iteration: 1 100%|██████████| 74105/74105 [00:53<00:00, 1390.57it/s]
Iteration: 2 100%|██████████| 3139/3139 [00:15<00:00, 208.53it/s]
Iteration: 3 100%|██████████| 1240/1240 [00:05<00:00, 231.43it/s]
Iteration: 4 100%|██████████| 592/592 [00:02<00:00, 246.76it/s]
Iteration: 5 100%|██████████| 320/320 [00:01<00:00, 274.44it/s]
Iteration: 6 100%|██████████| 177/177 [00:00<00:00, 348.09it/s]
Iteration: 7 100%|██████████| 94/94 [00:00<00:00, 382.28it/s]
Iteration: 8 100%|██████████| 57/57 [00:00<00:00, 401.23it/s]
Iteration: 9 100%|██████████| 37/37 [00:00<00:00, 457.70it/s]
Iteration: 10 100%|██████████| 22/22 [00:00<00:00, 198.74it/s]
Iteration: 11 100%|██████████| 18/18 [00:00<00:00, 332.15it/s]
Iteration: 12 100%|██████████| 11/11 [00:00<00:00, 738.30it/s]
Iteration: 13 100%|██████████| 7/7 [00:00<00:00, 1249.74it/s]
Iteration: 14 100%|██████████| 1/1 [00:00<00:00, 36.64it/s]
Iteration: 15 100%|██████████| 1/1 [00:00<00:00, 35.03it/s]
Processing depth: 3
Iteration: 1 100%|██████████| 66810/66810 [07:35<00:00, 146.74it/s]
Iteration: 2 100%|██████████| 2454/2454 [01:55<00:00, 21.27it/s]
Iteration: 3 100%|██████████| 683/683 [00:21<00:00, 31.81it/s]
Iteration: 4 100%|██████████| 257/257 [00:07<00:00, 36.10it/s]
Iteration: 5 100%|██████████| 114/114 [00:03<00:00, 37.92it/s]
Iteration: 6 100%|██████████| 46/46 [00:00<00:00, 89.01it/s]
Iteration: 7 100%|██████████| 25/25 [00:00<00:00, 35.47it/s]
Iteration: 8 100%|██████████| 5/5 [00:00<00:00, 67.23it/s]
Iteration: 9 100%|██████████| 6/6 [00:00<00:00, 21.07it/s]
Iteration: 10 100%|██████████| 9/9 [00:00<00:00, 157.65it/s]
Iteration: 11 100%|██████████| 2/2 [00:00<00:00, 257.47it/s]
trimming contig ends...
refinding genes...
Number of searches to perform: 17245438
Searching...
translating hits...
Updating output...
Number of refound genes: 824980
collapse gene families with refound genes...
Processing depth: 1
Iteration: 1 100%|██████████| 35959/35959 [00:17<00:00, 2074.54it/s]
Iteration: 2 100%|██████████| 32/32 [00:00<00:00, 423.63it/s]
Processing depth: 2
Iteration: 1 100%|██████████| 35925/35925 [01:06<00:00, 539.30it/s]
Iteration: 2 100%|██████████| 9/9 [00:00<00:00, 26.29it/s]
Processing depth: 3
Iteration: 1 100%|██████████| 35916/35916 [09:06<00:00, 65.72it/s]
Iteration: 2 100%|██████████| 81/81 [00:05<00:00, 13.97it/s]
Iteration: 3 100%|██████████| 7/7 [00:01<00:00, 6.42it/s]
Iteration: 4 100%|██████████| 1/1 [00:00<00:00, 6.09it/s]
writing Roary output...
writing GFF files...
writing gene fasta...
generating core genome MSAs...
Exception in thread Thread-1:
Traceback (most recent call last):
  File "miniconda3/envs/ggcaller134/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "miniconda3/envs/ggcaller134/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "miniconda3/envs/ggcaller134/lib/python3.9/multiprocessing/pool.py", line 513, in _handle_workers
    cls._maintain_pool(ctx, Process, processes, pool, inqueue,
  File "miniconda3/envs/ggcaller134/lib/python3.9/multiprocessing/pool.py", line 337, in _maintain_pool
    Pool._repopulate_pool_static(ctx, Process, processes, pool,
  File "miniconda3/envs/ggcaller134/lib/python3.9/multiprocessing/pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "miniconda3/envs/ggcaller134/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "miniconda3/envs/ggcaller134/lib/python3.9/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "miniconda3/envs/ggcaller134/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "miniconda3/envs/ggcaller134/lib/python3.9/multiprocessing/popen_fork.py", line 66, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

samhorsfield96 commented 9 months ago

Hi, this looks like it's an issue with python multiprocessing. I would suggest running with fewer (~40) threads.
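Rather than guessing a thread count, one rough approach is to divide the memory the kernel reports as available by an estimate of what each worker needs. The sketch below is only an illustration: `parse_meminfo`, `available_kib` and `cap_workers` are hypothetical helper names that are not part of ggCaller, and the per-worker figure is something you would have to measure for your own dataset, since ggCaller does not report it.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def available_kib(path="/proc/meminfo"):
    """Read the kernel's estimate of available memory (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())["MemAvailable"]


def cap_workers(requested, avail_kib, per_worker_gib, reserve_gib=0.0):
    """Return the largest worker count <= requested that fits in memory.

    per_worker_gib is an assumed, user-measured estimate, not a ggCaller value.
    """
    if per_worker_gib <= 0:
        raise ValueError("per_worker_gib must be positive")
    kib_per_gib = 1024 * 1024
    usable = avail_kib - reserve_gib * kib_per_gib
    fits = int(usable // (per_worker_gib * kib_per_gib))
    # Always allow at least one worker so the caller can still attempt a run.
    return max(1, min(requested, fits))
```

For example, `cap_workers(80, available_kib(), per_worker_gib=20)` would return the value to pass to `--threads`, where 20 GiB is a placeholder rather than a measured number.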

aababc1 commented 9 months ago

Thank you for your reply.

I found that after ggcaller terminates, its jobs remain in the S (sleeping) state in the process list and keep consuming memory.

image

Memory usage dropped back once the sleeping ggcaller jobs were killed.
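To confirm which processes are left over after a failed run, their state and resident memory can be read from `ps`. This is a minimal sketch, assuming a Linux `ps` that accepts `-eo pid,stat,rss,comm`; `leftover_processes` and `snapshot` are hypothetical helper names, and matching on the command name `ggcaller` is an assumption about how the workers appear on your system.

```python
import subprocess


def leftover_processes(ps_text, name):
    """Return (pid, state, rss_kib) for rows whose command contains `name`.

    Expects whitespace-separated columns PID STAT RSS COMMAND; the header
    row and any malformed rows are skipped.
    """
    rows = []
    for line in ps_text.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4 or not parts[0].isdigit() or not parts[2].isdigit():
            continue
        pid, stat, rss, comm = parts
        if name in comm:
            rows.append((int(pid), stat, int(rss)))
    return rows


def snapshot():
    """Capture the current process table as text."""
    return subprocess.run(
        ["ps", "-eo", "pid,stat,rss,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Calling `leftover_processes(snapshot(), "ggcaller")` after a crash lists any survivors; a state beginning with `S` means interruptible sleep, and the third value is resident memory in KiB.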

Does the OOM (out of memory) error cause the program to terminate abruptly, leaving the jobs sleeping, or is this intended so that the sleeping jobs can be rerun? Regarding this error and real-world use of ggcaller, I have 3 questions.

I look forward to your answer. Thank you very much.

===============================
time ggcaller --refs 1380list --clean-mode moderate --alignment core --core-threshold 0.9 --threads 40 --out ggcallertest
Building coloured compacted DBG...
Generating graph stop codon index...
Mapping contigs to graph...
Loading gene models...
Traversing graph to identify ORFs...
|██████████████████████████████████████████████████| 100%
Generating clusters of high-scoring ORFs...
Scoring ORF clusters...
|██████████████████████████████████████████████████| 100%
Identifying high-scoring ORFs...
|██████████████████████████████████████████████████| 100%
Traceback (most recent call last):
  File "miniconda3/envs/ggcaller134/bin/ggcaller", line 33, in <module>
    sys.exit(load_entry_point('ggCaller==1.3.4', 'console_scripts', 'ggcaller')())
  File "miniconda3/envs/ggcaller134/lib/python3.9/site-packages/ggCaller-1.3.4-py3.9-linux-x86_64.egg/ggCaller/main.py", line 506, in main
    with SharedMemoryManager() as smm:
  File "miniconda3/envs/ggcaller134/lib/python3.9/multiprocessing/managers.py", line 1327, in __init__
    resource_tracker.ensure_running()
  File "miniconda3/envs/ggcaller134/lib/python3.9/multiprocessing/resource_tracker.py", line 121, in ensure_running
    pid = util.spawnv_passfds(exe, args, fds_to_pass)
  File "miniconda3/envs/ggcaller134/lib/python3.9/multiprocessing/util.py", line 452, in spawnv_passfds
    return _posixsubprocess.fork_exec(
OSError: [Errno 12] Cannot allocate memory

real 654m14.758s

samhorsfield96 commented 9 months ago

Hi, to answer your questions:

  1. The sleeping jobs should be terminated upon exiting and are not used to restart the workflow. We suspect this issue is down to Python shared memory during multiprocessing. Unfortunately, I'm not sure I understand your query about the 'real' execution environment - you can check library versions using mamba list.
  2. Due to Python multiprocessing we do expect some copying of objects, so yes, memory will increase with the number of threads. We show in the paper that we can run ggCaller on up to 3000 closely related genomes; we have not tried with 5,000 genomes. Alternatively, you could identify representative genomes using PopPUNK, and generate a pangenome from these representatives using ggCaller. This will provide a representative sample of the pangenome and gene frequencies.
  3. We have not tried with MAGs but I don't see why this would be a problem for ggCaller. Using Panaroo, ggCaller is able to identify potential pseudogenes and remove open reading frame calls at the ends of contigs that are likely incorrect. Furthermore, ggCaller attempts to remedy gene truncation due to assembly errors by enabling traversal across contig breaks, which will aid in returning more intact genes. I would suggest running ggCaller in --clean-mode strict to enable stringent prediction filtering.
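
The difference between copying an object into every worker and sharing one buffer can be illustrated with the standard library's `SharedMemoryManager`, which the traceback earlier in this thread shows ggCaller using. This is a generic sketch of the mechanism, not ggCaller's code; the payload and the `read_first_bytes` and `demo` names are made up for illustration.

```python
from multiprocessing import shared_memory
from multiprocessing.managers import SharedMemoryManager


def read_first_bytes(name, n):
    """Attach to an existing shared block by name and copy out n bytes."""
    block = shared_memory.SharedMemory(name=name)
    try:
        return bytes(block.buf[:n])
    finally:
        # Detach only; whoever created the block is responsible for freeing it.
        block.close()


def demo():
    payload = b"ACGTACGT"
    # The manager frees every block it created when the `with` body exits,
    # including when the body raises.
    with SharedMemoryManager() as smm:
        block = smm.SharedMemory(size=len(payload))
        block.buf[:len(payload)] = payload
        # A worker process would be handed only `block.name` (a short string)
        # and would attach to the same memory instead of receiving a copy.
        return read_first_bytes(block.name, len(payload))


if __name__ == "__main__":
    print(demo())
```

Objects that are pickled and sent to workers instead of being placed in a shared block are duplicated once per worker, which is consistent with memory growing as the thread count goes up.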
aababc1 commented 9 months ago

Hi, thank you for your kind answer. I understand your suggestions and explanations. Thanks for the details.

A couple of things to mention regarding the last post. First, what I was asking for was "your" ggcaller execution environment.

The conda installation of ggcaller 1.3.4 does not run on our server (CentOS 7.9). So I compiled your tool manually from source, by cloning the git repository and running the Python setup, following the instructions in your GitHub documentation.

But it would be good if a few things were clarified. In the file https://github.com/samhorsfield96/ggCaller/blob/master/environment_linux.yml , the versions of the Python packages are not explicitly specified. In my experience, conda tries to download the most recent package versions if versions are not specified elsewhere.

I've encountered many errors due to numpy, pandas, etc. while using several other tools (numpy < 1.24 vs. numpy >= 1.24; using a different package version leads to runtime errors). Because the conda installation did not work for me (creating a new environment for ggcaller with mamba create -n ~~~~, following the code on GitHub) and you said you could not reproduce such errors, I requested your exact ggcaller execution environment (python, numpy, pandas, networkx, etc.) to find any potential incompatibility.

If you don't mind, I would like to know the versions of your packages and tools related to ggcaller. I think specifying package versions would resolve errors with the conda installation.
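
One low-effort way to exchange exact versions between two machines is to print them from inside each environment. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper, and the package names are simply the ones mentioned in this thread.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["numpy", "pandas", "networkx", "ggCaller"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

This only covers Python distributions; compiled tools installed through conda, such as Bifrost, still have to be read from the environment's package list.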

Thank you very much for your quick response and your valuable answers, I'll look into it.

samhorsfield96 commented 7 months ago

Hi, we have found that explicitly specifying versions for all packages can cause incompatibility issues, so we only do this in specific cases where it is warranted.

Please see the attached file for package versions for a working ggCaller environment. ggc_env_v1.3.4_packages.txt
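
Once both package lists are available, the differences can be found mechanically. The sketch below assumes both lists are in `conda list` format (name, version, build and channel on each line, with `#` starting a comment); whether the attached file uses that layout is an assumption, and `parse_conda_list` and `version_diff` are hypothetical helper names.

```python
def parse_conda_list(text):
    """Parse `conda list`-style text into {package_name: version}."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def version_diff(mine, theirs):
    """Return {name: (my_version, their_version)} for every mismatch.

    A package that is missing on one side is reported with None for that side.
    """
    diff = {}
    for name in sorted(set(mine) | set(theirs)):
        a, b = mine.get(name), theirs.get(name)
        if a != b:
            diff[name] = (a, b)
    return diff
```

Feeding it the text of the two lists gives a short table of candidates to check first, such as numpy or pandas if they differ.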

aababc1 commented 7 months ago

Hi. Thank you. I got your point.

Best regards and many thanks.

samhorsfield96 commented 7 months ago

Closed as resolved.