core dump - Githubissues

Hi there,

Thanks for the tool!

When I ran MeShClust v3.0, it aborted with a core dump. Do you have any suggestions for this? Thank you!

The compute node of the cluster has 56 cores (112 threads) and 1.5 TB of RAM, and we did not limit how much RAM MeShClust could use.

Best,
Guanliang

-rw-rw-r-- 1 gmeng 1.5G Jun 13 17:15 combined.fa
-rw-rw-r-- 1 gmeng  112 Jun 14 10:00 meshclust3.sh
-rw-r--r-- 1 gmeng 5.9K Jun 15 20:16 meshclust3.sh.o539214
-rw------- 1 gmeng  18G Jun 15 22:18 core.229599
-rw-r--r-- 1 gmeng  416 Jun 15 22:18 meshclust3.sh.e539214

$ grep -c '>' combined.fa
5652580

meshclust3.sh:

/home/gmeng/soft/MeShClust_v3/Identity/bin/meshclust -d combined.fa -t 0.6  -o out.clstr -c 80 -e y -a n -p 10

meshclust3.sh.o539214:

MeShClust v3.0 is developed by Hani Z. Girgis, PhD.

This program clusters DNA sequences using identity scores obtained without alignment.

Copyright (C) 2021-2022 Hani Z. Girgis, PhD

Academic use: Affero General Public License version 1.

Any restrictions to use for profit or non-academics: Alternative commercial license is required.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Please contact Dr. Hani Z. Girgis (hzgirgis@buffalo.edu) if you need more information.

Please cite the following papers:
    1. Identity: Rapid alignment-free prediction of sequence alignment identity scores using
    self-supervised general linear models. Hani Z. Girgis, Benjamin T. James, and Brian B.
    Luczak. NAR GAB, 3(1):lqab001, 2021.
    2. MeShClust: an intelligent tool for clustering DNA sequences. Benjamin T. James,
    Brian B. Luczak, and Hani Z. Girgis. Nucleic Acids Res, 46(14):e83, 2018.
    3. MeShClust v3.0: High-quality clustering of DNA sequences using the mean shift algorithm
    and alignment-free identity scores. Hani Z. Girgis. A great journal. 2022.

Database file: combined.fa
Output file: out.clstr
Cores: 80
Provided threshold: 0.6
Block size for all vs. all: 25000
Block size for reading sequences: 100000
Number of data passes: 10
Can assign all: No

Average: 756
K: 4
Histogram size: 256
A histogram entry is 16 bits.
Generating data.
Preparing data ...
    Positive examples: 10000
    Training size: 5000
    Validation size: 5000
Better performance of: 0.00324074
    chi_squared x jeffrey_divergence
Better performance of: 0.00278104
    chi_squared x jeffrey_divergence
    chi_squared^2 x d2_s_r^2
Better performance of: 0.00275123
    chi_squared x jeffrey_divergence
    chi_squared^2 x d2_s_r^2
    squared_chord^2 x hellinger^2
Better performance of: 0.00271437
    chi_squared x jeffrey_divergence
    chi_squared^2 x d2_s_r^2
    bray_curtis^2 x d2_s_r^2
    squared_chord^2 x hellinger^2
Better performance of: 0.00266334
    chi_squared x squared_chord
    chi_squared x jeffrey_divergence
    chi_squared^2 x d2_s_r^2
    bray_curtis^2 x d2_s_r^2
    squared_chord^2 x hellinger^2
    kulczynski_2^2 x d2_s_r^2
Better performance of: 0.00263148
    squared_chord
    chi_squared x squared_chord
    chi_squared x jeffrey_divergence
    chi_squared^2 x d2_s_r^2
    bray_curtis^2 x d2_s_r^2
    squared_chord^2 x hellinger^2
    kulczynski_2^2 x d2_s_r^2
Better performance of: 0.00257594
    squared_chord
    chi_squared x squared_chord
    chi_squared x jeffrey_divergence
    hellinger x hellinger^2
    chi_squared^2 x d2_s_r^2
    bray_curtis^2 x d2_s_r^2
    squared_chord^2 x hellinger^2
    kulczynski_2^2 x d2_s_r^2
Better performance of: 0.00249854
    squared_chord
    manhattan x simMM
    chi_squared x squared_chord
    chi_squared x jeffrey_divergence
    hellinger x hellinger^2
    chi_squared^2 x d2_s_r^2
    bray_curtis^2 x d2_s_r^2
    squared_chord^2 x hellinger^2
    kulczynski_2^2 x d2_s_r^2
Selected statistics:
    squared_chord
    manhattan x simMM
    chi_squared x squared_chord
    chi_squared x jeffrey_divergence
    hellinger x hellinger^2
    chi_squared^2 x d2_s_r^2
    bray_curtis^2 x d2_s_r^2
    squared_chord^2 x hellinger^2
    kulczynski_2^2 x d2_s_r^2
Finished training.
    MAE: 0.036734
    MSE: 0.00249854
Optimizing ...
Validating ...
    MAE: 0.0426102
    MSE: 0.00325363

Clustering ...

Data run 1 ...
    Processed sequences: 25000
    Unprocessed sequences: 0
    Found centers: 772
    Processed sequences: 50000
    Unprocessed sequences: 24657
    Found centers: 770
    Processed sequences: 100478
    Unprocessed sequences: 41448
    Found centers: 1278
    Processed sequences: 166024
    Unprocessed sequences: 32518
    Found centers: 2628
    Processed sequences: 206655
    Unprocessed sequences: 27580
    Found centers: 3034
    Processed sequences: 338846
    Unprocessed sequences: 65658
    Found centers: 3620
    Processed sequences: 348903
    Unprocessed sequences: 50307
    Found centers: 4308
    Processed sequences: 414183
    Unprocessed sequences: 67888
    Found centers: 4653
    Processed sequences: 428889
    Unprocessed sequences: 56801
    Found centers: 5147
    Processed sequences: 473924
    Unprocessed sequences: 66571
    Found centers: 5560
    Processed sequences: 591912
    Unprocessed sequences: 101368
    Found centers: 6457
    Processed sequences: 599863
    Unprocessed sequences: 83946
    Found centers: 6943
    Processed sequences: 682732
    Unprocessed sequences: 112078
    Found centers: 7277
    Processed sequences: 694499
    Unprocessed sequences: 97930
    Found centers: 7757
    Processed sequences: 752209
    Unprocessed sequences: 114752
    Found centers: 8067
    Processed sequences: 767163
    Unprocessed sequences: 94407
    Found centers: 8447
    Processed sequences: 867163
    Unprocessed sequences: 141679
    Found centers: 8792
    Processed sequences: 875812
    Unprocessed sequences: 125026
    Found centers: 9248
    Processed sequences: 950986
    Unprocessed sequences: 155363
    Found centers: 9586
    Processed sequences: 962281
    Unprocessed sequences: 137454
    Found centers: 10001
    Processed sequences: 1050620
    Unprocessed sequences: 173768
    Found centers: 10430
    Processed sequences: 1060816
    Unprocessed sequences: 156809
    Found centers: 10884
    Processed sequences: 1138833
    Unprocessed sequences: 189905
    Found centers: 11240
    Processed sequences: 1219898
    Unprocessed sequences: 191996
    Found centers: 12162
    Processed sequences: 1234377
    Unprocessed sequences: 173682
    Found centers: 12615
    Processed sequences: 1328038
    Unprocessed sequences: 210768
    Found centers: 13095
    Processed sequences: 1338108
    Unprocessed sequences: 194114
    Found centers: 13563
    Processed sequences: 1413309
    Unprocessed sequences: 217638
    Found centers: 13916
    Processed sequences: 1426200
    Unprocessed sequences: 203726
    Found centers: 14366
    Processed sequences: 1482720
    Unprocessed sequences: 217439
    Found centers: 14648
    Processed sequences: 1549592
    Unprocessed sequences: 216905
    Found centers: 15453
    Processed sequences: 1566431
    Unprocessed sequences: 205939
    Found centers: 15909
    Processed sequences: 1610994
    Unprocessed sequences: 211989
    Found centers: 16228

meshclust3.sh.e539214:

Mean 1 (mean1) and Mean 2 (mean2) cannot be zeros. Mean 1 is: 0, mean 2 is: 0.226562

terminate called after throwing an instance of 'std::exception'
  what():  std::exception
/opt/gridengine/default/spool/compute-0-0/job_scripts/539214: line 1: 229599 Aborted                 (core dumped) /home/gmeng/soft/MeShClust_v3/Identity/bin/meshclust -d combined.fa -t 0.6 -o out.clstr -c 80 -e y -a n -p 10
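
The "Mean 1 is: 0" message might point to a record in combined.fa whose k-mer histogram is empty, for example an empty sequence, one shorter than K (4 in the log above), or one consisting only of N or other ambiguous bases. This is only a guess; the log does not confirm it, and how MeShClust treats such records is an assumption here. A minimal Python sketch for listing such records follows; the helper names (read_fasta, count_valid_kmers, find_suspect_records) are made up for illustration and are not part of MeShClust or Identity.

```python
import sys


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_valid_kmers(seq, k=4):
    """Count k-mers that contain only A, C, G, or T (case-insensitive)."""
    count = 0
    run = 0  # length of the current run of unambiguous bases
    for base in seq.upper():
        run = run + 1 if base in "ACGT" else 0
        if run >= k:
            count += 1
    return count


def find_suspect_records(handle, k=4):
    """Return headers of records with no valid k-mer at all.

    Assumption: such records are what could give a zero histogram mean.
    """
    return [h for h, s in read_fasta(handle) if count_valid_kmers(s, k) == 0]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_fasta.py combined.fa
    with open(sys.argv[1]) as fh:
        for header in find_suspect_records(fh):
            print(header)
```

If this prints any headers, removing those records and rerunning would show whether they are related to the crash.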

BioinformaticsToolsmith / Identity

core dump #6