BioinformaticsToolsmith / Identity


Floating point exception (core dumped) #12

Open · cstill3928 opened this issue 1 year ago

cstill3928 commented 1 year ago

Hi Dr. Girgis,

Thanks for this interesting package; I'm really excited about it. I installed MeShClust and was able to run some of the example data to completion easily (keratin_query.fasta). Unfortunately, when I run it on my FASTA file of sequences of interest (most of which share >99% similarity), I get the error "Floating point exception (core dumped)". My DNA sequences range from 1381 bp to 1840 bp, with most at 1584 bp. Below is the terminal output from my MeShClust run; any help would be greatly appreciated. Thanks!

MeShClust 2.0 is developed by Hani Z. Girgis, PhD.

This program clusters DNA sequences using identity scores obtained without alignment.

Copyright (C) 2021-2022 Hani Z. Girgis, PhD

Academic use: Affero General Public License version 1.

Any restrictions to use for profit or non-academics: Alternative commercial license is required.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Please contact Dr. Hani Z. Girgis (hzgirgis@buffalo.edu) if you need more information.

Please cite the following papers: 
    MeShClust v3.0: High-quality clustering of DNA sequences using the mean shift algorithm
    and alignment-free identity scores (2022). Hani Z. Girgis, BMC Genomics, 23(1):423.

    Identity: Rapid alignment-free prediction of sequence alignment identity scores using
    self-supervised general linear models (2021). Hani Z. Girgis, Benjamin T. James, and
    Brian B. Luczak. NAR Genom Bioinform, 13(1), lqab001.

    A survey and evaluations of histogram-based statistics in alignment-free sequence
    comparison (2019). Brian B. Luczak, Benjamin T. James, and Hani Z. Girgis. Briefings
    in Bioinformatics, 20(4):1222–1237.

    MeShClust: An intelligent tool for clustering DNA sequences (2018). Benjamin T. James,
    Brian B. Luczak, and Hani Z. Girgis. Nucleic Acids Res, 46(14):e83.

Database file: ./consensus_reads.fa
Output file: ./cluster_test.txt
Cores: 96
Estimating the threshold ...
Average: 1585
K: 5
Histogram size: 1024
A histogram entry is 16 bits.
Generating data.
Number of standard deviations: 1
Preparing data ...
    Positive examples: 10000
    Training size: 5000
    Validation size: 5000
Better performance of: 0.00110822
    jeffrey_divergence
Better performance of: 0.000747565
    jeffrey_divergence
    cosine x jeffrey_divergence
Better performance of: 0.000719267
    jeffrey_divergence
    simMM
    cosine x jeffrey_divergence
Better performance of: 0.000681764
    jeffrey_divergence
    simMM
    manhattan x correlation
    cosine x jeffrey_divergence
    correlation x simMM^2
Better performance of: 0.000652853
    jeffrey_divergence
    sim_ratio
    simMM
    manhattan x correlation
    cosine x jeffrey_divergence
    correlation x simMM^2
Selected statistics:
    jeffrey_divergence
    sim_ratio
    simMM
    manhattan x correlation
    cosine x jeffrey_divergence
    correlation x simMM^2
Finished training.
    MAE: 0.0175349
    MSE: 0.000652853
Optimizing ...
Validating ...
    MAE: 0.0219525
    MSE: 0.000956407
Initialization: 0.373589 1
Stopping because there is no change for three iterations: 3
0.374592 0.995013 0.0001 0.9999
Initialization: 0.374592 0.995013 0.005 0.005
Stopping because there is no change for three iterations: 3
0.374592 0.995013 0.00129545 0.00302476 0.0001 0.9999
Initialization: 0.915731 1
Stopping because there is no change for three iterations: 10
0.956302 0.995658 0.0011 0.9989
Initialization: 0.956302 0.995658 0.0170152 0.005
Converged. 
0.995614 0.996173 0.00287971 9.07327e-07 1 0
Initialization: 0.874258 1
Stopping because there is no change for three iterations: 9
0.954592 0.995994 0.00215 0.99785
Initialization: 0.954592 0.995994 0.0202069 0.005
Converged. 
0.995905 0.996071 0.00327823 1.77699e-06 1 0
============================================
0.992735
0.992628
0.991988
Final threshold: 0.992628
Calculated threshold: 0.992628
Block size for all vs. all: 25000
Block size for reading sequences: 100000
Number of data passes: 10
Can assign all: No

Average: 1585
K: 5
Histogram size: 1024
A histogram entry is 16 bits.
Generating data.
Floating point exception (core dumped)
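The log lines "K: 5", "Histogram size: 1024" and "A histogram entry is 16 bits" are consistent with one bin per possible DNA 5-mer, since 4^5 = 1024. The following minimal C++ sketch shows such a k-mer count histogram. It is an illustration only, not MeShClust's code: the names (kmer_histogram, base_code, Histogram), the handling of non-ACGT bases, and the saturating 16-bit counts are all assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string_view>

// Hypothetical sketch, NOT MeShClust's implementation.
constexpr std::size_t kK = 5;                               // k-mer length ("K: 5")
constexpr std::size_t kBins = std::size_t{1} << (2 * kK);   // 4^k = 1024 bins

// One 16-bit counter per k-mer ("A histogram entry is 16 bits").
using Histogram = std::array<std::uint16_t, kBins>;

// Map a nucleotide to a 2-bit code; -1 for anything other than A/C/G/T.
constexpr int base_code(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

// Count every k-mer of `seq` made only of A/C/G/T. A non-ACGT base (e.g. 'N')
// restarts the window. Counts saturate at 65535 instead of wrapping (assumption).
constexpr Histogram kmer_histogram(std::string_view seq) {
    Histogram hist{};
    std::size_t index = 0;          // rolling 2-bit encoding of the last k bases
    std::size_t valid = 0;          // consecutive valid bases seen so far
    const std::size_t mask = kBins - 1;
    for (char c : seq) {
        const int code = base_code(c);
        if (code < 0) {
            valid = 0;
            index = 0;
            continue;
        }
        index = ((index << 2) | static_cast<std::size_t>(code)) & mask;
        if (++valid >= kK &&
            hist[index] < std::numeric_limits<std::uint16_t>::max()) {
            ++hist[index];
        }
    }
    return hist;
}
```

For example, "AAAAAA" contains the 5-mer AAAAA twice, so bin 0 of its histogram holds 2. A 1584 bp sequence has at most 1580 5-mers, well within a 16-bit counter.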
hani-girgis commented 1 year ago

Hi.

Thanks for your interest in MeShClust v3.0.

So that I can reproduce this error, would you please share your input data (or a subset of it large enough to trigger the error)? You may email it to hzgirgis at buffalo.edu.

Best regards.

Hani
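Context on the error message (not a diagnosis of this issue): in C and C++ programs on Linux, "Floating point exception (core dumped)" is the shell's report of SIGFPE. On x86-64 this signal is most often raised by an integer division or modulo by zero, not by floating-point arithmetic. Which operation, if any, divides by zero in MeShClust can only be established with a reproducer such as the requested input. The hypothetical helpers below (checked_divide and mean_length are not part of MeShClust) sketch the kind of guard that turns a zero divisor into a reportable condition.

```cpp
#include <cstddef>
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical helper, NOT taken from MeShClust. In C++, integer division or
// modulo by zero is undefined behavior; on x86-64 Linux it traps and the
// process receives SIGFPE ("Floating point exception (core dumped)").
constexpr std::optional<std::size_t> checked_divide(std::size_t numerator,
                                                    std::size_t denominator) {
    if (denominator == 0) {
        return std::nullopt;  // let the caller report the problem instead of crashing
    }
    return numerator / denominator;
}

// Hypothetical example: mean length of a batch of sequences. The batch may be
// empty, e.g. a block or work chunk that received no sequences.
inline std::optional<std::size_t> mean_length(const std::vector<std::size_t>& lengths) {
    const std::size_t total =
        std::accumulate(lengths.begin(), lengths.end(), std::size_t{0});
    return checked_divide(total, lengths.size());
}
```

With these helpers, mean_length({}) returns an empty std::optional rather than executing 0 / 0.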