BioinformaticsToolsmith / Identity


Segmentation fault #20

Open linda5mith opened 9 months ago

linda5mith commented 9 months ago

Hi there, I'm trying to run an all-vs.-all comparison of around 10k sequences, which are on average 100 kb long.
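To confirm the input size before launching an all-vs.-all run, a minimal Python sketch like the one below could report the sequence count, mean length and maximum length of a FASTA file, plus the number of pairs implied by comparing every unordered pair once, n(n-1)/2. The helper names `fasta_length_stats` and `all_vs_all_pairs` are hypothetical, the parser assumes an uncompressed plain FASTA file, and whether Identity also scores self-pairs or both orderings is an assumption not confirmed here.

```python
from typing import Iterable, Tuple


def fasta_length_stats(lines: Iterable[str]) -> Tuple[int, float, int]:
    """Return (sequence count, mean length, max length) for FASTA text.

    Hypothetical helper: assumes each record starts with '>' and that the
    sequence may be wrapped over several lines.
    """
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif current is not None:
            current += len(line)  # add this wrapped sequence line
    if current is not None:
        lengths.append(current)  # close the last record
    if not lengths:
        return 0, 0.0, 0
    return len(lengths), sum(lengths) / len(lengths), max(lengths)


def all_vs_all_pairs(n: int) -> int:
    """Number of unordered pairs among n sequences, excluding self-pairs."""
    return n * (n - 1) // 2
```

Usage, assuming the file name from the command below: `with open("pooled_genomes.fasta") as handle: print(fasta_length_stats(handle))`. For about 10,000 sequences, `all_vs_all_pairs(10_000)` gives 49,995,000 pairs.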

I've tried various settings, such as -c 8 or not specifying the number of cores (the server has 24 cores), but I keep getting the error "Segmentation fault (core dumped)".

~/programs/Identity/bin/identity -d pooled_genomes.fasta -o output.txt -t 0.9 -c 8

I'm not sure what I'm doing wrong. Any help would be greatly appreciated! :)

linda5mith commented 9 months ago

I tried running meshclust with the smallest batch size but still get a segmentation fault:

/home/administrator/programs/Identity/bin/meshclust -d pooled_genomes.fasta -o meshclust.clstr -t 0.9 -c 8 -e y -a n -p 10 -b 1000 -v 1000
  MSE: 0.000172281
Optimizing ...
Validating ...
        MAE: 0.00986641
        MSE: 0.000248818

Clustering ... 

Data run 1 ...
        Processed sequences: 1000
        Unprocessed sequences: 0
        Found centers: 55
Segmentation fault (core dumped)

Is there any way I can get this working on my system?

valentynbez commented 6 months ago

I have a similar problem, but at a different step:

./identity -d (gunzip -c crc_phages/crc_phages.mvirs.fa.gz | psub) -t 0.7 -o output.txt -c 8

Identity 1.2 is developed by Hani Z. Girgis, PhD.

This program calculates DNA sequence identity scores rapidly without alignment.

Copyright (C) 2020 Hani Z. Girgis, PhD

Academic use: Affero General Public License version 1.

Any restrictions to use for profit or non-academics: Alternative commercial license is required.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Please contact Dr. Hani Z. Girgis ( if you need more information.

Please cite the following papers: 
        Identity: Rapid alignment-free prediction of sequence alignment identity scores using
        self-supervised general linear models (2021). Hani Z. Girgis, Benjamin T. James, and
        Brian B. Luczak. NAR Genom Bioinform, 13(1), lqab001.

        A survey and evaluations of histogram-based statistics in alignment-free sequence
        comparison (2019). Brian B. Luczak, Benjamin T. James, and Hani Z. Girgis. Briefings
        in Bioinformatics, 20(4):1222–1237.

Database file: /tmp/.psub.TOS7qnPQj6
Query file: Not provided
Output file: output.txt
Cores: 8
Threshold: 0.7
Automatically relax threshold: Yes
All vs. all: Yes

Average: 14000
K: 6
Histogram size: 4096
A histogram entry is 32 bits.
Generating data.
fish: Job 1, './identity -d (gunzip -c /nfs/n…' terminated by signal SIGSEGV (Address boundary error)
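The log above reports K: 6, a histogram size of 4096 and 32-bit histogram entries. That size is consistent with one slot per possible k-mer over the four-letter DNA alphabet (4^6 = 4096). As a rough back-of-the-envelope sketch, assuming one dense histogram per sequence is held in memory at once (not confirmed from Identity's source), the memory taken by the histograms alone could be estimated as follows. The function names are hypothetical.

```python
def histogram_bytes(k: int, bits_per_entry: int) -> int:
    """Bytes for one dense k-mer histogram over the DNA alphabet {A, C, G, T}."""
    entries = 4 ** k  # one entry per possible k-mer (assumed dense layout)
    return entries * bits_per_entry // 8


def total_histogram_mib(num_sequences: int, k: int, bits_per_entry: int) -> float:
    """Estimated MiB if every sequence's histogram were kept in memory at once."""
    return num_sequences * histogram_bytes(k, bits_per_entry) / (1024 ** 2)
```

With the logged values, `histogram_bytes(6, 32)` is 16,384 bytes per sequence, and `total_histogram_mib(10_000, 6, 32)` is 156.25 MiB. This estimate covers the histograms only, not the sequences themselves or any per-pair data.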