Status: Open. Opened by linda5mith 9 months ago.
I tried running meshclust with the smallest batch size, but I am still getting a segmentation fault:
/home/administrator/programs/Identity/bin/meshclust -d pooled_genomes.fasta -o meshclust.clstr -t 0.9 -c 8 -e y -a n -p 10 -b 1000 -v 1000
MSE: 0.000172281
Optimizing ...
Validating ...
MAE: 0.00986641
MSE: 0.000248818
Clustering ...
Data run 1 ...
Processed sequences: 1000
Unprocessed sequences: 0
Found centers: 55
Segmentation fault (core dumped)
Is there any way I can get this working on my system?
I have a similar problem, but at a different step:
./identity -d (gunzip -c crc_phages/crc_phages.mvirs.fa.gz | psub) -t 0.7 -o output.txt -c 8
Identity 1.2 is developed by Hani Z. Girgis, PhD.
This program calculates DNA sequence identity scores rapidly without alignment.
Copyright (C) 2020 Hani Z. Girgis, PhD
Academic use: Affero General Public License version 1.
Any restrictions to use for profit or non-academics: Alternative commercial license is required.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Please contact Dr. Hani Z. Girgis (hzgirgis@buffalo.edu) if you need more information.
Please cite the following papers:
Identity: Rapid alignment-free prediction of sequence alignment identity scores using
self-supervised general linear models (2021). Hani Z. Girgis, Benjamin T. James, and
Brian B. Luczak. NAR Genom Bioinform, 13(1), lqab001.
A survey and evaluations of histogram-based statistics in alignment-free sequence
comparison (2019). Brian B. Luczak, Benjamin T. James, and Hani Z. Girgis. Briefings
in Bioinformatics, 20(4):1222–1237.
Database file: /tmp/.psub.TOS7qnPQj6
Query file: Not provided
Output file: output.txt
Cores: 8
Threshold: 0.7
Automatically relax threshold: Yes
All vs. all: Yes
Average: 14000
K: 6
Histogram size: 4096
A histogram entry is 32 bits.
Generating data.
fish: Job 1, './identity -d (gunzip -c /nfs/n…' terminated by signal SIGSEGV (Address boundary error)
Hi there, I'm trying to run an all vs. all comparison of around 10k sequences, which are on average 100 kb in length.
I've tried running both with -c 8 and without specifying the number of cores (the server has 24 cores), but I get the error "Segmentation fault (core dumped)" each time.
~/programs/Identity/bin/identity -d pooled_genomes.fasta -o output.txt -t 0.9 -c 8
I'm not sure what I'm doing wrong. Any help would be greatly appreciated! :)
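For a sense of scale, here is a rough back-of-the-envelope sketch of the all vs. all run described above. It only does arithmetic and does not identify the cause of the crash. The sequence count (10k) and average length (100 kb) come from this post. The k-mer size of 6, the 4096-entry histogram, and the 32-bit entry size are taken from the Identity log earlier in this thread and may differ for this dataset. The function names are hypothetical, and one histogram per sequence and one byte per base are assumptions, not something the tool's output confirms.

```python
# Rough sizing of an all vs. all run. This is arithmetic only, not Identity's
# actual memory model. Function names here are hypothetical.

def pair_count(n_sequences: int) -> int:
    """Number of unordered sequence pairs in an all vs. all comparison."""
    return n_sequences * (n_sequences - 1) // 2


def histogram_bytes(k: int, bits_per_entry: int = 32) -> int:
    """Bytes for one k-mer histogram over the 4-letter DNA alphabet."""
    return (4 ** k) * bits_per_entry // 8


n_sequences = 10_000   # "around 10k sequences" (from the post)
avg_length = 100_000   # "on average 100 kb" (from the post)
k = 6                  # assumption: "K: 6" from the log earlier in the thread

pairs = pair_count(n_sequences)
per_histogram = histogram_bytes(k)

# Assumption: one histogram is held per sequence.
all_histograms_mb = n_sequences * per_histogram / 1e6

# Assumption: raw sequence data stored at one byte per base.
raw_sequence_gb = n_sequences * avg_length / 1e9

print(f"pairs to compare:      {pairs:,}")
print(f"bytes per histogram:   {per_histogram:,}")
print(f"all histograms (MB):   {all_histograms_mb:.2f}")
print(f"raw sequence (GB):     {raw_sequence_gb:.2f}")
```

Under these assumptions the histograms and raw sequence data are modest in size, while the number of pairs is in the tens of millions, so the memory available on the machine is still worth including in a report like this one.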