bioinfologics / satsuma2

FFT cross-correlation based synteny aligner, (re)designed to make full use of parallel computing

Segmentation fault error #55

Open sjellerstrand opened 2 years ago

sjellerstrand commented 2 years ago

Hello! I am trying to run satsuma2 and get the following error message:

...
Chaining (inline)...
Filling out repeat lists...
Dynprog'ing...
/var/spool/slurmd/job27515260/slurm_script: line 46: 15205 Segmentation fault (core dumped) SatsumaSynteny2 -t Bter.fa -q Bmus.fa -o satsuma_Bter_Bmus -threads 20

I am running this on a cluster which uses a version of satsuma2 defined as "2016-12-07". I have now tried this twice on a full node with 20 cores and 16 GB RAM per core. This setup has not been a problem before, neither with reference genomes of completely different organisms, nor with the same target genome (Bter.fa) and another query. The only difference this time is that I applied new variants to the previous genome (which worked with satsuma2) and generated a new consensus genome, Bmus.fa (which does not work).
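Since the only reported change is the regenerated consensus FASTA, one thing that may be worth ruling out is a malformed input file. The thread does not confirm this as the cause. The following is a minimal sketch of a sanity check; the function name fasta_report and the choice of checks (empty records, repeated sequence names, characters outside ACGTN) are assumptions for illustration and are not part of satsuma2.

```shell
#!/bin/sh
# fasta_report FILE: print basic sanity counts for a FASTA file.
# Hypothetical helper, not part of satsuma2.
fasta_report() {
    awk '
        /^>/ {
            if (name != "" && len == 0) empty++   # previous record had no bases
            name = $1; n++; len = 0
            if (seen[name]++) dup++               # sequence name seen before
            next
        }
        {
            line = toupper($0)
            len += length(line)
            if (line ~ /[^ACGTN]/) odd++          # line contains a character outside ACGTN
        }
        END {
            if (name != "" && len == 0) empty++   # last record had no bases
            printf "sequences=%d empty=%d duplicate_names=%d non_ACGTN_lines=%d\n", n, empty, dup, odd
        }
    ' "$1"
}

# Report on the two files named in this thread, if they exist here.
for f in Bter.fa Bmus.fa; do
    if [ -f "$f" ]; then
        printf '%s: ' "$f"
        fasta_report "$f"
    fi
done
```

Comparing the report for the working genome with the one for Bmus.fa shows whether the consensus step introduced anything unusual, such as stray carriage returns (counted as non-ACGTN lines) or empty records.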

I get the following output files before the program aborts:

-rw-rw-r-- 1 simonj snic2020-2-25       864 May 26 08:35 kmatch_results.k11
-rw-rw-r-- 1 simonj snic2020-2-25  95890680 May 26 08:35 kmatch_results.k13
-rw-rw-r-- 1 simonj snic2020-2-25 724673952 May 26 08:36 kmatch_results.k15
-rw-rw-r-- 1 simonj snic2020-2-25 572481072 May 26 08:38 kmatch_results.k17
-rw-rw-r-- 1 simonj snic2020-2-25 318549312 May 26 08:39 kmatch_results.k19
-rw-rw-r-- 1 simonj snic2020-2-25 233698032 May 26 08:40 kmatch_results.k21
-rw-rw-r-- 1 simonj snic2020-2-25 201243384 May 26 08:41 kmatch_results.k23
-rw-rw-r-- 1 simonj snic2020-2-25 179523648 May 26 08:42 kmatch_results.k25
-rw-rw-r-- 1 simonj snic2020-2-25 161807256 May 26 08:43 kmatch_results.k27
-rw-rw-r-- 1 simonj snic2020-2-25 146990376 May 26 08:44 kmatch_results.k29
-rw-rw-r-- 1 simonj snic2020-2-25 133749504 May 26 08:46 kmatch_results.k31
-rw-rw-r-- 1 simonj snic2020-2-25         9 May 26 08:34 satsuma.log

Any clue on what might be going on and how I can fix this issue? I have attached the output from my most recent run.

Thank you!

Regards Simon

satsuma_slurm_output.txt

hannahdevens commented 1 year ago

Hi, I'm also getting this same error after K31. Did you ever figure out a solution?

sjellerstrand commented 1 year ago

Hey! I did find a solution, yes. The problem does not arise if I run the code on the same 20 cores without specifying "-threads 20" (the default is 1). The job will, however, take a lot longer to run. As I mentioned, these settings are not always an issue, but I have colleagues who ran into the same problem and solved it the same way. Best of luck!
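For reference, the workaround amounts to the original command with "-threads 20" removed. A minimal sketch using the file names from the first post follows. The variable names and the PATH guard are only for illustration.

```shell
#!/bin/sh
# Same invocation as in the original report, minus "-threads 20",
# so SatsumaSynteny2 uses its default of 1 thread.
TARGET=Bter.fa
QUERY=Bmus.fa
OUT=satsuma_Bter_Bmus
CMD="SatsumaSynteny2 -t $TARGET -q $QUERY -o $OUT"

# Only run if the binary is available; otherwise just show the command.
if command -v SatsumaSynteny2 >/dev/null 2>&1; then
    $CMD
else
    echo "SatsumaSynteny2 not found on PATH; would run: $CMD"
fi
```

Expect a much longer runtime than with the multi-threaded run.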

Best Simon

hannahdevens commented 1 year ago

Oh okay, I'll try that--thanks!

kushalsuryamohan commented 1 year ago

Hi @sjellerstrand and @hannahdevens, I am facing the same issue. It works for most genomes, but for some I struggle to find the right balance of resources to generate the outputs. I am running on a single node (OS: CentOS Linux 7, 104 cores, ~791 GB RAM, Python 3.8.15). Here is my job script:

#!/bin/sh
##SBATCH -N 1
#SBATCH -n 24
#SBATCH --mem=500G 
export SATSUMA2_PATH=/home/kushal.s/Satsuma/product/bin
SatsumaSynteny2 -t /home/kushal.s/satsuma_genomes/genome_10000000bp_longer.fasta -q /home/kushal.s/temp/Westernterrestrialgartersnake.fasta -o . -sl_mem 12 -slaves 4 -threads 4 -km_mem 12

The target and query genomes are near-chromosomal reptilian genomes (~1.5-1.8 Gb in size).

I've tried removing the threads and slaves options but that just takes forever only to result in a segmentation fault. Given the hardware specs, what would be a logical command to ensure this completes?

Here is the error I get in the KM23.log file:

srun: error: Unable to create step for job 637656: More processors requested than permitted

However, I do see the file kmatch_results.k27.finished.

After some point though, I get a segmentation fault error.

@jonwright99 I'm not sure how to solve this and I've spent way too much time without any success so would appreciate some help!

Thanks in advance!

kushalsuryamohan commented 1 year ago

As an update, I tried with no threads or slaves specifications and also tried reducing the slaves and threads (see below) but still get the segmentation fault error.

/var/spool/slurm/d/job637643/slurm_script: line 25: 262361 Segmentation fault (core dumped) SatsumaSynteny2 -t /home/kushal.s/satsuma_genomes/genome_10000000bp_longer.fasta -q /home/kushal.s/temp/Westernterrestrialgartersnake.fasta -o . -sl_mem 8 -slaves 1 -threads 8 -km_mem 8

And here is the log dump from SL1.log:

srun: error: Unable to create step for job 637660: More processors requested than permitted
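The srun message in KM23.log and SL1.log suggests that the worker processes are launched as Slurm job steps, and Slurm typically prints "More processors requested than permitted" when a step asks for more CPUs than the job's allocation offers. How "-slaves" and "-threads" translate into srun requests is not established in this thread, so the product slaves x threads below is only a rough figure. The following is a minimal sketch, to be placed in the job script before the SatsumaSynteny2 call, that prints it next to what Slurm reports for the job. The function name alloc_summary is hypothetical; the SLURM_* variables are standard Slurm job environment variables.

```shell
#!/bin/sh
# alloc_summary SLAVES THREADS: print the CPUs implied by the Satsuma2 options
# next to what Slurm reports for the current job. Variables that are empty or
# not set are shown as "unset". Hypothetical helper, not part of satsuma2.
alloc_summary() {
    slaves=$1
    threads=$2
    printf 'slaves_x_threads=%d ntasks=%s cpus_per_task=%s cpus_on_node=%s\n' \
        "$((slaves * threads))" \
        "${SLURM_NTASKS:-unset}" \
        "${SLURM_CPUS_PER_TASK:-unset}" \
        "${SLURM_CPUS_ON_NODE:-unset}"
}

# Values from the first command in this thread: -slaves 4 -threads 4
alloc_summary 4 4
```

If cpus_per_task is unset or smaller than the "-threads" value, that mismatch would be one place to look, although nothing in the thread confirms it as the cause of the segmentation fault.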

hannahdevens commented 1 year ago

Hi! Unfortunately, I was never able to make this work. If you figure it out, I'd love to hear!

kushalsuryamohan commented 1 year ago

Not sure if this repo is still being monitored by the admins/developers @bjclavijo @jonwright99 but I would really appreciate some help here.