baoxingsong / AnchorWave

Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism and whole-genome duplication variation
MIT License

Segmentation fault with `genoAli` #34

aseetharam closed this issue 2 years ago

aseetharam commented 2 years ago

I'm testing AnchorWave with two very similar genomes, and I'm stumped by this issue. I've tried increasing the memory, using the binary built for a different CPU instruction set, and increasing the number of threads, but none of these has worked so far.

anchorwave_sse2 genoAli \
   -i Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3 \
   -r Zm-B73-REFERENCE-NAM-5.0.fa -a cds.sam \
   -as cds.fa \
   -ar ref.sam \
   -s Zm-B73_AB10-REFERENCE-NAM-1.0.fa \
   -n anchors \
   -o output.maf \
   -f output.fragmentation.maf \
   -t $SLURM_JOB_CPUS_PER_NODE

and the stdout/stderr:

SSE2 is enabled
reading reference sam begin
reading reference sam done
using parameters detected from the input SAM file for novel anchors identification
Segmentation fault (core dumped)

Any suggestions or ideas on why this is happening? The CDS alignments were generated as per the manual's instructions and are in proper SAM format.

Thanks,

baoxingsong commented 2 years ago

Please pay attention to the memory cost.

Without heavy parameter tuning, for highly diverse genomes, AnchorWave uses ~85 GB of memory with a single thread. Each additional thread costs roughly another 50 GB. If the two genomes have very similar sequences, the time and memory cost would be significantly lower.
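
To make the arithmetic concrete, here is a rough sketch that turns those two figures into a thread budget. It assumes memory grows linearly with thread count, which is only my reading of the numbers above, and the function names `estimate_mem_gb` and `max_threads` are made up for illustration; they are not part of AnchorWave. For similar genomes the real usage should be lower, so treat the result as a conservative upper bound.

```shell
#!/usr/bin/env bash
# Figures quoted in this thread for highly diverse genomes (approximate):
BASE_GB=85          # one thread
PER_THREAD_GB=50    # each additional thread

# estimate_mem_gb THREADS
# Prints the estimated peak memory in GB, assuming linear growth.
estimate_mem_gb() {
    local threads=$1
    echo $(( BASE_GB + PER_THREAD_GB * (threads - 1) ))
}

# max_threads AVAILABLE_GB
# Prints the largest thread count whose estimate fits in AVAILABLE_GB,
# or 0 if even a single thread is expected not to fit.
max_threads() {
    local avail=$1
    if (( avail < BASE_GB )); then
        echo 0
    else
        echo $(( (avail - BASE_GB) / PER_THREAD_GB + 1 ))
    fi
}

estimate_mem_gb 4    # prints 235
max_threads 256      # prints 4
```

With these numbers, passing the full CPU count of a node to `-t` can ask for far more memory than the job was allocated, so it may be worth passing the smaller of the CPU count and the `max_threads` result instead.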