DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

hisat3n - memory leaks? #315

Closed DaGaMs closed 2 years ago

DaGaMs commented 2 years ago

Hi,

I'm trying to process some methylation-sequencing data with hisat3n, but I occasionally run into memory issues during the alignment phase, and hisat3n dies with:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
(ERR): hisat2-align died with signal 6 (ABRT) 

Interestingly, this only seems to happen with some of my FASTQ pairs. I've spent the last week trying to figure out whether a particular set of reads is responsible: I removed from the FASTQ files all the reads that had been aligned successfully before the crash (minus a dozen or so, in case the crash has to do with the last aligned reads). However, when I do that, the alignment runs through just fine. Instead, the crash seems to happen somewhat sporadically when aligning a lot of reads. I was logging the used memory (in MB) on the machine while running hisat3n, which produces graphs like these (x-axis is seconds since launch):

[Image: Screenshot 2021-08-05 at 14 54 29] [Image: Screenshot 2021-08-05 at 14 54 05]
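For anyone who wants to collect a comparable trace, here is a minimal sketch of a machine-wide memory logger. It is not the script used for the graphs in this post: the function name `log_used_mem` is hypothetical, it assumes a Linux host where `/proc/meminfo` reports `MemTotal` and `MemAvailable`, and it treats "used" as total minus available.

```shell
# Print "elapsed_seconds<TAB>used_MB" a fixed number of times.
# Assumptions (not from the original post): Linux /proc/meminfo with
# MemTotal and MemAvailable; "used" is MemTotal - MemAvailable.
log_used_mem() {
    samples=$1          # number of samples to take
    interval=${2:-1}    # seconds between samples (default: 1)
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$samples" ]; do
        now=$(date +%s)
        # /proc/meminfo reports kB; convert the difference to whole MB.
        used_mb=$(awk '/^MemTotal:/ {t = $2}
                       /^MemAvailable:/ {a = $2}
                       END {printf "%d", (t - a) / 1024}' /proc/meminfo)
        printf '%s\t%s\n' "$((now - start))" "$used_mb"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

Running something like `log_used_mem 3600 1 > mem.log &` just before starting the alignment would give one sample per second for an hour, which can then be plotted against elapsed time.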

FYI, I'm using the 036ffebf7c7eafd73ae534501ece0e355f22bd04 build. The test run above used 8 cores:

hisat-3n -x /path/to/ref/hg38_full_gatk_HPV_HBV_HCV_spike-ins_dbSNP144 \
        -1 test_1.fq.gz \
        -2 test_2.fq.gz \
        --rg-id $ID \
        --rg "SM:$SAMPLE" \
        --rg "LB:$LIBRARY" \
        --rg "PL:$PL" \
        --rg $PU \
        --rg "CN:$CN" \
        -p 8 \
        --base-change C,T \
        --new-summary \
        --summary-file test.summary.txt \
| samtools sort -@ 8 -m 1G -o test.sorted.bam -T test.tmp -O bam --reference /path/to/ref/hg38_full_gatk_HPV_HBV_HCV_spike-ins.fa.gz

I'm out of ideas for what to try next. I haven't been able to come up with an easily reproducible scenario for this issue, but it's happening with half of our samples. Unfortunately, as this is human patient data, I can't provide you with the raw data for testing. Any suggestions on what to try to narrow down the cause?

imzhangyun commented 2 years ago

Hello Benjamin,

Thank you for using HISAT-3N. Your problem may be caused by the splice junction bug in HISAT2. We have already identified the bug and are fixing it. To align your reads with the current HISAT-3N, please add the '--no-temp-splicesite' option. By the way, we highly recommend that you pull the newest HISAT-3N code and compile it again; we fixed some minor bugs recently.

Please let me know if you have any other questions.

Best, Yun (Leo)

DaGaMs commented 2 years ago

It seems the --no-temp-splicesite option did indeed make it run through fine. Thanks!
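For reference, a sketch of how the alignment command from the original post might look with the workaround applied. The only change is the added `--no-temp-splicesite` flag; the read-group options and the `samtools sort` stage are left out for brevity, the paths are the placeholders from the post, and the snippet only prints the command rather than running it.

```shell
# Sketch: alignment arguments from the original post plus the suggested
# workaround flag. Read-group (--rg) options and the samtools sort stage
# are omitted for brevity; paths are the placeholders used in the post.
set -- \
    -x /path/to/ref/hg38_full_gatk_HPV_HBV_HCV_spike-ins_dbSNP144 \
    -1 test_1.fq.gz \
    -2 test_2.fq.gz \
    -p 8 \
    --base-change C,T \
    --no-temp-splicesite \
    --new-summary \
    --summary-file test.summary.txt

# Dry run: print the command for review. Drop the `echo` to execute it.
echo hisat-3n "$@"
```

Keeping the arguments in the positional parameters makes it easy to check the full command before committing to a long alignment run.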