DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

Hisat3n | Long mapping times #347

Open ezecalvo opened 2 years ago

ezecalvo commented 2 years ago

Hi,

I'm trying to map reads with HISAT-3N. Right now it is taking >7 days to map ~30M reads. I'm using 20 nodes with 20 GB of memory each, and memory usage never reaches the limit.

My index is a standard hg38 genome index with splice-site information. I don't see any memory leaks like those reported in other issues. The mapping time is the same with or without --no-temp-splicesite.

Are these mapping times expected? Is there something I'm missing to improve speed?

Here's an example of what I normally run:

hisat-3n -x $ref_genome -q -1 reads.R1.fq.gz -2 reads.R2.fq.gz -S file.out_TC.sam --base-change T,C -p 20 --rna-strandness RF --no-temp-splicesite

Thanks!

**EDIT:** After some more testing, I realized that --no-temp-splicesite reduces the mapping time from ~10 days to ~3 days.
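
For anyone wanting to compare settings without waiting days, one option is to time the same command on a small subsample of the reads. The following is only a rough sketch under stated assumptions: the helper names `subsample_fastq` and `time_hisat3n`, the `sub.*` file names, and the subsample size of 100000 reads are made up for illustration, and the hisat-3n arguments are simply copied from the command above. Timings on a subsample will not scale linearly to the full run, since index loading is a fixed cost and the first reads in a file may not be representative.

```shell
#!/usr/bin/env bash
# Sketch: time hisat-3n on a small subsample, with and without --no-temp-splicesite.
# Helper names, sub.* file names, and the subsample size are illustrative assumptions.
set -eo pipefail

# Write the first N reads (4 lines per FASTQ record) of a gzipped FASTQ to a plain file.
# Taking the same N from both mate files keeps the pairs in sync.
subsample_fastq() {
    local in_gz=$1 n_reads=$2 out_fq=$3
    # head closes the pipe early, so gzip may be stopped by SIGPIPE; tolerate that.
    { gzip -dc "$in_gz" || true; } | head -n "$((n_reads * 4))" > "$out_fq"
}

# Time one hisat-3n run on the subsample; any extra arguments are appended
# to the command line taken from the post.
time_hisat3n() {
    local label=$1; shift
    echo "== $label =="
    time hisat-3n -x "$ref_genome" -q -1 sub.R1.fq -2 sub.R2.fq \
        -S "sub.$label.sam" --base-change T,C -p 20 --rna-strandness RF "$@"
}

# Example usage (needs hisat-3n on PATH and $ref_genome set):
#   subsample_fastq reads.R1.fq.gz 100000 sub.R1.fq
#   subsample_fastq reads.R2.fq.gz 100000 sub.R2.fq
#   time_hisat3n default
#   time_hisat3n no_temp_ss --no-temp-splicesite
```

Comparing the two `time` outputs shows whether the flag makes a difference on a given index before committing to a multi-day run.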