DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

Question regarding the runtime for hisat-3n-table #401

Open cahn20 opened 1 year ago

cahn20 commented 1 year ago

Hello,

I ran a hisat-3n-table job on a BAM file with around 7.5 million reads, and it took about 10 hours, which is much longer than what I've seen with fewer reads. I then ran a quick test with different numbers of input reads:

- 1 million reads input -> ~600 seconds for hisat-3n-table to complete
- 2 million reads input -> ~2,600 seconds for hisat-3n-table to complete
- 4 million reads input -> ~9,600 seconds for hisat-3n-table to complete
- 8 million reads input -> ~40,000 seconds for hisat-3n-table to complete
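As a rough check on how the runtime scales, the timings above can be fitted to a power law (runtime proportional to reads^k) with a least-squares fit on a log-log scale. This is only a sketch in plain Python: it uses nothing from hisat-3n-table itself, and the function name `fit_power_law_exponent` is just an illustrative helper. With only four approximate data points, the result should be treated as indicative.

```python
import math

# Timings reported above: (reads in millions, approximate runtime in seconds).
timings = [(1, 600), (2, 2600), (4, 9600), (8, 40000)]


def fit_power_law_exponent(points):
    """Return the least-squares slope of log(runtime) against log(reads).

    A slope near 1 means linear scaling; a slope near 2 means quadratic.
    (Hypothetical helper for illustration, not part of any HISAT tool.)
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


exponent = fit_power_law_exponent(timings)
print(f"estimated scaling: runtime ~ reads^{exponent:.2f}")

# Runtime ratio for each doubling of the input size.
for (n0, t0), (n1, t1) in zip(timings, timings[1:]):
    print(f"{n0}M -> {n1}M reads: runtime x{t1 / t0:.2f}")
```

For these four measurements the fitted exponent comes out close to 2, which points to quadratic rather than linear growth.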

That is roughly a four-fold increase in runtime for every two-fold increase in the number of input reads, i.e. approximately quadratic scaling. Is this behavior expected, rather than a linear increase in runtime? I'm allotting around 20 GB of RAM when running the job, so I assume memory isn't the issue here. I'm just wondering whether factors other than the number of reads contribute to this increase in runtime. Thanks.