bwa-mem2 / bwa-mem2

The next version of bwa-mem

Feature request: Limit memory usage during index step. #118

Open rhpvorderman opened 3 years ago

rhpvorderman commented 3 years ago

Hi!

First of all, thanks for the latest bwa-mem2 2.1 release. It works great. The reduced memory usage is fantastic. It allowed me to run benchmarks locally, as alignment on 8 threads for hg38 + alt + decoy sequences used only 19 GB. This also means that performance is less susceptible to Non-Uniform Memory Access (NUMA) effects, which are a problem on multi-socket servers. Less memory means better performance! The sequence used can be found here and was 3.1 GB.

The indexing step, however, still takes about 80 GB. This means it was not possible to run it locally. Since I have access to a compute cluster, this was not a problem for me, as I could transfer the index to my local (32 GB RAM) machine after building it. However, this is not possible for people who do not have access to a compute cluster.

Would there be some low-hanging fruit to reduce the 28 x <size_of_reference> requirement? The indexing runtime was very good, just 50 minutes. But since this is a step that is only run once, I think a lot of people can be made happy with a doubling of the runtime if the memory can be halved. It will make bwa-mem2 more accessible for institutions that do their compute workload on workstation-class PCs instead of a compute cluster.
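
To give a rough sense of scale, the 28 x <size_of_reference> rule of thumb can be turned into a quick check before starting an indexing run. This is only a minimal sketch: the factor of 28 is the figure quoted in this comment, the function names are hypothetical, and real peak usage may differ (the run described above peaked at about 80 GB for a 3.1 GB reference).

```python
# Rough estimate of bwa-mem2 indexing memory, based on the
# "28 x size of reference" rule of thumb quoted in this thread.
# The names below are illustrative and not part of bwa-mem2.

INDEX_MEMORY_FACTOR = 28  # assumption: multiplier taken from the thread


def estimate_index_memory_gb(reference_size_gb: float,
                             factor: float = INDEX_MEMORY_FACTOR) -> float:
    """Return the estimated peak memory (GB) needed to index a reference."""
    return reference_size_gb * factor


def fits_in_ram(reference_size_gb: float,
                available_ram_gb: float,
                factor: float = INDEX_MEMORY_FACTOR) -> bool:
    """Return True if the estimated indexing memory fits in the given RAM."""
    return estimate_index_memory_gb(reference_size_gb, factor) <= available_ram_gb


if __name__ == "__main__":
    reference_gb = 3.1  # hg38 + alt + decoy reference size from this thread
    for ram_gb in (32, 128):
        needed = estimate_index_memory_gb(reference_gb)
        verdict = "fits" if fits_in_ram(reference_gb, ram_gb) else "does not fit"
        print(f"{reference_gb} GB reference: ~{needed:.1f} GB needed, "
              f"{verdict} in {ram_gb} GB RAM")
```

Under this estimate a 32 GB workstation falls well short for a human reference, which matches the experience described above. Halving the factor would still not fit in 32 GB, but it would bring the step within reach of a 64 GB machine.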

WANGchuang715 commented 3 years ago

I also encountered the same problem. The memory consumption is still too large for indexing large genomes. Hope it can be resolved.

yuk12 commented 3 years ago

@rhpvorderman @WANGchuang715 we are looking into this; hopefully we will have a solution soon.

atongsa commented 3 years ago

yuk12 commented 3 years ago

You used an older release. Please try with the latest release v2.1 and check. The index size is reduced in the latest release. We are looking into reducing the memory requirement during indexing.

andreaswallberg commented 3 years ago

I am bitten by this issue too. I cannot index my 18 Gbp genome assembly on a 512 GB RAM node.

millerh1 commented 3 years ago

Big +1 for this feature :) bwa-mem2 is awesome -- would love to be able to use it for my use case but not enough RAM at the moment

ZhouQiangwei commented 3 years ago

I ran into this issue too, with version 2.2.1.

binary seq ticks = 361627464193
Allocation of 92.04 GB for suffix_array failed.
Current Allocation = 103.54 GB