chanzuckerberg / shasta

[MOVED] Moved to paoloshasta/shasta. De novo assembly from Oxford Nanopore reads
Other
272 stars 59 forks source link

Reduce peak memory usage #167

Closed bagashe closed 4 years ago

bagashe commented 4 years ago

DisjointSets DisjointSets::find is guaranteed to return the correct set representative. However, it is not guaranteed that the parent information (lower 64 bits) will have the set representative stored in it when it is done. This is because the __sync_bool_compare_and_swap could fail because of race conditions. I'm introducing a new data member - trackParentUpdated. When trackParentUpdated is set to True, we bump a counter that tracks the number of entries that don't have the set representative populated in their lower 64 bits.

In a highly parallel environment, where N threads are calling find on different entries in DisjointSets, it is very likely that several entries won't have the set representative populated as the parent, at the end of the pass.

If we do multiple such passes over all entries, then within a few iterations, the parent information in all entries (lower 64 bits) will have the correct set representative. this->parentUpdated can be used to track this convergence.

createMarkerGraphVertices Storing N markers requires 8N bytes (64 bit integers). The DisjointSets data structure requires an additional 64 bits per marker. We were allocating 16N bytes for DisjointSets and then another 8N bytes to find and store the results from DisjointSets. This seemed wasteful. So now we compact the memory allocated to DisjointSets in-place. For a typical human genome assembly this means we avoid allocating around 145 GB. This lops off the current peak of Shasta's memory usage.

In order to do this, we do multiple passes of calling find on each entry in DisjointSets. This is feasible because, we typically need only 2-3 passes and each pass is quite fast.

Performance Impact For HG002-Guppy-3.6.0, these changes reduced peak memory usage by 101 GB (14.11%). at the cost of adding about 3 minutes to the assembly time. This is a fantastic trade-off as Shasta is plenty fast, but constrained by its memory use.

Test Plan Compared the assembly of HG002-Guppy-3.6.0 reads before and after this change. Used QUAST to verify that assembly quality was unchanged.

paoloczi commented 4 years ago

A very fine piece of work. Thank you @bagashe!