DisjointSetsDisjointSets::find is guaranteed to return the correct set representative. However, it is not guaranteed
that the parent information (lower 64 bits) will have the set representative stored in it when it is done.
This is because the __sync_bool_compare_and_swap could fail because of race conditions.
I'm introducing a new data member - trackParentUpdated.
When trackParentUpdated is set to True, we bump a counter that tracks the number
of entries that don't have the set representative populated in their lower 64 bits.
In a highly parallel environment, where N threads are calling find on different
entries in DisjointSets, it is very likely that several entries won't have the
set representative populated as the parent, at the end of the pass.
If we do multiple such passes over all entries, then within a few iterations, the parent
information in all entries (lower 64 bits) will have the correct set representative.
this->parentUpdated can be used to track this convergence.
createMarkerGraphVertices
Storing N markers requires 8N bytes (64 bit integers). The DisjointSets data structure requires an additional 64 bits per marker.
We were allocating 16N bytes for DisjointSets and then another 8N bytes to find and store the results from DisjointSets. This seemed wasteful. So now we compact the memory allocated to DisjointSets in-place. For a typical human genome assembly this means we avoid allocating around 145 GB. This lops off the current peak of Shasta's memory usage.
In order to do this, we do multiple passes of calling find on each entry in DisjointSets. This is feasible because, we typically need only 2-3 passes and each pass is quite fast.
Performance Impact
For HG002-Guppy-3.6.0, these changes reduced peak memory usage by 101 GB (14.11%). at the cost of adding about 3 minutes to the assembly time. This is a fantastic trade-off as Shasta is plenty fast, but constrained by its memory use.
Test Plan
Compared the assembly of HG002-Guppy-3.6.0 reads before and after this change. Used QUAST to verify that assembly quality was unchanged.
DisjointSets
DisjointSets::find
is guaranteed to return the correct set representative. However, it is not guaranteed that the parent information (lower 64 bits) will have the set representative stored in it when it is done. This is because the__sync_bool_compare_and_swap
could fail because of race conditions. I'm introducing a new data member -trackParentUpdated
. WhentrackParentUpdated
is set to True, we bump a counter that tracks the number of entries that don't have the set representative populated in their lower 64 bits.In a highly parallel environment, where N threads are calling
find
on different entries in DisjointSets, it is very likely that several entries won't have the set representative populated as the parent, at the end of the pass.If we do multiple such passes over all entries, then within a few iterations, the parent information in all entries (lower 64 bits) will have the correct set representative.
this->parentUpdated
can be used to track this convergence.createMarkerGraphVertices Storing N markers requires 8N bytes (64 bit integers). The DisjointSets data structure requires an additional 64 bits per marker. We were allocating 16N bytes for DisjointSets and then another 8N bytes to find and store the results from DisjointSets. This seemed wasteful. So now we compact the memory allocated to DisjointSets in-place. For a typical human genome assembly this means we avoid allocating around 145 GB. This lops off the current peak of Shasta's memory usage.
In order to do this, we do multiple passes of calling
find
on each entry in DisjointSets. This is feasible because, we typically need only 2-3 passes and each pass is quite fast.Performance Impact For
HG002-Guppy-3.6.0
, these changes reduced peak memory usage by 101 GB (14.11%). at the cost of adding about 3 minutes to the assembly time. This is a fantastic trade-off as Shasta is plenty fast, but constrained by its memory use.Test Plan Compared the assembly of
HG002-Guppy-3.6.0
reads before and after this change. Used QUAST to verify that assembly quality was unchanged.