Open grondo opened 1 year ago
I think the inability to set the maximum id is addressed by this change, FWIW:
```diff
diff --git a/src/common/libidset/idset.c b/src/common/libidset/idset.c
index d7ea5538f..85975e713 100644
--- a/src/common/libidset/idset.c
+++ b/src/common/libidset/idset.c
@@ -117,6 +117,8 @@ static int idset_grow (struct idset *idset, size_t size)
     while (newsize < size)
         newsize <<= 1;
+    if (newsize > IDSET_INVALID_ID)
+        newsize = IDSET_INVALID_ID;
     if (newsize > idset->T.M) {
         if (!(idset->flags & IDSET_FLAG_AUTOGROW)) {
```
idset_encode() walks through comma-separated ranges, processing each one sequentially. For each range, the vEB trie is reallocated to the size of the range's max id (if necessary), and then idset_range_set() iterates over the range setting bits. This turns out to be poorly optimized for the case described, since idset_range_set() ends up iterating over the entire trie.
One idea for an optimization might be to check all the ranges first, and if there are more bits set than clear, then instead of the above, create a trie with all bits initially set, and walk the holes in the ranges, clearing bits.
I'm possibly misunderstanding, but the testcase is only setting a single, albeit large, integer in the idset, so only one bit should be set as a result.
Edit: still, your idea seems solid to me.
Oh jeez, how'd I miss that. Sorry for the confusion!
Part of the problem may be this documented property of the vEB tree implementation:

> has space efficiency comparable to a bitmap

I'm seeing a lot of time spent in idset_grow().
Not that this is important now, but I wonder if looking into whether there is some other data structure with similar properties of a vEB tree but better space efficiency would be useful in the future.
The Similar Data Structures section of the van Emde Boas Wikipedia article states:

> The O(M) space usage of vEB trees is an enormous overhead unless a large fraction of the universe of keys is being stored. This is one reason why vEB trees are not popular in practice.
Though I assume the implementation in flux-core has some of the suggested mitigations, since I don't see 2G required to store IDSET_INVALID_ID - 1 (though it is pretty large: ~500K to store one id). A suggested alternative is the y-fast trie, which uses O(n) space, though I could not find a C implementation of it (and it sounds complicated).
Another alternative might be a compressed bitmap (though I'm unsure how well those implement predecessor/successor). Here's a C implementation of RoaringBitmap (silly name): https://github.com/RoaringBitmap/CRoaring
CRoaring looks like it has a compatible license and can be pulled in via their "amalgamated release". On the surface, its API would seem to be a usable replacement for libveb, so for fun we could switch the libidset implementation over and see how it goes.
FWIW, I did take the first step and pulled it in on my issue#5287 branch, specifically this commit: efc0aa716f4126f4b06168c996d53971929428de.
Edit: hmm, it appears that CRoaring requires C11, but at this time Flux only requires C99.
We might want to take another look at this.
With large resource sets, flux resource now appears to be bound by the performance of libidset, particularly the idset set operations, which seem to be quite slow because they use iteration.
I did some quick comparisons between the roaring bitmap implementation and the current idset implementation.
| operation | idset | roaring |
|---|---|---|
| Set 16K individual bits | 567.593ms | 126.163ms |
| Set 16K bits as range | 303.266ms | 0.038ms |
| Iterate 16K bits | 1299.053ms | 0.107ms |
| Test bit | 0.130ms | 0.041ms |
| Set difference (16K - 10K) | 3575.379ms | 0.270ms |
(To clarify, the above benchmarks were taken by iterating the tested operation 100 or 1000 times, depending on the speed of the op.)
What I haven't done is compare memory usage of roaring bitmaps, however I'm guessing the memory efficiency is much better than a vEB tree.
Also interesting is that roaring has a 64-bit implementation. This might be useful in the future, e.g. this would be a very fast way to keep a set of jobids.
Perhaps we can consider requiring C11 to build flux-core to get this improvement.
Oh wow, nice results! Hmm, let me peek at that old branch again and refresh my memory on where this is.
idset_decode(3) shows an exponential performance issue when decoding large integers. The following test program demonstrates the issue:

The program aborts before we can test up to IDSET_INVALID_ID - 1. Thus I guess the idset implementation can't store the full range of unsigned integers (a separate issue).

The issue is probably academic at this time, because there isn't a real use case for very large idsets. Even a 10,000 node cluster with 128 cores per node would have a max task rank of 1,280,000.

I ran across this because I was trying to make an idset that matched "all" tasks by creating a large range. For now I can use some other approach.