Open voutcn opened 9 years ago
Hi voutcn,
We use powers of 2 to size the table since it's a fairly standard way to grow vectors and other resizable arrays, and because a power-of-two sized table has some convenient properties for computing hashes in our model.
I'm not really sure resizing by a smaller factor would fix your problem. Say we sized by 1.5x: if the number of items happened to be slightly larger than a power of 1.5, the load factor would still only be ~50%. Do you have a specific use case where you need a large table with slightly more elements than a power of 2? Also remember that it is the number of buckets that is a power of 2, not the number of elements. By default, each bucket has 8 slots, so we can actually store 2^n * 8 values for each power-of-two sizing of the table.
I've actually been working on an implementation of fractional resizing for a couple of days. It's not done yet, and I likely will not have it done until next week, but I will have something reasonably soon that we can use to evaluate it.
The trade-off is that it will be slower for insertion and access, but it will have lower memory overhead. Maybe a 10 or 15 percent slowdown.
That's interesting. Is the slowdown related to the hashing and bucket indexing?
-Manu
Hi, Dinghua,
Growing the table size by a smaller factor (rather than 2x every time) is certainly an interesting and attractive feature. I have actually been told by different people that this could be useful in certain practical applications. For libcuckoo, we made this choice (growing the table size by 2x) to enjoy some tricks that speed up table growth by doing a memcpy and cleaning up the table in the background (compared to locking the table and rehashing each item). So this is a tradeoff. But if you can finish it with some clever trick, I am very happy to learn :)
Best,
- Bin

Computer Science Department, Carnegie Mellon University
Yeah - we have to move from a mask to a mod to compute the indexes.
I use a fast computation of the mod for a fixed divisor, but it's still slower than a mask.
Thank you all.
@manugoyal Theoretically, resizing by say 1.5x should raise the lower bound on the load factor: if the number of elements happens to be slightly larger than a power of 1.5, the load factor will be ~1/1.5 = 2/3. @apc999 I understand that powers of 2 have many good properties, but I didn't go deep enough to study those tricks :) @dave-andersen Looking forward to your work!
Following up on this - sorry for the very long delay. I now know how to do this in a quite reasonably efficient way, but it's going to be a somewhat complex implementation path to get this version of it to handle that cleanly. Will start a design doc...
Hi David, may I ask whether there is any update?
Hi all,
If I am correct, rehashing always doubles the number of buckets. In practice, if the number of items is slightly larger than a power of two, the load factor will be only ~50%, which wastes memory. I would like to ask whether rehashing could grow the bucket count by a smaller factor, say 1.5 or 1.3, which would be more memory-friendly. Thanks.