efficient / libcuckoo

A high-performance, concurrent hash table
Other
1.62k stars 275 forks source link

.clear() taking too long #162

Open abhimanyu891998 opened 5 months ago

abhimanyu891998 commented 5 months ago

Hey @manugoyal , when calling .clear() on cuckoohash_map, it takes about 1-2 ms. Is this expected? And is there a faster way to reset the hash map?

manugoyal commented 5 months ago

Hi @abhimanyu891998. Off the top of my head, I would guess clear takes a while because it must take all the table locks (https://github.com/efficient/libcuckoo/blob/88bb6e2ca1b68572f66e3e2bea071829652afeb2/libcuckoo/cuckoohash_map.hh#L696). This is necessary because clearing cannot race with other concurrent operations on the table which may be reading/writing data.

It would be useful to benchmark what part of clear is taking a long time for you. If you are running clear at a point where no concurrent operations are running on the table, I would think that acquiring all the uncontended locks should be pretty fast, and clear shouldn't take much time. In which case, there might be other reasons.

If you are trying to clear while concurrent operations are occurring, the fastest approach might be to replace the table with a freshly-allocated one. Which could be safely managed with shared pointers or something. For example:

auto my_table = std::make_shared<cuckoohash_map<int, int>>(...);
// Do stuff
...
// Quickly "clear" the table by replacing it with a fresh one. Existing concurrent operations will continue
// to operate on the old table.
my_table = std::make_shared<cuckoohash_map<int, int>>(...);