Open rescrv opened 10 years ago
Ditto for putIfMatch.
There are a couple other race conditions as well. If this lib is actively used, I'm happy to report them, but I'd like to avoid typing them up if the effort would be wasted.
Yes please do, it's in active use in a number of different places.
Here are the other major "gotcha" cases I found. For reference, my C++ implementation is here and is what we're using in HyperDex now.
The resize method makes a chain of inner tables. Although it's extremely unlikely, it's possible for the recursive `putIfMatch` call to overrun the stack. I saw this in an application with more threads than cores, where one thread was forced to wait to run. By the time it ran, the other threads had constructed many new tables that the global table had promoted past. These intermediary tables were necessarily filled with tombstones, but the straggler thread would still attempt to resize them using the copy helper. Of course, this copy helper would step down to the next table, and repeat. Eventually it overran the stack. Tuning the table resize rate can significantly decrease the likelihood of this race condition. A more solid fix, which I use in my implementation, is to record the resize number at which each inner table was established. Upon entry to the `putIfMatch` call, I skip ahead to the top-most table accessible from the outer hash map. This allows a straggler to always work on a copy of the inner table where it can do useful work, without scanning tables that are definitely fully copied.
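To make the skip-ahead idea concrete, here is a minimal Java sketch. It is not the library's code and not my C++ implementation; `Table`, `generation`, `resize`, `promote`, and `entryTable` are hypothetical names chosen for illustration, and the actual key/value copying is left out entirely. It only shows how a straggler holding a stale table can jump to the current top-most table in one step instead of recursing through every intermediate table.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the chain of inner tables (no key/value storage).
final class SkipAheadSketch {
    static final class Table {
        final long generation; // resize number at which this table was established
        final AtomicReference<Table> next = new AtomicReference<>(); // newer table, once a resize starts
        Table(long generation) { this.generation = generation; }
    }

    // The outer map's pointer to the top-most table that is still live.
    final AtomicReference<Table> top = new AtomicReference<>(new Table(0));

    // Start a resize from the newest table by linking a table with the next generation.
    Table resize() {
        Table t = top.get();
        while (t.next.get() != null) t = t.next.get();
        Table n = new Table(t.generation + 1);
        return t.next.compareAndSet(null, n) ? n : t.next.get();
    }

    // Promote: the outer map moves past a fully copied table, never backwards.
    void promote(Table newer) {
        Table cur;
        do {
            cur = top.get();
        } while (cur.generation < newer.generation && !top.compareAndSet(cur, newer));
    }

    // Entry check for a putIfMatch-like call: if the caller's table is older than
    // the outer map's top-most table, skip straight to the top-most table.
    Table entryTable(Table mine) {
        Table current = top.get();
        return mine.generation < current.generation ? current : mine;
    }

    public static void main(String[] args) {
        SkipAheadSketch map = new SkipAheadSketch();
        Table stale = map.top.get(); // the straggler grabbed this before being descheduled
        Table newest = stale;
        for (int i = 0; i < 1000; i++) { // other threads resize and promote many times
            newest = map.resize();
            map.promote(newest);
        }
        // The straggler resumes and skips ahead in one step, with no recursion.
        System.out.println(map.entryTable(stale).generation); // prints 1000
    }
}
```

The comparison is on the generation number rather than on table identity, so a thread that already holds the newest table (or one even newer than the outer pointer, mid-resize) keeps working where it is.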
I also thought the counter implementation was racy during a resize, but it looks like it's doing the right thing.
The other issue I forgot about and didn't include was the "clear" call. It doesn't behave well with resizes, especially stacked resizes. I opted to remove it completely.
I believe there is a typo in `get_impl` here: https://github.com/boundary/high-scale-lib/blob/master/src/main/java/org/cliffc/high_scale_lib/NonBlockingHashMap.java#L540
The line should instead read `K == TOMBSTONE`. You'll note that `key` is what the user passed in, and users should never try to retrieve a `TOMBSTONE`. In fact, I think Java's type safety prevents them from even getting a reference to the `TOMBSTONE`.
This typo can affect the safety and efficiency of the `get` operation, as the hash table is no longer linearizable. A write that is then marked with a `TOMBSTONE` and copied to the new table will be set to `TOMBSTONE`. If the copying and the get race, the get could see a `null` and return the `null`, even though it should instead begin looking in the next table. It's a small race, but it's there.
It's also less efficient to reprobe up to `reprobe_limit` on larger tables, but what's a few extra cycles among friends ;-).
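The efficiency half of this can be shown with a small Java sketch. It is a simplified stand-in for the probe loop, not the library's `get_impl`: the flat `keys` array, the `probesBeforeForwarding` helper, and the `fixed` flag are all hypothetical, and the race itself is not modelled. It only counts how many slots are probed before the lookup gives up on a fully tombstoned table and moves to the next one.

```java
import java.util.Arrays;

// Hypothetical, simplified probe loop; the real map stores keys and values differently.
final class TombstoneCheckSketch {
    static final Object TOMBSTONE = new Object(); // sentinel the caller can never pass in

    // Returns the number of slots probed before the lookup would forward to the
    // next table, or -1 if the lookup ends in this table (hit or clear miss).
    static int probesBeforeForwarding(Object[] keys, Object key, int reprobeLimit, boolean fixed) {
        int idx = (key.hashCode() & 0x7fffffff) % keys.length;
        for (int probes = 1; ; probes++) {
            Object K = keys[idx];            // key stored in the slot
            if (K == null) return -1;        // clear miss in this table
            if (K.equals(key)) return -1;    // key hit, handled in this table
            // fixed: test the slot's key K; typo: test the caller's key, which is never TOMBSTONE
            Object checked = fixed ? K : key;
            if (probes >= reprobeLimit || checked == TOMBSTONE) return probes;
            idx = (idx + 1) % keys.length;   // linear reprobe
        }
    }

    public static void main(String[] args) {
        Object[] keys = new Object[8];
        Arrays.fill(keys, TOMBSTONE);        // an old table that has been fully copied
        System.out.println(probesBeforeForwarding(keys, "a", 5, true));  // prints 1
        System.out.println(probesBeforeForwarding(keys, "a", 5, false)); // prints 5
    }
}
```

With the slot's key checked, the lookup forwards on the first tombstone it meets; with the caller's key checked, the sentinel test is dead code and only the reprobe limit ends the scan.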