I assume you mean max_load_factor(), because load_factor() returns the current load factor (size / capacity). If so, I prefer option 2: make it return the real max load factor, beyond which reallocation happens.
Since this is a new container, we might consider actually exposing max_load(), which is a more useful query, and typically what users need (how many elements can I insert without reallocation?).
I really meant load_factor(), the rationale being that slots not recovered count as "load". As for changing the definition of max_load_factor(), I don't know whether users expect this number to change, given its static nature in closed addressing.
Anyway, I like your max_load() idea very much: it retains the standard meaning of load_factor() and max_load_factor() and gives users a simple way to plan ahead of rehashing.
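For illustration, a usage sketch of that max_load() idea (assuming it ends up exposed under this exact name on the open-addressing containers, which is not settled here) could look like this:

```cpp
#include <boost/unordered/unordered_flat_map.hpp>
#include <cstddef>
#include <cstdio>

int main()
{
    boost::unordered_flat_map<int, int> m;
    m.reserve(1000);

    // hypothetical query: how many more elements fit before rehashing?
    std::size_t headroom = m.max_load() - m.size();
    std::printf("can insert %zu more elements without rehashing\n", headroom);
}
```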
Oh I see. Yes, this makes sense. I still prefer option 2, then, from the two given.
But we both prefer option 3 (keep load_factor() and max_load_factor() as they are, and provide max_load()), don't we? (I do.)
Yes, I prefer option 3 too.
Merging #153 (a81d785) into develop (772e1e7) will increase coverage by 0.05%. The diff coverage is 85.71%.
Consider this scenario of repeated insert/erase runs:
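A minimal sketch of this kind of workload, just to fix ideas (assuming boost::unordered_flat_map as the container and made-up sizes; not the exact benchmark behind the figures below):

```cpp
#include <boost/unordered/unordered_flat_map.hpp>

int main()
{
    boost::unordered_flat_map<int, int> m;
    int key = 0;

    for (int run = 0; run < 10; ++run) {
        int first_new_key = key;

        // insertion run: fill the container back up to ~100k elements
        while (m.size() < 100'000) m.emplace(key++, 0);

        // erasure run: erase ~90% of the newly inserted elements; the map is
        // never cleared, so overflow metadata accumulated so far persists
        for (int k = first_new_key; k < key; ++k)
            if (k % 10 != 0) m.erase(k);
    }
}
```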
After each insertion run, the average number of hops on unsuccessful lookup keeps growing; that is, probe lengths grow unboundedly (numbers obtained through instrumentation). This drift phenomenon is due to the fact that overflow bytes can't be reset as elements are erased, so they keep saturating on new insert runs; all non-relocating, open-addressing containers, Abseil's for example, experience this problem.

The proposed anti-drift mechanism consists in reducing the maximum allowed load when erasing an element that potentially caused an overflow, so that, on the subsequent insert run, rehashing is triggered before hitting the load of the previous run: on average, when the load reaches (previous load − number of erased elements × 0.18). I've re-run the experiment above with anti-drift and the average number of hops stays at 0.23 between runs, as it should. In this context, rehashing does not grow the bucket array (the same capacity as before is allocated), though a hysteresis factor is introduced to actually grow the array if insert/erase runs oscillate minimally in the vicinity of full load.
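As an illustration of that accounting (hypothetical figures, not taken from the instrumented runs): if the previous insert run rehashed at a load of 1,000,000 elements and 500,000 elements with potential overflows were then erased, the next insert run would, on average, rehash at about 1,000,000 − 500,000 × 0.18 = 910,000 elements instead of the full 1,000,000.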
Performance-wise, anti-drift only affects erasure. These benchmarks show that the overhead ranges from negligible to ~10% in the non-SIMD case (see "Running erasure").
As for the container's interface, the obvious impact is that rehashing can happen before hitting the theoretical maximum load. We can do two things here (open question to you):
1. Keep load_factor() as is and let users know that the returned value is a hint and that rehashing can happen earlier than expected if there were previous erasures.
2. Change load_factor() to return (size() + amount detracted from the maximum load) / bucket_count().
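To make the difference concrete (made-up numbers): with bucket_count() = 2,097,152, size() = 800,000 and 38,000 slots detracted from the maximum load, option 1 reports 800,000 / 2,097,152 ≈ 0.38, while option 2 reports (800,000 + 38,000) / 2,097,152 ≈ 0.40.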