Closed miroslavp closed 1 year ago
Let's say we have a hashmap which is reasonably full and all probes (jump distances) failed to find an empty entry. Without the max jump distance we would actually loop endlessly :). We can't use `while (true)` because what would happen if we reach the end of the jumpDistance array?
We don't have to loop over the entire array; we can just back out early. Instead of 31 (num_jump_distances) we can back out at 1 or 2, since those jump distances are not guaranteed to have empty entries.
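To make the discussion above concrete, here is a minimal Python sketch (not the repo's C# code) of probing with a jump-distance table and a cap on how many distances are tried before backing out. The table values and all names (`JUMP_DISTANCES`, `find_slot`, `max_distance`) are illustrative assumptions, not taken from `DenseMapSIMD.cs`:

```python
# Hypothetical jump-distance table; the real map uses 31 entries.
JUMP_DISTANCES = [1, 2, 3, 5, 8, 13, 21, 34]

def find_slot(buckets, key, max_distance):
    """Probe at increasing jump distances; give up once max_distance
    probes have failed, instead of looping forever or running off the
    end of the jump-distance table.
    max_distance must be < len(JUMP_DISTANCES)."""
    n = len(buckets)
    index = hash(key) % n
    for jump_index in range(max_distance + 1):
        if buckets[index] is None:          # empty slot found
            return index
        index = (index + JUMP_DISTANCES[jump_index]) % n
    return -1  # every probe failed: the caller must resize/rehash

# In a map with free slots the probe succeeds quickly; in a full map
# it backs out instead of spinning.
buckets = [None] * 8
assert find_slot(buckets, "a", max_distance=3) != -1
```

Backing out at a small index (1 or 2) trades a slightly earlier resize for never scanning the whole table, which is the tradeoff discussed above.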
There is always a tradeoff; for now I think we should go this way. The performance difference is negligible.
Thanks for taking the time to explain.
I noticed that the code started to run a bit slower after adding the `_maxDistance` optimization. I ran two tests comparing the old `while (true)` loop with the new `while (jumpDistanceIndex <= _maxDistance)` loop. The tests differ from each other only in the number of existent and non-existent keys. As you can see, in both cases the old `while (true)` code performs better. I started to wonder why... After examining the code, I realized that this check is not necessary at all, and I think that is partially my fault: I asked you in this issue whether performance would be worse if the number of non-existent keys is larger, and kind of pushed you in the wrong direction. It turns out the code was already taking good care of non-existent keys, and adding the extra complexity was not necessary. In this code snippet https://github.com/Wsm2110/Faster.Map/blob/cbd10bd253185fa7e605c892f9c6bf0538927a6b/src/DenseMapSIMD.cs#L374-L380 we check for an empty bucket and, if we find one, break. So we don't really iterate over every entry (i.e., go through all the jump distances).
Am I correct or am I missing something obvious here?