Wsm2110 / Faster.Map

Faster.Map is a high-performance, thread-safe key-value store designed to outperform the standard Dictionary and ConcurrentDictionary

Performance regression after introduction of _maxDistance #20

Closed · miroslavp closed this issue 1 year ago

miroslavp commented 1 year ago

I noticed that the code started to run a bit slower after the `_maxDistance` optimization was added. I ran two tests, one with the old `while (true)` loop and one with the new `while (jumpDistanceIndex <= _maxDistance)` loop. The tests differ only in the ratio of existent to non-existent keys.

900_000 : 100_000 (existent to non-existent keys) ratio

| Method                                    | Mean     | Error     | StdDev    | Ratio | RatioSD |
|-------------------------------------------|----------|-----------|-----------|-------|---------|
| DenseMapSIMD2_Get_WithMaxDistanceCheck    | 10.65 ms | 0.030 ms  | 0.060 ms  | 1.00  | 0.00    |
| DenseMapSIMD2_Get_WithoutMaxDistanceCheck | 10.04 ms | 0.167 ms  | 0.336 ms  | 0.94  | 0.03    |

500_000 : 500_000 (existent to non-existent keys) ratio

| Method                                    | Mean     | Error     | StdDev    | Ratio | RatioSD |
|-------------------------------------------|----------|-----------|-----------|-------|---------|
| DenseMapSIMD2_Get_WithMaxDistanceCheck    | 6.707 ms | 0.0984 ms | 0.1987 ms | 1.00  | 0.00    |
| DenseMapSIMD2_Get_WithoutMaxDistanceCheck | 6.268 ms | 0.0449 ms | 0.0906 ms | 0.94  | 0.02    |

As you can see, in both cases the old `while (true)` code performs better. I started to wonder why. After examining the code, I realized that this check is not necessary at all, and I think that is partially my fault. In this issue I asked you whether performance would be worse if the number of non-existent keys is larger, and kind of pushed you in the wrong direction.

It turns out that the code was already handling non-existent keys well, and the additional code complexity was not necessary. In this snippet https://github.com/Wsm2110/Faster.Map/blob/cbd10bd253185fa7e605c892f9c6bf0538927a6b/src/DenseMapSIMD.cs#L374-L380 we check for an empty bucket and, if we find one, we break. So we don't really iterate every entry (go through all the jump distances).
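To make that concrete, here is a minimal, self-contained toy sketch, not the Faster.Map source: the bucket layout, the jump-distance table, the load factor, and all names (`ProbeDemo`, `TryInsert`, `ProbesForLookup`, etc.) are made up for illustration. It shows why a lookup for a missing key normally stops early: it breaks out at the first empty bucket in its probe sequence instead of walking every jump distance.

```csharp
using System;

class ProbeDemo
{
    // Made-up jump-distance table; the real table in DenseMapSIMD is different.
    static readonly uint[] JumpDistances = { 1, 2, 4, 8, 16, 32, 64, 128 };

    const int Size = 256;                             // power of two, so we can mask
    static readonly long[] Buckets = new long[Size];  // 0 means "empty bucket"

    static bool TryInsert(long key)
    {
        uint index = (uint)key & (Size - 1);
        for (uint j = 0; j < JumpDistances.Length; j++)
        {
            if (Buckets[index] == 0) { Buckets[index] = key; return true; }
            index = (index + JumpDistances[j]) & (Size - 1);
        }
        return false; // ran out of jump distances; the real map would resize instead
    }

    // Number of buckets a lookup touches before it can stop.
    static int ProbesForLookup(long key)
    {
        uint index = (uint)key & (Size - 1);
        int probes = 0;
        for (uint j = 0; j < JumpDistances.Length; j++)
        {
            probes++;
            if (Buckets[index] == key) return probes; // found it
            if (Buckets[index] == 0) return probes;   // empty bucket: the key cannot exist
            index = (index + JumpDistances[j]) & (Size - 1);
        }
        return probes; // exhausted all jump distances without seeing an empty bucket
    }

    static void Main()
    {
        var rnd = new Random(1);
        int stored = 0;
        while (stored < Size * 3 / 4)                 // fill the table to ~75%
            if (TryInsert(rnd.Next(1, int.MaxValue))) stored++;

        const int lookups = 10_000;
        double totalProbes = 0;
        for (long key = int.MaxValue + 1L; key <= int.MaxValue + (long)lookups; key++)
            totalProbes += ProbesForLookup(key);      // these keys were never inserted

        Console.WriteLine($"average probes per missing-key lookup: {totalProbes / lookups:F2}");
    }
}
```

As long as the table still has empty buckets, the printed average stays small, which matches the observation above: the extra bound rarely gets a chance to do any work.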

Am I correct or am I missing something obvious here?

Wsm2110 commented 1 year ago

Let's say we have a hashmap which is reasonably full and all probes (jump distances) failed to find an empty entry. Without the max jump distance we would actually loop endlessly :). We can't just use `while (true)`, because what happens when we reach the end of the jumpDistance array?

We don't have to loop over the entire array; we can back out early. Instead of all 31 jump distances (num_jump_distances) we can back out after 1 or 2, since those jump distances don't necessarily contain empty entries.
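For illustration, here is a small self-contained sketch of that failure mode, again with a toy layout and made-up names (`FindUnbounded`, `FindBounded`, `maxDistance`) rather than the real DenseMapSIMD code. When no empty bucket is reachable along the probe path, a `while (true)`-style loop eventually runs past the end of the jump-distance array, while a loop bounded by the highest distance actually used can back out and report a miss.

```csharp
using System;

class MaxDistanceDemo
{
    static readonly uint[] JumpDistances = { 1, 2, 4, 8 };   // deliberately tiny, made-up table
    const int Size = 16;
    static readonly long[] Buckets = new long[Size];         // 0 means "empty bucket"

    // "while (true)" style: the only exits are a match or an empty bucket.
    static bool FindUnbounded(long key)
    {
        uint index = (uint)key & (Size - 1);
        for (uint j = 0; ; j++)
        {
            if (Buckets[index] == key) return true;
            if (Buckets[index] == 0) return false;
            index = (index + JumpDistances[j]) & (Size - 1); // throws once j runs past the table
        }
    }

    // Bounded style: stop once the jump distances that were ever used are exhausted.
    static bool FindBounded(long key, uint maxDistance)
    {
        uint index = (uint)key & (Size - 1);
        for (uint j = 0; j <= maxDistance; j++)
        {
            if (Buckets[index] == key) return true;
            if (Buckets[index] == 0) return false;
            index = (index + JumpDistances[j]) & (Size - 1);
        }
        return false; // the idea is that no key was ever stored beyond this distance, so we can give up
    }

    static void Main()
    {
        for (int i = 0; i < Size; i++)
            Buckets[i] = 1_000 + i;                           // completely full: no empty buckets left

        Console.WriteLine(FindBounded(42, maxDistance: 3));   // False: backs out cleanly

        try { FindUnbounded(42); }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("unbounded probe ran past the end of the jump-distance table");
        }
    }
}
```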

There is always a tradeoff; for now I think we should go this way, as the performance gain is negligible.

miroslavp commented 1 year ago

Thanks for taking the time to explain.