The current logic to compute position keys guarantees uniqueness but board positions that are a single move away have keys with poor locality. The transposition table access is thus all over the place and leads to large cache misses.
I ran a benchmark on the source code with valgrind and the results are in agreement with the observation.
On test set L2_R2, ~37.5 million calls to TranspositionTable::get function with 99% read miss in data cache!!
Note: I did a build in RelWithDebInfo configuration using CMake and the get and put functions were inlined in my case. Had to force compiler to not inline during benchmarking using __attribute__((noinline))
Posting this here if anyone can come up with a more cache friendly transposition table implementation.
The current logic to compute position keys guarantees uniqueness but board positions that are a single move away have keys with poor locality. The transposition table access is thus all over the place and leads to large cache misses.
I ran a benchmark on the source code with valgrind and the results are in agreement with the observation. On test set L2_R2, ~37.5 million calls to
TranspositionTable::get
function with 99% read miss in data cache!!Note: I did a build in
RelWithDebInfo
configuration using CMake and theget
andput
functions were inlined in my case. Had to force compiler to not inline during benchmarking using__attribute__((noinline))
Posting this here if anyone can come up with a more cache friendly transposition table implementation.