billforsternz / thc-chess-library

General Purpose Rules of Chess Library for C++
MIT License
38 stars 13 forks source link

Understanding Hashes and Related Functions #15

Closed bolomcs50 closed 1 year ago

bolomcs50 commented 1 year ago

Hello,

I'm playing around with the Hash functions of the library. I encountered a behavior that I suspect is not the intended one. In short: Hash64Update(hash, move) does not seem to produce the correct Hash. An example: Let's start from the initial position

thc::ChessEvaluation cr;
cr.Forsyth("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1");
thc::Move mv;
mv.TerseIn(&cr, "e2e4");

Before playing any move, the position is correct, and it is possible to Hash64Calculate() a Hash:

uint64_t currentHash = cr.Hash64Calculate();
std::cout << "Starting Position:" << cr.ToDebugStr();
std::cout << "Hash64Calculate(): " << currentHash << std::endl;

which yields, as expected

Starting Position:
White to move
rnbqkbnr
pppppppp
........
........
........
........
PPPPPPPP
RNBQKBNR
HashCalculate(): 1111876492934684975

After playing (or pushing) the move 1. e4, though, calculating the hash again with cr.Hash64Calculate() and updating it with hash = cr.Hash64Update(hash, move), yield different results:

cr.PushMove(mv); // cr.PlayMove(mv);
std::cout << "\nPosition after " << mv.TerseOut() << ": " <<  cr.ToDebugStr();
std::cout << "Hash64Calculate(): " << cr.Hash64Calculate() << std::endl;
currentHash = cr.Hash64Update(currentHash, mv);
std::cout << "Updated Hash: " << currentHash << std::endl;

yields

Position after e2e4: 
Black to move
rnbqkbnr
pppppppp
........
........
....P...
........
PPPP.PPP
RNBQKBNR
Hash64Calculate(): 5618832418565224928
Updated Hash: 6951459208584661089

Shouldn't the two hashes coincide here? Furthermore, if I pop the move, cr.Hash64Calculate() returns the initial hash, as expected, but Hash64Update() does not. In fact:

cr.PopMove(mv);
std::cout << "\nPosition after popping " << mv.TerseOut() << ": " <<cr.ToDebugStr();
std::cout << "Hash64Calculate(): " << cr.Hash64Calculate() << std::endl;
currentHash = cr.Hash64Update(currentHash, mv);
std::cout << "Updated Hash: " << currentHash << std::endl;

yields

Position after popping e2e4: 
White to move
rnbqkbnr
pppppppp
........
........
........
........
PPPPPPPP
RNBQKBNR
Hash64Calculate(): 1111876492934684975
Updated Hash: 2516593853674508462

whereas I would expect that hashing in the same move twice in a row with cr.Hash64Update() would return the initial hash. What am I doing wrong? Am I misunderstanding/misusing the functions?

Thank you!

billforsternz commented 1 year ago

Thank you for this excellent, clearly written problem statement.

It's been a while since I used the hash functions, I originally created them for an old version of Tarrasch when I was trying to use standard database technology to build a Tarrasch database (Basically SQLITE database, all positions in all games stored as hashes - it didn't go well and I abandoned that approach). I don't use these functions any more in any of my projects and am considering removing them; I mean I just generated my own random numbers for these hash functions - maybe the hashes used more widely in chess programming ("Zobrist" hashes I believe they're called) - somehow reduce the random chance of collisions, a real problem with hashes I imagine.

Never-the-less, the hashes should work of course (spoiler alert - I think they do).

I started by reproducing your problem - yes I could do that, thanks again for being so clear.

It didn't take long to see your misunderstanding; The incremental hash algorithm takes a position with a corresponding start hash plus a move and calculates the hash of the new position after the move is played. This is the only way incremental hash calculation can possibly work. Instead you take the position after the move is played, that's not going to work!

Putting it another way, the incremental hash algorithm is a special kind of a make-a-move algorithm - it looks at a position and considers the impact of a move. The position must be the position before the move is made. The comments to the Hash64Update() function itself make this clear.

Hope this helps!

On Sat, 24 Sept 2022 at 23:01, bolomcs50 @.***> wrote:

Hello,

I'm playing around with the Hash functions of the library. I encountered a behavior that I suspect is not the intended one. In short: Hash64Update(hash, move) does not seem to produce the correct Hash. An example: Let's start from the initial position

thc::ChessEvaluation cr; cr.Forsyth("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"); thc::Move mv; mv.TerseIn(&cr, "e2e4");

Before playing any move, the position is correct, and it is possible to Hash64Calculate() a Hash:

uint64_t currentHash = cr.Hash64Calculate(); std::cout << "Starting Position:" << cr.ToDebugStr(); std::cout << "Hash64Calculate(): " << currentHash << std::endl;

which yields, as expected

Starting Position: White to move rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP RNBQKBNR HashCalculate(): 1111876492934684975

After playing (or pushing) the move 1. e4, though, calculating the hash again with cr.Hash64Calculate() and updating it with hash = cr.Hash64Update(hash, move), yield different results:

cr.PushMove(mv); // cr.PlayMove(mv); std::cout << "\nPosition after " << mv.TerseOut() << ": " << cr.ToDebugStr(); std::cout << "Hash64Calculate(): " << cr.Hash64Calculate() << std::endl; currentHash = cr.Hash64Update(currentHash, mv); std::cout << "Updated Hash: " << currentHash << std::endl;

yields

Position after e2e4: Black to move rnbqkbnr pppppppp ........ ........ ....P... ........ PPPP.PPP RNBQKBNR Hash64Calculate(): 5618832418565224928 Updated Hash: 6951459208584661089

Shouldn't the two hashes coincide here? Furthermore, if I pop the move, cr.Hash64Calculate() returns the initial hash, as expected, but Hash64Update() does not. In fact:

cr.PopMove(mv); std::cout << "\nPosition after popping " << mv.TerseOut() << ": " <<cr.ToDebugStr(); std::cout << "Hash64Calculate(): " << cr.Hash64Calculate() << std::endl; currentHash = cr.Hash64Update(currentHash, mv); std::cout << "Updated Hash: " << currentHash << std::endl;

yields

Position after popping e2e4: White to move rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP RNBQKBNR Hash64Calculate(): 1111876492934684975 Updated Hash: 2516593853674508462

whereas I would expect that hashing in the same move twice in a row with cr.Hash64Update() would return the initial hash. What am I doing wrong? Am I misunderstanding/misusing the functions?

Thank you!

— Reply to this email directly, view it on GitHub https://github.com/billforsternz/thc-chess-library/issues/15, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABSQYSQUAVMWEOCIKTH7UU3V73NQHANCNFSM6AAAAAAQUTE2PQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- Bill Forster 15A Boundary Road Kelburn Wellington 6012 New Zealand +64 21 357 371 @.***

bolomcs50 commented 1 year ago

Ah, yes! This makes sense. After a superficial look at the Hash64Update() function I was working under the hypothesis that it only relies on the information contained in the hash and move passed as arguments, but this can't be the case: it also needs to know the state of the board, which in fact it polls by calling IsEmptySquare() in several places.

As a consequence, the hash should be updated before playing/pushing a move, but after popping one, to revert to the previous position, because in both cases you want to calculate the hash update based on the state of the board before the move is actually played on the board (or after it has been "un-played").

In my code, simply moving the first instance of currentHash = cr.Hash64Update(currentHash, mv); before cr.PushMove(mv);yields the expected results.

Thanks a lot for clearing this up!