ddugovic / Stockfish

Retired multi-variant fork of popular UCI chess engine; please use Fairy-Stockfish instead
https://github.com/ianfab/Fairy-Stockfish
GNU General Public License v3.0

fen where stockfish has trouble finding zh mate #148

Closed isaacl closed 7 years ago

isaacl commented 7 years ago
setoption name UCI_Variant value crazyhouse
position fen r2q1rk1/pp2bb1p/2pp2pK/7N/3pPR1P/1B1P2p1/PPPq4/R7[PPBNNN] b - - 49 25

At low depths and short move times it evaluates the position as completely lost with f7b3; at higher depth it finds a mate in 8 (#8) with Bxh4.

ddugovic commented 7 years ago

Reproduced: Stockfish discovers the mate around depth=17, somewhere between time=30000 and time=40000 on my PC:

info depth 17 seldepth 29 multipv 1 score mate 8 nodes 65299668 nps 1534404 hashfull 999 tbhits 0 time 42557 pv e7h4 b3f7 f8f7 N@f6 h4f6 N@e7 d8e7 h5f6 e7f6 N@e7 f6e7 B@f6 d2f4 f6g5 e7g5

Black mates in 8 starting with a quiet move in a position where its king is exposed and White has 6 pieces in hand.
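
As a quick sanity check on the line above, the PV length can be compared with the announced mate distance: a mate in N moves by the side to move takes 2*N - 1 plies. The sketch below is not Stockfish code; InfoSummary, parseInfoLine and pliesForMate are hypothetical helper names, and the parser only handles the "mate" and "pv" tokens of a UCI info line.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Stockfish): the two fields of a UCI
// "info" line that matter for this check.
struct InfoSummary {
    int mateIn = 0;               // N from "score mate N", 0 if absent
    std::vector<std::string> pv;  // every token after the "pv" token
};

// Split the line on whitespace; everything after "pv" is a move.
InfoSummary parseInfoLine(const std::string& line) {
    InfoSummary out;
    std::istringstream in(line);
    std::string tok;
    bool inPv = false;
    while (in >> tok) {
        if (inPv)
            out.pv.push_back(tok);
        else if (tok == "pv")
            inPv = true;
        else if (tok == "mate")
            in >> out.mateIn;
    }
    return out;
}

// The mating side moves first and last, so a mate in N moves is 2*N - 1 plies.
int pliesForMate(int mateIn) {
    assert(mateIn > 0);
    return 2 * mateIn - 1;
}
```

For the info line above, the PV has 15 moves, which matches 2*8 - 1 for "score mate 8".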

stockfishdeveloper commented 7 years ago

Wow...it's incredible that Stockfish can see that given the nature of the position!

ddugovic commented 7 years ago

It's funny that #95 improperly implemented could make it more difficult to solve positions like this one involving one or more quiet moves (although technically 1... Bxh4 is a capture).

sf-x commented 7 years ago

But it's not an issue. f7b3 wins also (mate in 10 or 11 - not sure now). Finding a longer but easier win first is normal.

It's funny that #95 improperly implemented could make it more difficult to solve positions like this one involving multiple quiet moves.

Well, you know that Stockfish Matefinder is weaker. Why wouldn't matefinding hurt strength for crazyhouse as well?

ddugovic commented 7 years ago

Good question, and we don't know the answer to that yet. Maybe somehow crazyhouse chess is special and/or some aspects of what jhellis3 developed can be made useful in crazyhouse?

To be honest I was tempted to immediately close #95 but inevitably someone will open an issue just like it.

sf-x commented 7 years ago

Note that the mate is only that long because of futile checks by white. When white runs out of checks and pieces in hand, he's mated in a few moves. If SF has to play that on board, it will find the mating moves. Before that, calculating the mate is a waste of time; confidence that it's there is enough. So I'd say that this is definitely a non-issue.

sf-x commented 7 years ago

Unless you want a special matefinding mode. But that's discussed in #95.

ddugovic commented 7 years ago

Oh, that's an interesting point: this position is not the best possible test position, since even if SF doesn't see the mate it can still manage to win (regardless of the evaluation).

Honestly a special matefinding mode could be interesting (and maybe I should maintain a mate_finder branch).

Vinvin20 commented 7 years ago

Well, you know that Stockfish Matefinder is weaker. Why wouldn't matefinding hurt strength for crazyhouse as well?

Probably because of the nature of the games :

In chess, you have to play positionally: improve your position until it is better than your opponent's, win a pawn (or more), then win the endgame with that small advantage.

Crazyhouse is far more tactical: a forced mate in 20 can appear when both kings are under heavy attack (an extra pawn or piece can be irrelevant), and finding the mate variation is the key to win the game.

ddugovic commented 7 years ago

For what it's worth, I just created a mate_finder branch.

sf-x commented 7 years ago

finding the mate variation is the key to win the game.

No. If you can GUESS which variations win or lose, you don't need to calculate. Example: if the only check evasions you have just lose material for nothing, the position is in all likelihood lost. If the remaining depth is low, it should simply be considered lost without further consideration.

Vinvin20 commented 7 years ago

Sometimes yes, sometimes no.

There's no doubt that SF-Matefinder is stronger than regular SF in tactical positions, based on my test: http://www.talkchess.com/forum/viewtopic.php?p=673942#673942

Over 2 runs, SF-MateFinder found 96 solutions (counting a position as solved if it was found at least once). Over 3 runs, Stockfish_160520_x64_modern_fast scored only 79!

And crazyhouse is more tactical than chess !

ianfab commented 7 years ago

@Vinvin20 You should not confuse finding the best move with playing strength. If you want to measure playing strength in tactical positions, you should play out these positions with the two versions. The version that has the better score can then be considered better in tactical positions. An engine does not necessarily have to find the best move as long as it finds a winning move.

I understand that if you use an engine for analysis you want it to find such checkmate combinations, but this is not essential for a strong engine. However, it would of course be nice.

Vinvin20 commented 7 years ago

It's a tactical set, so, no doubt the one who finds the solution will win the game.

ianfab commented 7 years ago

Sorry, I haven't looked at the positions. If the positions are either completely won or lost depending on one move, you are right. However, I would be more interested in seeing results for positions where you have a winning combination, but if you do not see it, the position still is about equal, because this is what usually is the case in real games and hence would, in my opinion, be a better indicator for playing strength.

ddugovic commented 7 years ago

It's a tactical set, so, no doubt the one who finds the solution will win the game.

That implies there is only one solution. OTB it is common for a player to pass up a mate in N yet play quite strongly.

Vinvin20 commented 7 years ago

In crazyhouse there are almost no "equal positions". Eval is often chaotic and you have to take your chance when you have an attack ...

ianfab commented 7 years ago

Game theoretically it might well be that there are relatively few positions in crazyhouse that are a draw. However, we are so far away from perfect play in crazyhouse, that, in my opinion, it is pointless to talk about game theoretical values. Of course, there is no doubt that there is a lot of room for improvements on pruning, extensions, etc. in crazyhouse, but I am not entirely convinced that the methods of the matefinder are the right path for crazyhouse unless there are convincing test results.

ddugovic commented 7 years ago

I too am deeply skeptical that the matefinder ideas will improve performance in anything other than extremely difficult puzzles involving quiet moves. My test results with mate_finder show that it is slower.

Vinvin20 commented 7 years ago

I hope you are right ;-)

isaacl commented 7 years ago

The reasons I posted this:

ianfab commented 7 years ago

@isaacl Yes, pieces in hand have slightly different piece values. However, your comment gave me the idea of evaluating pieces in hand based on the board position, e.g. adding a bonus for pawns in hand if there are empty squares on the 7th rank, if pieces in hand can be dropped with check, etc.

sf-x commented 7 years ago

@ianfab Please post your new code or a description of it - I don't want to test conflicting changes.

However, your comment gave me the idea of evaluating pieces in hand based on the board position, e.g. adding a bonus for pawns in hand if there are empty squares on the 7th rank, if pieces in hand can be dropped with check, etc.

Well, king safety evaluation in mainline Stockfish counts safe checks possible; but it ignores drops. Perhaps accounting for drops but not otherwise changing the logic would be a good idea?
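
To make the idea concrete, here is a minimal, self-contained sketch of counting drop checks for one piece type, a knight in hand. It is not the Stockfish implementation: Bitboard, knightAttacks, popcount and safeKnightDropChecks are hypothetical names, squares are numbered a1 = 0 to h8 = 63, and "safe" is assumed here to mean "empty and not attacked by the defending side", which is only one possible definition.

```cpp
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;  // bit 0 = a1, bit 7 = h1, bit 63 = h8

// Squares a knight on `sq` attacks. By symmetry these are also the squares
// from which a knight would give check to a king standing on `sq`.
Bitboard knightAttacks(int sq) {
    assert(sq >= 0 && sq < 64);
    static const int df[8] = {1, 2, 2, 1, -1, -2, -2, -1};
    static const int dr[8] = {2, 1, -1, -2, -2, -1, 1, 2};
    const int f = sq % 8, r = sq / 8;
    Bitboard b = 0;
    for (int i = 0; i < 8; ++i) {
        const int nf = f + df[i], nr = r + dr[i];
        if (nf >= 0 && nf < 8 && nr >= 0 && nr < 8)
            b |= Bitboard(1) << (nr * 8 + nf);
    }
    return b;
}

// Number of set bits (clears the lowest set bit on each iteration).
int popcount(Bitboard b) {
    int n = 0;
    for (; b; b &= b - 1)
        ++n;
    return n;
}

// Count squares where a knight in hand could be dropped with check.
// Assumption: a drop is "safe" if the square is empty and not attacked
// by the defending side (`defended`).
int safeKnightDropChecks(int kingSq, Bitboard occupied, Bitboard defended) {
    return popcount(knightAttacks(kingSq) & ~occupied & ~defended);
}
```

With a king on g1 (square 6) and an empty board, the candidate squares are e2, f3 and h3. Sliding pieces in hand would need ray attacks and are left out of this sketch.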

ianfab commented 7 years ago

@sf-x I currently do not write or test any new code, because I am busy with creating a version of Stockfish that generates EPD opening books for testing.

I had already tested this idea a few days ago, but it failed (I can search for the results later if you are interested), probably because it overvalues pieces in hand, since they usually have many possible checks. However, when I wrote it, I did not really think of extending this idea to other parts of the evaluation. It might be worth thinking about adding a function similar to evaluate_pieces for pieces in hand.

sf-x commented 7 years ago

I can search for the results later if you are interested

Yes please.

probably because it overvalues pieces in hand, since they usually have many possible checks.

Another possible reason is time control being too short to calculate out the attacks.

ianfab commented 7 years ago

I have found the results:

LLR: -2.99 (-2.94,2.94) [0.00,20.00]
Total: 656 W: 292 L: 322 D: 42
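
For readers unfamiliar with this output format: the bounds (-2.94, 2.94) are what Wald's SPRT gives for alpha = beta = 0.05, and [0.00,20.00] is the pair of Elo hypotheses. The thread does not say how the test framework computes LLR, so the sketch below uses a common normal approximation with a logistic Elo model; expectedScore, sprtUpperBound and approxLLR are hypothetical names, and the result should only be expected to land near the reported value.

```cpp
#include <cassert>
#include <cmath>

// Expected score for a given Elo difference under the logistic model.
double expectedScore(double elo) {
    return 1.0 / (1.0 + std::pow(10.0, -elo / 400.0));
}

// Upper SPRT stopping bound, log((1 - beta) / alpha).
// With alpha == beta the lower bound is its negative.
double sprtUpperBound(double alpha, double beta) {
    return std::log((1.0 - beta) / alpha);
}

// Normal-approximation log-likelihood ratio of H1 (elo1) against H0 (elo0).
// Assumption: this is an approximation, not necessarily the formula used
// by the framework that produced the output above.
double approxLLR(int wins, int losses, int draws, double elo0, double elo1) {
    const double n = wins + losses + draws;
    assert(n > 0);
    const double s = (wins + 0.5 * draws) / n;  // observed mean score
    const double var = (wins * (1.0 - s) * (1.0 - s)
                      + losses * s * s
                      + draws * (0.5 - s) * (0.5 - s)) / n;  // per-game variance
    const double varMean = var / n;  // variance of the mean score
    assert(varMean > 0);
    const double s0 = expectedScore(elo0), s1 = expectedScore(elo1);
    return (s1 - s0) * (2.0 * s - s0 - s1) / (2.0 * varMean);
}
```

For W: 292 L: 322 D: 42 with hypotheses 0 and 20 Elo, this approximation gives a value close to -3, below the lower bound, so the test stops and the patch is rejected.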

ddugovic commented 7 years ago

Well, king safety evaluation in mainline Stockfish counts safe checks possible; but it ignores drops. Perhaps accounting for drops but not otherwise changing the logic would be a good idea?

Maybe penalize king safety based on drops on safe squares; for example in this position at the end of a forcing variation, Black safely plays N@f3+ and White is hopelessly doomed.

Safe could mean any of:

ddugovic commented 7 years ago

Adding this mate in 15 puzzle here since #95 is resolved (I created a mate_finder branch):

setoption name UCI_Variant value crazyhouse
setoption name Threads value 4
position fen r5k1/pppqbrp1/2n3Bp/3p1n1p/4p3/1PN1P2B/P1PP2PP/R1B2RK1[NPq] b - - 33 17
go infinite

jhellis3 commented 7 years ago

Hi, just wanted to comment that at least one aspect of MF probably hurts much more than normal in zh, and that is the TT changes. You only get 2/3rds the TT entries when using the full key, and due to zh having a naturally higher branching factor, there is probably much more hash pressure for any given depth. If you have more hash than you can use in the average move time, the full keys can actually gain 1-2 Elo, but as soon as you get hash pressure, performance and Elo drop considerably. Not pruning in low material situations (in futility and null pruning) is also probably pretty useless in zh, since it is extremely unlikely for a pawn race to decide a zh game.
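
One way to arrive at the 2/3 figure, under assumptions that are not stated in this thread: if a TT entry with a 16-bit key slice takes 10 bytes and entries are grouped into 32-byte clusters, three entries fit per cluster, while widening the key to 64 bits makes the entry 16 bytes, so only two fit. The structs below are illustrative layouts with hypothetical names (EntryKey16, EntryKey64, entriesPerCluster), not the actual MateFinder or Stockfish definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative entry with a 16-bit key slice (assumed layout).
struct EntryKey16 {
    std::uint16_t key;
    std::uint16_t move;
    std::int16_t  value;
    std::int16_t  eval;
    std::uint8_t  genBound;
    std::int8_t   depth;
};

// Illustrative entry storing the full 64-bit key (assumed layout).
struct EntryKey64 {
    std::uint64_t key;
    std::uint16_t move;
    std::int16_t  value;
    std::int16_t  eval;
    std::uint8_t  genBound;
    std::int8_t   depth;
};

// Assumed cluster size in bytes (half of a 64-byte cache line).
constexpr std::size_t ClusterBytes = 32;

// Whole entries that fit into one cluster; leftover bytes are padding.
std::size_t entriesPerCluster(std::size_t entryBytes) {
    assert(entryBytes > 0);
    return ClusterBytes / entryBytes;
}
```

On common 64-bit platforms sizeof(EntryKey16) is 10 and sizeof(EntryKey64) is 16, giving 3 versus 2 entries per cluster for the same amount of hash memory.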

ddugovic commented 7 years ago

Thanks @jhellis3 ! When reading the code I was thinking the same thing but merging the changes minus the TT changes seemed messy. I figured I would merge all the changes to ddugovic:Stockfish/mate_finder and regression test before reverting anything or making more branches.

Vinvin20 commented 7 years ago

Here are 3 more games I found while browsing the longest games from my recent 1-core vs 6-core match. They confirm my thinking that SF should have a better mate-finding algorithm:

At move 77: https://fr.lichess.org/p36l0TmZ#154
Around move 65: https://fr.lichess.org/rWTzvckw#129
Around move 66: https://fr.lichess.org/v1SjNvzV#132

ddugovic commented 7 years ago

Having the test positions is helpful. Let's not jump to conclusions about whether the algorithm, parameters, or something else is the issue.

Vinvin20 commented 7 years ago

A condition like this one may help SF-zh in mating nets: "No LMR if in danger of getting mated (like pruning)" https://github.com/locutus2/Stockfish/compare/a47bbca...ba6bf2b

ddugovic commented 7 years ago

Maybe. Someone would need to test it following directions in #149.

ianfab commented 7 years ago

The test is already running. So far the results are not very promising, but I will wait for the test to finish and then post the final results.

ianfab commented 7 years ago

@Vinvin20 Here are the results for the patch you mentioned:

LLR: -3.06 (-2.94,2.94) [0.00,20.00]
Total: 1024 W: 483 L: 505 D: 36

Vinvin20 commented 7 years ago

Thanks for the test ! But disappointed by the results ...

ddugovic commented 7 years ago

Severe changes (such as "always do X" or "never do X") tend to have severe effects. The result is unfortunate but not surprising.

Vinvin20 commented 7 years ago

It's a severe change but for a small part of the tree (where there are mates).

jhellis3 commented 7 years ago

I have found you don't really need to worry about LMR when it comes to seeing actual mates. Where LMR changes can make a difference is on the path towards the mate if that path involves sacrificing material. The most significant mate detection change in MateFinder is the null move criteria alteration (the legal, and available king moves check). I would suggest testing that alone; if any part of MateFinder may be beneficial to zh, I would expect that to be it.

ddugovic commented 7 years ago

Thanks, I will test that alone now that I'm aware of what to isolate and test.

ddugovic commented 7 years ago

Hm... I'm looking at http://github.com/jhellis3/Stockfish/blame/9914d9ccf7cb57b63c81d897334dae6a8178c6c5/src/search.cpp and a bit confused where the LMR change for "legal & available king moves check" is done. Are we talking about Step 8?

// Step 8. Null move search with verification search (is omitted in PV nodes)

jhellis3 commented 7 years ago

There is no LMR change I would recommend. The most valuable change is the condition in Step 8, for which you will also need the changes to movegen.h.

ddugovic commented 7 years ago

Sadly Step 8 alone doesn't perform: make profile-build hangs. On current master I tried simplifying this to see if I can get make profile-build to complete (although make build works, search produces no output):

EDIT: Adding the missing parentheses allows make profile-build to complete, but search in crazyhouse produces no output.

    // Step 8. Null move search with verification search (is omitted in PV nodes)
...
    if (   !PvNode
        &&  eval >= beta
        && (ss->staticEval >= beta - 35 * (depth / ONE_PLY - 6) || depth >= 13 * ONE_PLY)
#ifdef CRAZYHOUSE
        && (pos.is_house() ? (abs(eval) < 2 * VALUE_KNOWN_WIN
            && !(depth > 4 * ONE_PLY && (inCheck ? MoveList<LEGAL>(pos).size() < 6 : MoveList<LEGAL, KING>(pos).size() < 1))) :
            pos.non_pawn_material(pos.side_to_move())))
#else
        &&  pos.non_pawn_material(pos.side_to_move()))
#endif
    {

ianfab commented 7 years ago

@ddugovic It probably fails because piece_on(from_sq(move)) does not work for piece drops. I think it should work if you replace it by moved_piece(move).

jhellis3 commented 7 years ago

Why is "inCheck ?" there? That is not in my code. I can't really comment on code that is not mine and for which I cannot see a diff.

ddugovic commented 7 years ago

Thanks, that's much better. I am now testing with:

EDIT: Fixed typo below. Damn, writing code is difficult.

    // Step 8. Null move search with verification search (is omitted in PV nodes)
...
    if (   !PvNode
        &&  eval >= beta
        && (ss->staticEval >= beta - 35 * (depth / ONE_PLY - 6) || depth >= 13 * ONE_PLY)
#ifdef CRAZYHOUSE
        && (pos.is_house() ? (eval < 2 * VALUE_KNOWN_WIN
            && !(depth > 4 * ONE_PLY && (MoveList<LEGAL, KING>(pos).size() < 1 || MoveList<LEGAL>(pos).size() < 6))) :
            pos.non_pawn_material(pos.side_to_move())))
#else
        &&  pos.non_pawn_material(pos.side_to_move()))
#endif
    {

ddugovic commented 7 years ago

In desperation I added inCheck hoping that less code would be executed. I have reverted that change, although honestly in crazyhouse if MoveList<LEGAL>(pos).size() < 6 is true and inCheck is false, then very likely pos.non_pawn_material(pos.side_to_move()) is zero and the player's position is a disaster.

I removed the pos.non_pawn_material(pos.side_to_move()) check because in positions of interest, both players have at least 1 piece.

ddugovic commented 7 years ago

Unfortunately this change does not scale. Next I shall try replacing MoveList<LEGAL, KING>(pos).size() < 1 || MoveList<LEGAL>(pos).size() < 6 with MoveList<LEGAL, KING>(pos).size() < 1 || (inCheck && MoveList<LEGAL>(pos).size() < 6) since some positions have over 100 legal moves and legal move generation is expensive:

==> test20161221-1.out <==
Score of PATCH vs Stockfish 211216 64 BMI2: 248 - 233 - 19  [0.515] 500
Elo difference: 10.43 +/- 29.91

==> test20161221-10.out <==
Score of PATCH vs Stockfish 211216 64 BMI2: 249 - 227 - 24  [0.522] 500
Elo difference: 15.30 +/- 29.77

==> test20161221-30.out <==
Score of PATCH vs Stockfish 211216 64 BMI2: 224 - 249 - 27  [0.475] 500
Elo difference: -17.39 +/- 29.68

==> test20161221-60.out <==
Score of PATCH vs Stockfish 211216 64 BMI2: 87 - 109 - 4  [0.445] 200
Elo difference: -38.37 +/- 48.20
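
The bracketed score and the Elo difference in each result above follow from the win/loss/draw counts under the usual logistic Elo model. A minimal sketch, with hypothetical function names scoreFraction and eloDifference (the +/- error margins are not reproduced here):

```cpp
#include <cassert>
#include <cmath>

// Fraction of points scored: a win counts 1, a draw 0.5, a loss 0.
double scoreFraction(int wins, int losses, int draws) {
    const double n = wins + losses + draws;
    assert(n > 0);
    return (wins + 0.5 * draws) / n;
}

// Elo difference implied by a score fraction under the logistic model.
// Undefined for scores of exactly 0 or 1.
double eloDifference(double score) {
    assert(score > 0.0 && score < 1.0);
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

For the first result, 248 - 233 - 19 gives a score of 257.5 / 500 = 0.515, which maps to roughly +10.4 Elo; for the last, 87 - 109 - 4 gives 0.445 and roughly -38.4 Elo.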

jhellis3 commented 7 years ago

AFAIK, inCheck should always be false (due to step 5), so you are effectively removing the criteria, which can be reduced down to just MoveList<LEGAL, KING>(pos).size() < 1.
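
The reduction can be checked mechanically. In the sketch below, plain integers stand in for MoveList<LEGAL>(pos).size() and MoveList<LEGAL, KING>(pos).size(), and the three function names are hypothetical; it only models the boolean sub-expression, not the rest of the Step 8 condition. It also takes as given the statement above that inCheck is false whenever Step 8 is reached.

```cpp
#include <cassert>

// Sub-expression from the first snippet:
//   inCheck ? legalMoves < 6 : kingMoves < 1
bool fewEscapesTernary(bool inCheck, int legalMoves, int kingMoves) {
    assert(legalMoves >= 0 && kingMoves >= 0);
    return inCheck ? legalMoves < 6 : kingMoves < 1;
}

// Proposed replacement:
//   kingMoves < 1 || (inCheck && legalMoves < 6)
// Short-circuit evaluation means the legal move count is only needed
// when the king has a move and the side to move is in check.
bool fewEscapesProposed(bool inCheck, int legalMoves, int kingMoves) {
    assert(legalMoves >= 0 && kingMoves >= 0);
    return kingMoves < 1 || (inCheck && legalMoves < 6);
}

// What both forms collapse to when inCheck is always false.
bool fewEscapesReduced(int kingMoves) {
    assert(kingMoves >= 0);
    return kingMoves < 1;
}
```

With inCheck fixed to false, both forms agree with fewEscapesReduced for every move count, so the full legal move list never influences the result.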