ddugovic / Stockfish

Retired multi-variant fork of popular UCI chess engine; please use Fairy-Stockfish instead
https://github.com/ianfab/Fairy-Stockfish
GNU General Public License v3.0

Crazyhouse: missed mate in 4 #354

Closed Vinvin20 closed 6 years ago

Vinvin20 commented 7 years ago

opperwezen won one game after many losses. SF level 8 and the server analysis both overlooked the very nice 15...Rg1!! (found by SF after about 17 million nodes): https://fr.lichess.org/1JQ3DgMe#29

-> This game is very interesting for tuning search and eval: https://fr.lichess.org/TFT5J2FS#66 The evaluation swings up and down a lot in the analysis, especially at the very strange 45...gxh5???: https://fr.lichess.org/TFT5J2FS#89

Note that opperwezen plays a lot of 25+2 and 10+2 these days: https://fr.lichess.org/@/opperwezen/search?perf=18&hasAi=1&aiLevelMin=8&sort.field=d&sort.order=desc

ddugovic commented 7 years ago

Bear in mind even at AI level 8, Lichess caps the maximum search depth and thinking time. So unless the AI selects an easily refuted bad move (as it did with 45...gxh5??) there isn't a bug.

Vinvin20 commented 7 years ago

I've been interested in computer chess for more than 25 years; I never forget this kind of thing.

ddugovic commented 7 years ago

The author of http://chessvariants.training/ supplied me with a collection of 4,648 crazyhouse checkmate puzzles. I am using these to help identify whether there is a bug.

Vinvin20 commented 7 years ago

Very nice !

niklasf commented 7 years ago

Nice indeed, but note that afaik most puzzles were created using this Stockfish in the first place.

ddugovic commented 7 years ago

True. I found that (at the rate of 1 second/puzzle) the error rate is halved by simplifying the material-based singular extensions, since a newer upstream change works well; however, that simplification loses Elo overall.

So unless I'm missing something the problem of "Stockfish occasionally misses mate in 4" is not a solvable problem. Stockfish is a strong engine, it's just bad at analyzing crazyhouse checkmates.

ianfab commented 7 years ago

In my experience, writing a patch that improves Stockfish specifically for certain positions and gains Elo overall is very difficult, but simply improving Elo in general will over time also improve play in specific positions. E.g., I did not design patches for SF to choose better antichess openings, but nevertheless over time it has started to play 1. e3, to avoid (1. e3 b5 2. Bxb5) Bb7, etc. I have observed a similar trend for crazyhouse mating combinations in the past. IMO, if SF can not find some mating combinations that is not really an issue, since you will always be able to find such positions (even for standard chess, although on a different level), but if an earlier version is way better at the same task, it should be checked what caused this and whether it can be fixed.

ddugovic commented 7 years ago

Thanks to #360 being resolved, Stockfish now scores 4392.33 / 4648 on the chessvariants.training test suite (at 4 threads, 1 second per puzzle) consisting of mate-in-[2-6] puzzles. Previously it scored 4357.17 / 4648.

When there's a lull in the queue, I'll redo the test suite with today's crazyhouse improvements and attempt to identify "easy puzzles" that it misses & identify another pattern.
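As a quick check on the figures above, the two scores can be converted to success rates. This is only arithmetic on the numbers quoted in this comment; the helper name `success_rate` is made up for illustration, and the thread does not say how the fractional part of each score is awarded.

```python
# Scores quoted in the comment above, out of 4648 puzzles.
# How fractional credit is awarded is not stated in the thread.
TOTAL_PUZZLES = 4648


def success_rate(score: float, total: int = TOTAL_PUZZLES) -> float:
    """Return the score as a percentage of the total number of puzzles."""
    return 100.0 * score / total


before = success_rate(4357.17)  # before #360 was resolved
after = success_rate(4392.33)   # after #360 was resolved

print(f"before: {before:.2f}%")
print(f"after:  {after:.2f}%")
print(f"still unsolved after the fix: {TOTAL_PUZZLES - 4392.33:.2f} puzzles")
```

The rates come out at roughly 93.7% and 94.5%, so the fix is worth a little under one percentage point on this suite.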

ianfab commented 7 years ago

Apropos, it would be a nice feature to support test suites on fishtest. I'll note that down, although I do not think that I will have time to implement that any time soon.

Vinvin20 commented 7 years ago

Thanks to #360 being resolved, Stockfish now scores 4392.33 / 4648 on the chessvariants.training test suite (at 4 threads, 1 second per puzzle)

Do you mean 4 threads for 1 engine? A 4-thread (or 2-thread) configuration sometimes finds solutions more easily. It would be interesting to test with a single thread too.

ddugovic commented 7 years ago

Do you mean 4 threads for 1 engine? A 4-thread (or 2-thread) configuration sometimes finds solutions more easily. It would be interesting to test with a single thread too.

I'm going to assume you mean the following:

versus:

and while I agree the former may be more accurate, it's more work to set up because I want to be able to easily compare the output results (in the same order).

Vinvin20 commented 7 years ago

Yes, I meant "the best is to set only 1 thread inside each engine" (as you did, if I understand correctly).

ddugovic commented 7 years ago

Stockfish is rapidly improving at solving for mate. The most recent puzzle it struggles with is:

setoption name Threads value 1
setoption name MultiPV value 1
setoption name UCI_Variant value crazyhouse
position fen r2q1rk1/pp3ppp/2p1p3/3p4/3p4/3BPP1P/PPPKN1n1/R6R[BNPqbbn] b - - 0 1
d
go movetime 10000
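
For running many puzzles with the same settings, the command sequence above can be generated rather than typed. The following is a minimal sketch that only builds the text to send to the engine; the function name `uci_script` and its parameters are made up for illustration, and the Stockfish-specific `d` (display board) command is left out.

```python
def uci_script(fen: str, movetime_ms: int = 10000, threads: int = 1,
               multipv: int = 1, variant: str = "crazyhouse") -> str:
    """Build the UCI commands for analysing one position (hypothetical helper)."""
    lines = [
        f"setoption name Threads value {threads}",
        f"setoption name MultiPV value {multipv}",
        f"setoption name UCI_Variant value {variant}",
        f"position fen {fen}",
        f"go movetime {movetime_ms}",
    ]
    return "\n".join(lines) + "\n"


# The puzzle position quoted above.
PUZZLE_FEN = "r2q1rk1/pp3ppp/2p1p3/3p4/3p4/3BPP1P/PPPKN1n1/R6R[BNPqbbn] b - - 0 1"

print(uci_script(PUZZLE_FEN), end="")
```

The output can be piped into the engine's standard input; reading the engine's reply is a separate step that this sketch does not cover.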

ddugovic commented 7 years ago

After some thought, I have created two short tuning sessions (perhaps these should be tuned together, I don't know, that really depends upon the "shape" of the Elo gain curve in the N-dimensional parameter space): http://35.161.250.236:6543/tests/view/596c01bc6e23db67e90ddaf6 http://35.161.250.236:6543/tests/view/596c02246e23db67e90ddaf8

Vinvin20 commented 7 years ago

The values didn't change much after tuning, did they?

ianfab commented 7 years ago

@Vinvin20 He stopped the two tuning sessions early and combined them in http://35.161.250.236:6543/tests/view/596c9ffd6e23db67e90ddb01. If the change is still small, decreasing the tuning parameter A or increasing the number of games might help.

ddugovic commented 7 years ago

Indeed, initially I assumed that tuning them together would be much too large a change, but so far that isn't the case...

I think I'll double the number of games in this same session (without doubling A mid-session) which might produce a more precise result. I think the SPSA documentation cautions this algorithm only produces approximations, but increasing the game count (or decreasing A) could in principle yield a more precise approximation.
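To make the role of A concrete, here is a minimal sketch of the gain sequences from the textbook SPSA formulation (Spall), where the step size at iteration k is a / (A + k + 1)^alpha and the perturbation size is c / (k + 1)^gamma. The function names and the values of `a` and `c` are assumptions for illustration; the tuner used in this thread may scale its gains differently.

```python
def step_gain(k: int, a: float, A: float, alpha: float = 0.602) -> float:
    """Textbook SPSA step size a_k = a / (A + k + 1) ** alpha."""
    return a / (A + k + 1) ** alpha


def perturbation_gain(k: int, c: float, gamma: float = 0.101) -> float:
    """Textbook SPSA perturbation size c_k = c / (k + 1) ** gamma."""
    return c / (k + 1) ** gamma


# With the numerator `a` held fixed, a smaller A gives larger early steps.
for A in (1000, 100):
    gains = [step_gain(k, a=1.0, A=A) for k in (0, 100, 1000)]
    print(A, [round(g, 4) for g in gains])
```

Under these assumptions, a smaller A lets the parameters move further in the early iterations, which is one reading of the suggestion to decrease A when the tuned values barely change.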

ianfab commented 7 years ago

@ddugovic Increasing the number of games does not work for a started tuning session, since the number of iterations is not updated if I remember correctly, so it is necessary to resubmit it with a larger number of games. I'll have a look at the code to see whether this can be fixed easily to avoid wasting resources.

ddugovic commented 7 years ago

@ianfab Oops. Well, worst case I can copy the test results, manually translate them into the input format, and submit a new test with A reduced (though I'm unsure to what).

ianfab commented 7 years ago

@ddugovic Right. After a quick look at the code I am not sure what will happen after reaching the 10000 games limit, so let's see.

ddugovic commented 7 years ago

Also, in atomic chess allowing a ShelterWeakness value to be negative proved useful. I'm trying that (using SPSA on my PC) right now...

[Main]
Simulate = 0
Variables = crazyhouse.var
Log = crazyhouse.log
GameLog = crazyhouse_$THREAD.log
Iterations = 10000
A = 1000
Gamma = 0.101
Alpha = 0.602

[Engine]
Engine1 = ./stockfish
Engine2 = ./stockfish
EPDBook = ./books/crazyhouse.epd
BaseTime = 1000
IncTime = 50
Concurrency = 4
DrawScoreLimit = 4
DrawMoveLimit = 8
WinScoreLimit = 650
WinMoveLimit = 8
Variant = crazyhouse
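
For readers unfamiliar with what such a tuner does with `Iterations`, `A`, `Alpha` and `Gamma`, the following is a toy sketch of the SPSA loop in its textbook form. It minimises a simple quadratic instead of playing games, and every name in it (`spsa_minimize`, `toy_loss`, `TARGET`) as well as the values of `a`, `c` and `A` are assumptions for illustration, not the tuner's actual code or settings.

```python
import random

TARGET = [1.0, -1.0]  # optimum of the toy objective (illustrative only)


def toy_loss(theta):
    """Quadratic stand-in for the (noisy) match result a real tuner measures."""
    return sum((t - g) ** 2 for t, g in zip(theta, TARGET))


def spsa_minimize(loss, theta, iterations, a, c, A,
                  alpha=0.602, gamma=0.101, seed=0):
    """Textbook SPSA: perturb all parameters at once, evaluate twice per step."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (A + k + 1) ** alpha   # step size
        c_k = c / (k + 1) ** gamma       # perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # Gradient estimate for parameter i is diff / (2 * c_k * delta[i]).
        # Maximising (e.g. Elo) would flip the sign of the update.
        theta = [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta


start = [4.0, -3.0]
result = spsa_minimize(toy_loss, start, iterations=200, a=1.0, c=0.5, A=10)
print(toy_loss(start), toy_loss(result))
```

In a real tuning run the two evaluations per iteration are replaced by games between the perturbed parameter sets, so each estimate is far noisier than in this toy.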

ddugovic commented 7 years ago

I've set aside my tweaks as they aren't making a significant impact.

ddugovic commented 7 years ago

I have created a suite of 100 challenging puzzles for Stockfish to solve!

Vinvin20 commented 7 years ago

1) Don't you have the shortest distance to mate? 2) How does the current SF do on this set?

ddugovic commented 7 years ago

  1. I'm unsure that my generated solutions are even correct. Assume DTM=6.
  2. About 30% at last test (single-threaded, 1 second/puzzle), but I haven't tested with latest master.

Vinvin20 commented 7 years ago

I checked the first 10 positions and copy/pasted the analysis there.

I found some more solutions and improvements:
Position 3: Nf5 is mate in 5
Position 7: p@h5 is mate in 5
Position 8: R@g8 is mate in 5 too
Position 9: Qxg4 is mate in 6

The incomplete solutions could explain why SF gets such a low score ;-)

rpdelaney commented 7 years ago

@ddugovic I have quite a few more puzzles than we have currently published on chessvariants.training; if you would find them useful I could supply them to you as well.

ianfab commented 7 years ago

The most important thing is to have correct puzzles, i.e. there is only one solution or all winning moves are given in the EPD. If such puzzles are available, that would be very useful.

rpdelaney commented 7 years ago

@ianfab Mating combinations with exactly one winning line are extremely rare in crazyhouse. My puzzle generator builds a solution tree in json with all of the mating lines found by stockfish.

ianfab commented 7 years ago

@rpdelaney I meant that the first move has to be an only move (or the alternatives have to be given), but the subsequent moves can contain alternative solutions, since it usually is sufficient to only check the first move when running a test suite.
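Checking only the first move against a set of accepted moves could look like the sketch below. It assumes the accepted moves are stored in the EPD `bm` operation as space-separated SAN moves; the helper names are made up, the parser is deliberately simple (it would break on a `;` inside a quoted string), and the example line is built from the position and key move 1. Ng5+ quoted elsewhere in this thread.

```python
def parse_epd(line: str):
    """Split an EPD line into its 4-field position and a dict of operations."""
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    ops = {}
    if len(fields) == 5:
        for op in fields[4].split(";"):
            op = op.strip()
            if op:
                name, _, value = op.partition(" ")
                ops[name] = value.strip()
    return position, ops


def _normalize(san: str) -> str:
    """Drop check, mate and annotation suffixes so 'Ng5' matches 'Ng5+'."""
    return san.rstrip("+#!?")


def first_move_ok(epd_line: str, engine_move_san: str) -> bool:
    """True if the engine's first move is among the moves listed under bm."""
    _, ops = parse_epd(epd_line)
    accepted = {_normalize(m) for m in ops.get("bm", "").split()}
    return _normalize(engine_move_san) in accepted


EPD = ('r1bn1r1b/ppp1Nppk/3pnP1p/8/2BpP3/5N2/PP3PPP/R2Q1RK1/Qbp w - - '
       'bm Ng5+; id "example";')
print(first_move_ok(EPD, "Ng5+"), first_move_ok(EPD, "Bxf7"))
```

Listing several moves after `bm` is how alternative winning first moves would be accepted under this scheme.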

rpdelaney commented 7 years ago

@ianfab Alternatives are returned at every level, including the first move. Only those lines that mate equally fast as the fastest mating line are returned, though.
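A solution tree with alternatives at every level might be represented as nested JSON objects, as in the sketch below. The schema (each key is a move, each value is the subtree of continuations) and the placeholder move names are assumptions for illustration; the actual format produced by the puzzle generator is not shown in this thread.

```python
import json

# Hypothetical tree: keys are moves, values are the continuations.
# The move names are placeholders, not a real solution.
TREE_JSON = """
{
  "attack1": {"defence1": {"mate1": {}}, "defence2": {"mate2": {}}},
  "attack2": {"defence3": {"mate3": {}}}
}
"""


def first_moves(tree: dict) -> list:
    """All accepted first moves, i.e. the keys at the root of the tree."""
    return sorted(tree)


def line_lengths(tree: dict, depth: int = 0):
    """Yield the length in plies of every root-to-leaf line."""
    if not tree:
        yield depth
        return
    for subtree in tree.values():
        yield from line_lengths(subtree, depth + 1)


tree = json.loads(TREE_JSON)
print(first_moves(tree))
print(sorted(set(line_lengths(tree))))
```

With such a structure, a test suite that only checks the first move needs just the root keys, while the deeper levels remain available for verifying full lines.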

ianfab commented 7 years ago

Finding the fastest mate is also an interesting task, but it probably is not a good indicator for general playing strength, so I am hesitant to use such problems for optimizing stockfish, but nevertheless the puzzles might be interesting.

rpdelaney commented 7 years ago

However, as @niklasf pointed out, they were generated using this version of stockfish to begin with. So I'm not sure what use they would be for debugging stockfish.

ianfab commented 7 years ago

Yes, that's clearly a limitation. It might still be useful if Stockfish can be tuned towards achieving results similar to those at a much longer time control (perhaps at least 1-2 orders of magnitude longer), but calculating the set of all winning moves is important for that, I think, because otherwise it gets overfitted towards finding the fastest solution.

ddugovic commented 7 years ago

It's true that finding the fastest mate is more expensive than finding a strong move. Given that the success rate is already over 94% I'm not interested in tuning to maximize that result, but rather to help identify new ideas for trying in the testing queue.

Vinvin20 commented 7 years ago

The incomplete solutions could explain why SF gets such a low score ;-)

We need a script to make SF analyze all the positions with the 10 best moves, 1 core, and 2 minutes per position.
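
One piece of such a script is collecting, from the engine's MultiPV output, every first move that leads to a forced mate. The sketch below parses UCI `info` lines for that purpose; it assumes the engine was run with `MultiPV` set to 10, the function name is made up, and the sample lines are invented for illustration and not tied to any position in this thread.

```python
import re

# Matches e.g. "... multipv 2 score mate 5 ... pv Q@g6 h7g8"
INFO_RE = re.compile(r"\bmultipv (\d+) score (cp|mate) (-?\d+).*?\bpv (\S+)")


def mating_first_moves(output_lines):
    """Map first move -> mate distance, using the latest info per MultiPV slot."""
    latest = {}
    for line in output_lines:
        if not line.startswith("info"):
            continue
        match = INFO_RE.search(line)
        if match:
            slot, kind, value, move = match.groups()
            latest[int(slot)] = (kind, int(value), move)
    return {move: value for kind, value, move in latest.values()
            if kind == "mate" and value > 0}


# Invented sample output, for illustration only.
SAMPLE = [
    "info depth 12 seldepth 18 multipv 1 score cp 512 nodes 90000 time 100 pv g2g1 h1g1",
    "info depth 14 seldepth 20 multipv 1 score mate 4 nodes 200000 time 220 pv g2g1 h1g1",
    "info depth 14 seldepth 20 multipv 2 score cp 310 upperbound nodes 200000 time 220 pv Q@e1 a1e1",
    "bestmove g2g1",
]
print(mating_first_moves(SAMPLE))
```

Only the most recent line for each MultiPV slot is kept, so a move that was first scored in centipawns and later proven to mate is reported with its mate distance.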

rpdelaney commented 7 years ago

@Vinvin20 This seems to work, but I haven't tested it much https://gist.github.com/rpdelaney/7e7e19ea1f6ed6cb2a67c03137e0f040

ddugovic commented 7 years ago

@Vinvin20 Here is another option: https://github.com/niklasf/python-chess/blob/master/examples/bratko_kopec/bratko_kopec.py

ppigazzini commented 7 years ago

@rpdelaney JHellis has derived Mate Finder from Official SF. Perhaps some of those tricks can be useful for a "Multi Variant Mate Finder" to be used to build some harder mates for Multi Variant SF.

jhellis3 commented 7 years ago

I'm not sure how many of my changes in Matefinder would be generally applicable, however, the changes to LMR are probably the most likely to improve the tactical awareness of the engine in a general sense:

Both are likely to cost Elo (due to longer TTD)... but you are much less likely to miss tactical shots.

Vinvin20 commented 7 years ago

Both are likely to cost Elo (due to longer TTD)... but you are much less likely to miss tactical shots.

Crazyhouse is way more tactical than chess. After 5-10 moves, each side begins to attack heavily around the king. One missed move and it can be over a couple of moves later.

ddugovic commented 7 years ago

Right, although one missed move isn't necessarily a tactical shot move -- in general we measure Elo gain of patches.

I do wonder whether adding a "Study" mode similar to matefinder could be useful for puzzle generation and puzzle-solving.

ddugovic commented 7 years ago

I have updated the 100 challenging puzzles link. Currently SF scores 96% on the chessvariants.training crazyhouse set, and on these 100 puzzles that it fails, it scores 40% on a second attempt.

Again, the goal isn't to score 100%, but to find low-cost improvements.

ianfab commented 7 years ago

According to a test that uses @ddugovic's set of 100 puzzles as starting positions and some local tests with difficult mate combinations, mate finding has (to my surprise) improved a lot with the crazyhouse qsearch patch.

ppigazzini commented 7 years ago

@ianfab from the regression test it seems that the patch https://github.com/ddugovic/Stockfish/commit/a16ebb0d211256121ebb1d702b5310e6b39f7b07 performs better with LTC

ddugovic commented 7 years ago

Wow, did this release gain 121 Elo over the previous release?

ianfab commented 7 years ago

@ppigazzini @ddugovic This Elo gain is expected, but nevertheless an awesome result considering that improvement had slowed down before. Search for "crazyhouse" on the diff page https://github.com/niklasf/Stockfish/compare/4257b04...af68133 to find the relevant patches that gave the improvement. The improvements mainly came from the qsearch patch (~50-60 Elo), PSQT tweaks (~20-30 Elo), and king safety related parameter tweaks (~20-30 Elo). (The Elo gains are just rough estimates from the SPRT results at LTC, no direct measurements.)
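To put these numbers in perspective, the rough per-patch estimates above add up to about 90 to 120 Elo, and an Elo difference can be translated into an expected score with the standard logistic Elo formula. The sketch below is only that textbook conversion; the function names are made up, and it is not how the regression test computed its result.

```python
import math


def score_from_elo(elo_diff: float) -> float:
    """Expected score (0..1) for the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of score_from_elo; the score must lie strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


low, high = 50 + 20 + 20, 60 + 30 + 30   # sum of the rough estimates above
print(f"sum of estimates: {low} to {high} Elo")
print(f"expected score at +121 Elo: {score_from_elo(121):.3f}")
```

Under this model a gain of 121 Elo corresponds to the new release scoring roughly two thirds of the points against the old one.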

Vinvin20 commented 7 years ago

Great! Incredible improvement! I often see Yasser Seirawan playing ZH against SF level 7: https://lichess.org/@/yasser-seirawan/search?hasAi=1&sort.field=d&sort.order=desc I'll watch more closely after this improvement! Very good job by the whole team!

ddugovic commented 7 years ago

Take a difficult position such as r1bn1r1b/ppp1Nppk/3pnP1p/8/2BpP3/5N2/PP3PPP/R2Q1RK1/Qbp w - - 0 1 where the mate is 1. Ng5+! Nxg5 2. Bxf7!! with unstoppable threats of Bg6#, Q@g6#, and Q@g8+ forcing mate. Currently this is a bit much for Stockfish to find in < 1 second, but it did highlight two things a human player considers:

ianfab commented 7 years ago

I tried to address the first point with a patch some time ago, but my tests unfortunately were not very successful, see, e.g., http://35.161.250.236:6543/tests/view/58f75dbc6e23db2fa80810d1. The second point should mostly already be addressed in king safety and threat evaluation, but perhaps some more crazyhouse specific ideas could be added there.