ddugovic / Stockfish

Retired multi-variant fork of popular UCI chess engine; please use Fairy-Stockfish instead
https://github.com/ianfab/Fairy-Stockfish
GNU General Public License v3.0

Crazyhouse Computer Championships 2016 #131

Closed Vinvin20 closed 7 years ago

Vinvin20 commented 7 years ago

http://talkchess.com/forum/viewtopic.php?p=695932#695932

If you have a strong build now, please tell us !

ianfab commented 7 years ago

I think that the latest release (151116) from https://github.com/niklasf/Stockfish/releases should be used. We could maybe slightly improve by using the current head and compiling a version with crazyhouse only, but I think stability is more important, especially when taking into account the gap between SF and the rest of the field on rating lists.

@ddugovic What do you think?

Vinvin20 commented 7 years ago

It starts at 2016-12-02, so this leaves 1 week to get ready.

niklasf commented 7 years ago

Enough time to make a release on lichess any day now, if you like.

ddugovic commented 7 years ago

Regarding Crazyhouse Computer Championships 2016 I agree with @ianfab .

Regarding lichess it seems official-stockfish/master isn't changing anytime soon, and I'm not aware of any critical issues at the moment. My next goal is to use variant syzygybases (which doesn't affect crazyhouse), but with so many unbounded open issues I hesitate to start developing anything new!

MichaelB7 commented 7 years ago

Yes - stability is key at this juncture. Good luck and I'll be rooting for SF.


Vinvin20 commented 7 years ago

Good news, SF-zh gained around 70 points between 15Nov2016 and 02Nov2016: https://sites.google.com/site/zhassociation/computers/rating-list/blitz

stockfishdeveloper commented 7 years ago

Congratulations Dan!

Vinvin20 commented 7 years ago

Engine update is allowed, so bug fixes and further improvement can still be entered.

ddugovic commented 7 years ago

If it's OK with @ianfab (and if not I guess I'll make decisions) I defer to whatever he decides regarding submission, bugfixes, further improvement, etc.

ianfab commented 7 years ago

Stockfish should be able to make it to the final anyway (with the version from November 15th), and by then we will probably have a release that will have been sufficiently tested on lichess and elsewhere.

MichaelB7 commented 7 years ago

the latest is very strong, nice work - will grab the latest pr

Rank Name                                       Rating      Δ    +    -     #      Σ    Σ%    W    L    D    W%   =%  OppR
--------------------------------------------------------------------------------------------------------------------------
   1 Stockfish Crazy-House-1 64 POPCNT (UCI2WB)   3245    0.0   32   32   600  484.5  80.8  481  112    7  80.2  1.2  2955
   2 Stockfish 251116 64 POPCNT (UCI2WB)          2955  289.7   32   32   600  115.5  19.2  112  481    7  18.7  1.2  3245

ddugovic commented 7 years ago

Sounds good to me @MichaelB7 (only now I see the original post in that thread); anything decisive to keep everyone satisfied w/o triggering a bike shed discussion about the many possibilities.

ddugovic commented 7 years ago

According to @MichaelB7 's comment, latest master a9938b7 has a 3111 rating (according to his rating system tested at ? + ?s time control).

As far as I know, latest "stable" release tested at 120s + 1.2s and at ? + ?s is bee468b.

Apologies to others for not answering #149 earlier as I'm learning the usefulness of testing at 30s + 0.3s and slower since evidently (#145) 10s + 0.1s fails to detect regression.

ddugovic commented 7 years ago

With the championship starting soon, I will request our entry be updated to bee468b5c86c978d69845137db0c8f5f84fb8cfe (matching that in my previous comment) as I believe that's the latest stable release.

EDIT: I suppose I'd need to build the release for @fsmosca to use it, and I'm unable to build for a Windows platform (I don't even know what arguments are used for the build). I suppose we'll have to stick with "Stockfish zh 30Nov2016 64bit" unless all that can be sorted out in time (doubtful).

ianfab commented 7 years ago

"Stockfish zh 30Nov2016 64bit" worked and performed well in his tests (except for a few time losses that should no longer happen with adjusted move overhead), so I think there is no need for a last minute change.

fsmosca commented 7 years ago

After setting move overhead to 1000, I no longer see time losses from sf zh 30nov2016 at TC 3+2.

I can build a Windows binary (not bmi) if needed.
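For reference, the overhead setting discussed above is an ordinary UCI option. A minimal sketch, assuming the engine exposes it under the name `Move Overhead` with a value in milliseconds (as Stockfish does) and that a GUI or script writes these lines to the engine's standard input; the helper name `uci_setoption` is made up for illustration:

```python
def uci_setoption(name: str, value) -> str:
    """Format a UCI 'setoption' command line (without trailing newline)."""
    return f"setoption name {name} value {value}"


# Commands a test script might send before starting a game.
# "Move Overhead" is assumed to be the option name the engine reports.
commands = [
    "uci",
    uci_setoption("Move Overhead", 1000),
    "isready",
]
print("\n".join(commands))
```

A larger overhead makes the engine keep more time in reserve on every move, which trades a little playing strength for fewer losses on time.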

MichaelB7 commented 7 years ago

Yes, use https://github.com/ddugovic/Stockfish/commit/bee468b5c86c978d69845137db0c8f5f84fb8cfe (my latest commit has not been fully tested yet). Plus, the difference, if any, is not great.

ddugovic commented 7 years ago

Sorry for my last minute panic as this week I had to work overtime. EDIT: please disregard the following (unless already regarded).

Rapid time control tests for bee468b are very positive and with apologies to @ianfab (I have failed to perform a complete STC test on my machine and I'm working on setting that up) at this time based on test results so far I feel more comfortable with bee468b (reverting #145 but otherwise same as "Stockfish zh 30Nov2016 64bit") than with "Stockfish zh 30Nov2016 64bit".

ddugovic commented 7 years ago

@ianfab is correct in that "Stockfish zh 30Nov2016 64bit" worked and performed well; ultimately my machine cannot in any reasonable timeframe tell which is stronger at LTC. I see that the participant is "Stockfish zh 30Nov2016 64bit" which is OK.

Sorry for my emotional last post.

ddugovic commented 7 years ago

I hope that for the "Final Stage" round we get to use an updated binary which has a 0.3% speedup and a simplification optimization. @ianfab I have started an STC test of latest master versus "Stockfish zh 30Nov2016 64bit" - let me know if I should test something more (I'll test LTC next).

ddugovic commented 7 years ago

At a bullet time control (1+1) latest master is not an improvement over 30Nov2016. Now testing LTC (and if it's an improvement I'll test 3+2 next).

EDIT: 30Nov2016 is better at LTC; next I will identify which commit(s) regress.

ianfab commented 7 years ago

@ddugovic I thought you were referring to fishtest time controls when mentioning STC/LTC?!

I think we should define fixed conditions for how to test patches (new code, tweaks, simplifications) in the future regarding time controls, statistics (SPRT bounds, number of games), and opening books, so that it is transparent when several users are doing tests (and contributing patches). I think it makes sense to be guided by the established conditions of fishtest while taking into account the differences regarding computation time and Elo difference. However, regression tests with different conditions will of course always be useful regardless of any fixed test conditions.

ddugovic commented 7 years ago

I agree and I'm open to suggestions as I struggle to test correctly.

ianfab commented 7 years ago

In the following I am going to summarize my current thinking on testing and the way I have been doing tests so far. Feedback and suggestions are very welcome.

Steps

Just like on fishtest, I think there should be a first test on STC, and then a second test to show the scaling to LTC, since we have already seen that not doing LTC tests can cause regressions to pass undetected.

Time controls

Using the same time controls as fishtest (STC 10s+0.1s, LTC 60s+0.1s) would probably take too long for LTC, so it could make sense to halve/reduce the time controls. Reducing both STC and LTC could restrict the validity of STC tests, whereas only reducing LTC would make it more difficult to see scaling effects, so I am currently not sure about that. Since I did not have opening books for all variants until two days ago, I have varied the base time (0.1-10s) to avoid repeated games, but used the same increment as STC on fishtest (0.1s).
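To make the "halve/reduce" idea concrete, here is a small sketch of a time control value that can be scaled. The `TimeControl` class and its `scaled` method are hypothetical helpers, not part of any existing test script; only the STC values (10s+0.1s) come from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeControl:
    base: float       # base time in seconds
    increment: float  # increment per move in seconds

    def scaled(self, base_factor: float = 1.0, inc_factor: float = 1.0) -> "TimeControl":
        """Return a new time control with base and increment scaled separately."""
        return TimeControl(self.base * base_factor, self.increment * inc_factor)

    def __str__(self) -> str:
        # "base+increment" notation, e.g. "10+0.1"
        return f"{self.base:g}+{self.increment:g}"


STC = TimeControl(10, 0.1)         # fishtest STC as quoted above
half_stc = STC.scaled(0.5, 0.5)    # halve both base time and increment
short_base = STC.scaled(0.5, 1.0)  # halve base time only, keep the increment
print(STC, half_stc, short_base)
```

Keeping base and increment factors separate makes it easy to try both options mentioned above: shrinking everything, or shrinking only the base time.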

Books

Since we are testing many variants, I think we need an automatic way to generate the opening books. Since Stockfish might be the only (strong) engine for some variants, it makes sense to use it for generation. I have started to write such a book generator based on Stockfish two days ago, but of course it is rather experimental yet. Using it, I generated a set of EPD opening books for the variants supported by Stockfish (excluding Relay chess, since I have not dealt with it yet).

Testing at STC

Since I do not know of another way to perform SPRT tests for variant engines, I use my very basic far-from-perfect testing script. Nevertheless, it at least seems to work quite well in practice so far, since my patches tested with this script improved Stockfish by several hundred Elo in several variants.

So far I have always used [0,20] Elo bounds for SPRT tests. I got to those empirically by considering the typical Elo gains of patches and the number of games necessary for such patches to pass an SPRT test. In the beginning almost all patches added new code/ideas and the Elo gain typically was huge (>50 Elo), so this was fine, but since more and more patches are going to be parameter tweaks and simplifications with only small Elo differences, and since Elo differences are decreasing as Stockfish is improving, we have to think about the SPRT bounds. Maybe we could simply use a multiple (3-5?) of the bounds used on fishtest (general [0,5], tweaks [0,4], simplifications [-3,1]).
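For readers unfamiliar with SPRT, the following is a rough sketch of how such a test can be evaluated from win/draw/loss counts. It uses a normal approximation of the log-likelihood ratio with a logistic Elo model, which is an assumption on my part and not necessarily what the testing script mentioned above does. The function names, the default error rates (alpha = beta = 0.05), and the factor of 3 applied to the fishtest bounds are illustrative choices.

```python
import math


def expected_score(elo: float) -> float:
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate log-likelihood ratio of H1 (elo1) versus H0 (elo0)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    score = (wins + 0.5 * draws) / n
    variance = (wins + 0.25 * draws) / n - score ** 2  # per-game score variance
    if variance <= 0:
        return 0.0
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return (s1 - s0) * (2 * score - s0 - s1) / (2 * variance / n)


def sprt_decision(llr: float, alpha: float = 0.05, beta: float = 0.05) -> str:
    """Compare the LLR against the usual Wald stopping thresholds."""
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"


# Fishtest bounds quoted above, scaled by a hypothetical factor of 3.
FISHTEST_BOUNDS = {"general": (0, 5), "tweaks": (0, 4), "simplifications": (-3, 1)}
scaled_bounds = {kind: (3 * lo, 3 * hi) for kind, (lo, hi) in FISHTEST_BOUNDS.items()}

# Hypothetical running totals after 100 games, tested with [0,20] bounds.
llr = sprt_llr(wins=60, draws=10, losses=30, elo0=0, elo1=20)
print(round(llr, 2), sprt_decision(llr), scaled_bounds)
```

Wider bounds make the test stop after fewer games but only confirm larger gains, which is the trade-off behind choosing a multiple of the fishtest values.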

Testing at LTC

Since LTC tests could take a long time, we could think about using a fixed number of games for LTC to only regression test the changes. Probably time will tell whether lengthy tests on LTC are feasible.
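With a fixed number of games, the result is usually reported as an Elo estimate with an error margin rather than a pass/fail decision. A minimal sketch, assuming a logistic Elo model and a normal approximation for the 95% interval; the function names and the game counts in the example are hypothetical:

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_estimate(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, lower, upper) for a finished fixed-length match."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Variance of the per-game result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * score ** 2) / n
    stderr = math.sqrt(variance / n)

    def clamp(x: float) -> float:
        # Keep scores inside (0, 1) so the Elo conversion stays finite.
        return min(max(x, 1e-9), 1 - 1e-9)

    return (elo_from_score(clamp(score)),
            elo_from_score(clamp(score - z * stderr)),
            elo_from_score(clamp(score + z * stderr)))


# Hypothetical regression test of 300 games.
elo, low, high = elo_estimate(wins=150, draws=30, losses=120)
print(f"{elo:+.1f} Elo (95% interval {low:+.1f} to {high:+.1f})")
```

If the interval still includes zero after the planned number of games, the test has not shown a regression or an improvement at that time control.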

Tuning

I have written a slightly modified version of Stockfish's SPSA tuner to add variant support. It should be more or less self-explanatory if you know how to use the SPSA tuner for official Stockfish.

ddugovic commented 7 years ago

Thanks. Although I am unfamiliar with SPRT, I believe that is excellent guidance. I studied statistics both at university and in high school; I am simply unfamiliar with this particular heuristic/experiment.

Due to official-stockfish/Stockfish#603 and official-stockfish playing in competitions with "Move Overhead=1000", I think LTC (and possibly STC?) tests should be performed with "Move Overhead=1000" and a minimum increment of 1s. It baffles me that fishtest would use different conditions although perhaps official-stockfish/Stockfish is less prone to timeout than this fork?

I have started experimenting with your modified SPSA tuner and submitted a PR addressing most of the confusion I encountered trying to use it.

sf-x commented 7 years ago

The testing methodology discussion doesn't seem to be related to the topic. Can it be split into a new issue, so it's easier to find?

ddugovic commented 7 years ago

Thanks for the input; I will copy this discussion into #149.

ddugovic commented 7 years ago

@ianfab Do you have any comment on fishnet-091216?

CCVA conditions state "Engine update is accepted but will be used before a match" and I am interested in updating our submission.

ianfab commented 7 years ago

My regression test result and the games on lichess are looking good, so I think @fsmosca can use it. Maybe someone could test the windows executables to make sure that they are also working properly.

ddugovic commented 7 years ago

Thanks. In the TalkChess forum I have requested our submission be updated to @niklasf fishnet-091216.

@Vinvin20 regarding #161 could you please test that fishnet-091216 works properly? (One such test could be a regression test -- simply run against fishnet-301116.)

Vinvin20 commented 7 years ago

After 300 games the improvement is around +85 Elo. Let's wait for 1300 more games ...

ddugovic commented 7 years ago

Stockfish qualifies for the final stage! http://ccva.challonge.com/1st_ccva_comp_champ_2016

It looks like the Final Stage will be (in alphabetical order):

and considering that @ianfab simulated Stockfish - Sunsetter games in #88 (and that in this event Stockfish defeated TJchess and Imortal defeated Sunsetter), the remaining opponent is Imortal!

MichaelB7 commented 7 years ago

Excellent !!

ddugovic commented 7 years ago

@ianfab @fsmosca Just now I started a regression test for http://github.com/niklasf/Stockfish/releases/tag/fishnet-231216 versus http://github.com/niklasf/Stockfish/releases/tag/fishnet-091216 .

fsmosca commented 7 years ago

I would like to verify if the name Stockfish zh [date] that I use is correct. Perhaps Fishnet [date] is the correct name, so as not to get confused with the Stockfish name, although I add zh to make it look different from Stockfish.

After uci command,

id name Stockfish 221216 64 POPCNT

perhaps this could also be,

id name Fishnet 221216 64 POPCNT

Just let me know if you want to change the name, so I can update the participant list.
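As an illustration of the naming question, here is a sketch that reads the `id name` line from an engine's reply to `uci` and derives a participant name of the form "Stockfish zh 22Dec2016". The functions `parse_uci_id` and `display_name` are hypothetical, the `id author` line in the sample is a placeholder, and the sketch assumes the second word of the engine name is a DDMMYY build date as in the output quoted above.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def parse_uci_id(lines):
    """Collect 'id name' / 'id author' values from the reply to 'uci'."""
    ident = {}
    for line in lines:
        parts = line.strip().split(maxsplit=2)
        if parts == ["uciok"]:
            break  # end of the identification block
        if len(parts) == 3 and parts[0] == "id" and parts[1] in ("name", "author"):
            ident[parts[1]] = parts[2]
    return ident


def display_name(engine_name: str, tag: str = "zh") -> str:
    """Turn e.g. 'Stockfish 221216 64 POPCNT' into 'Stockfish zh 22Dec2016'."""
    base, date = engine_name.split()[:2]
    day, month, year = int(date[:2]), int(date[2:4]), int(date[4:6])
    return f"{base} {tag} {day:02d}{MONTHS[month - 1]}20{year:02d}"


sample_reply = [
    "id name Stockfish 221216 64 POPCNT",
    "id author (placeholder)",
    "uciok",
]
ident = parse_uci_id(sample_reply)
print(display_name(ident["name"]))  # Stockfish zh 22Dec2016
```

Changing the `tag` argument (or the engine's own `id name`) is all that would be needed if a different name is agreed on.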

fsmosca commented 7 years ago

@ddugovic regarding the regression test, please also run it at a long TC, say 60m + 10s inc/move, even if only for a couple of games, to make sure it is stable at a TC close to the one I will use in the championship (TC 100m + 30s inc/move) in the match for Gold.

For updates I can only test it at TC 3m + 2s inc/move to check its stability.

For information, the opening suite that I will use in stage 2 is ccva-ch-64start.pgn; this file is available on the download page here: https://sites.google.com/site/zhassociation/download (look for the start positions directory).

ddugovic commented 7 years ago

Thanks, I've actually been struggling to produce a better descriptive name than Stockfish zh [date] since this fork is Stockfish-based and since fishnet refers to Lichess' online distributed AI platform. If I can think of a more descriptive & appropriate name I'll advise. :-)

sf-x commented 7 years ago

If I can think of a more descriptive & appropriate name

"Stockfish multi-variant"?

Vinvin20 commented 7 years ago

or "Stockfish variants" ?

ianfab commented 7 years ago

Or Variantfish compared to Stockfish ;-)

ddugovic commented 7 years ago

@fsmosca I see the semifinal has already started! Good luck to Imortal 3.0 in the Final Stage! :-)

For the Finals match please update our submission to https://github.com/niklasf/Stockfish/releases/tag/fishnet-231216 (built on 221216)

fsmosca commented 7 years ago

@ddugovic list is updated with Stockfish zh 22Dec2016 now.

http://ccva.challonge.com/1st_ccva_comp_champ_2016/comments

ddugovic commented 7 years ago

Tournament update: http://talkchess.com/forum/viewtopic.php?topic_view=threads&p=700392&t=62169

Stage 2 semifinals have completed. Stockfish and Imortal will compete for the Gold & Silver, after Sunsetter and TJchess for the Bronze.

Next-scheduled match will be Sunsetter - TJchess. http://ccva.challonge.com/1st_ccva_comp_champ_2016

ddugovic commented 7 years ago

The final match has started so barring some unforeseen circumstance this issue is resolved.