Vinvin20 closed this issue 7 years ago.
I think that the latest release (151116) from https://github.com/niklasf/Stockfish/releases should be used. We could maybe slightly improve by using the current head and compiling a version with crazyhouse only, but I think stability is more important, especially when taking into account the gap between SF and the rest of the field on rating lists.
@ddugovic What do you think?
It starts on 2016-12-02, so this leaves one week to get ready.
Enough time to make a release on lichess any day now, if you like.
Regarding Crazyhouse Computer Championships 2016 I agree with @ianfab .
Regarding lichess it seems official-stockfish/master isn't changing anytime soon, and I'm not aware of any critical issues at the moment. My next goal is to use variant syzygybases (which doesn't affect crazyhouse), but with so many unbounded open issues I hesitate to start developing anything new!
Yes - stability is key at this juncture. Good luck and I'll be rooting for SF.
Good news, SF-zh gained around 70 Elo points between 15Nov2016 and 02Dec2016: https://sites.google.com/site/zhassociation/computers/rating-list/blitz
Congratulations Dan!
Engine updates are allowed, so bug fixes and further improvements can still be entered.
If it's OK with @ianfab, I defer to whatever he decides regarding submission, bugfixes, further improvements, etc. (and if not, I guess I'll make the decisions).
Stockfish should be able to make it to the final anyway (with the version from November 15th), and by then we will probably have a release that will have been sufficiently tested on lichess and elsewhere.
the latest is very strong, nice work - will grab the latest pr
Rank  Name                                         Rating      Δ   +   -    #      Σ    Σ%    W    L  D   W%   =%  OppR
----------------------------------------------------------------------------------------------------------------------
   1  Stockfish Crazy-House-1 64 POPCNT (UCI2WB)     3245    0.0  32  32  600  484.5  80.8  481  112  7  80.2  1.2  2955
   2  Stockfish 251116 64 POPCNT (UCI2WB)             2955  289.7  32  32  600  115.5  19.2  112  481  7  18.7  1.2  3245
Sounds good to me, @MichaelB7 (only now do I see the original post in that thread); I'm fine with anything decisive that keeps everyone satisfied without triggering a bikeshed discussion about the many possibilities.
According to @MichaelB7 's comment, latest master a9938b7 has a 3111 rating (according to his rating system tested at ? + ?s time control).
As far as I know, the latest "stable" release, tested at 120s + 1.2s and at ? + ?s, is bee468b.
Apologies to others for not answering #149 earlier, as I'm learning the usefulness of testing at 30s + 0.3s and slower, since evidently (#145) 10s + 0.1s fails to detect regressions.
With the championship starting soon, I will request our entry be updated to bee468b5c86c978d69845137db0c8f5f84fb8cfe (matching that in my previous comment) as I believe that's the latest stable release.
EDIT: I suppose I'd need to build the release for @fsmosca to use it, and I'm unable to build for a Windows platform (I don't even know what arguments are used for the build). I suppose we'll have to stick with "Stockfish zh 30Nov2016 64bit" unless all that can be sorted out in time (doubtful).
"Stockfish zh 30Nov2016 64bit" worked and performed well in his tests (except for a few time losses that should no longer happen with adjusted move overhead), so I think there is no need for a last minute change.
After setting move overhead to 1000, I no longer see time losses from SF zh 30Nov2016 at TC 3+2.
I can build a windows binary (not bmi) if needed.
Yes, use https://github.com/ddugovic/Stockfish/commit/bee468b5c86c978d69845137db0c8f5f84fb8cfe — my latest commit has not been fully tested yet. Plus the difference, if any, is not great.
Sorry for my last-minute panic; this week I had to work overtime. EDIT: please disregard the following (unless already regarded).
Rapid time control tests for bee468b are very positive. With apologies to @ianfab (I have failed to perform a complete STC test on my machine and am working on setting that up), based on the test results so far I feel more comfortable with bee468b (reverting #145 but otherwise the same as "Stockfish zh 30Nov2016 64bit") than with "Stockfish zh 30Nov2016 64bit".
@ianfab is correct in that "Stockfish zh 30Nov2016 64bit" worked and performed well; ultimately my machine cannot in any reasonable timeframe tell which is stronger at LTC. I see that the participant is "Stockfish zh 30Nov2016 64bit" which is OK.
Sorry for my emotional last post.
I hope that for the "Final Stage" round we get to use an updated binary which has a 0.3% speedup and a simplification optimization. @ianfab I have started a STC test of latest master versus "Stockfish zh 30Nov2016 64bit" - let me know if I should test something more (I'll test LTC next).
At a bullet time control (1+1) latest master is not an improvement over 30Nov2016. Now testing LTC (and if it's an improvement I'll test 3+2 next).
EDIT: 30Nov2016 is better at LTC; next I will identify which commit(s) regress.
@ddugovic I thought you were referring to fishtest time controls when mentioning STC/LTC?!
I think we should define fixed conditions for how to test patches (new code, tweaks, simplifications) in the future regarding time controls, statistics (SPRT bounds, number of games), and opening books, so that it is transparent when several users are doing tests (and contributing patches). I think it makes sense to be guided by the established conditions of fishtest while taking into account the differences regarding computation time and Elo difference. However, regression tests with different conditions will of course always be useful regardless of any fixed test conditions.
I agree and I'm open to suggestions as I struggle to test correctly.
In the following I am going to summarize my current thinking on testing and the way I have been doing tests so far. Feedback and suggestions are very welcome.
Just like on fishtest, I think there should be a first test on STC, and then a second test to show the scaling to LTC, since we have already seen that not doing LTC tests can cause regressions to pass undetected.
Using the same time controls as fishtest (STC 10s+0.1s, LTC 60s+0.1s) would probably take too long for LTC, so it could make sense to halve or otherwise reduce the time controls. Reducing both STC and LTC could restrict the validity of STC tests, whereas only reducing LTC would make it more difficult to see scaling effects, so I am currently not sure about that. Since I did not have opening books for all variants until two days ago, I have varied the base time (0.1-10s) to avoid repeated games, but used the same increment as STC on fishtest (0.1s).
Since we are testing many variants, I think we need an automatic way to generate the opening books. Since Stockfish might be the only (strong) engine for some variants, it makes sense to use it for generation. Two days ago I started writing such a book generator based on Stockfish, but of course it is still rather experimental. Using it, I generated a set of EPD opening books for the variants supported by Stockfish (excluding Relay chess, since I have not dealt with it yet).
Since I do not know of another way to perform SPRT tests for variant engines, I use my very basic, far-from-perfect testing script. Nevertheless, it at least seems to work quite well in practice so far, since the patches I tested with this script have improved Stockfish by several hundred Elo in several variants.
So far I have always used [0,20] Elo bounds for SPRT tests. I arrived at those empirically by considering the typical Elo gains of patches and the number of games necessary for such patches to pass an SPRT test. In the beginning almost all patches added new code/ideas and the Elo gain was typically huge (>50 Elo), so this was fine. But since more and more patches are going to be parameter tweaks and simplifications with only small Elo differences, and since Elo differences are decreasing as Stockfish improves, we have to think about the SPRT bounds. Maybe we could simply use a multiple (3-5?) of the bounds used on fishtest (general [0,5], tweaks [0,4], simplifications [-3,1]).
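For readers less familiar with SPRT: the test keeps updating a log-likelihood ratio between the two hypotheses (Elo = elo0 vs Elo = elo1) and stops once it leaves a fixed interval. Below is a rough Python sketch of the LLR computation under the BayesElo draw model (the scheme used by cutechess-cli and older fishtest); the function names are mine and the details are illustrative, not the exact fishtest implementation.

```python
import math

def bayeselo_to_proba(elo, drawelo):
    """Win/draw/loss probabilities under the BayesElo model."""
    pwin = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    ploss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return pwin, 1.0 - pwin - ploss, ploss

def sprt_llr(wins, draws, losses, elo0, elo1):
    """Log-likelihood ratio of H1 (elo = elo1) vs H0 (elo = elo0)."""
    n = wins + draws + losses
    if wins == 0 or draws == 0 or losses == 0:
        return 0.0  # not enough data to estimate the draw parameter
    # Estimate the draw parameter from the observed results.
    drawelo = 200.0 * math.log10((1.0 - losses / n) / (losses / n)
                                 * (1.0 - wins / n) / (wins / n))
    p0 = bayeselo_to_proba(elo0, drawelo)
    p1 = bayeselo_to_proba(elo1, drawelo)
    return (wins * math.log(p1[0] / p0[0])
            + draws * math.log(p1[1] / p0[1])
            + losses * math.log(p1[2] / p0[2]))

# With alpha = beta = 0.05 the test stops when the LLR leaves
# (log(0.05/0.95), log(0.95/0.05)), roughly (-2.94, +2.94).
```

For example, with bounds [0,20], a 481-112-7 style blowout pushes the LLR above the upper bound almost immediately, while a dead-even score slowly drifts toward the lower bound.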
Since LTC tests could take a long time, we could think about using a fixed number of games for LTC to only regression test the changes. Probably time will tell whether lengthy tests on LTC are feasible.
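As a rough guide to how large such a fixed-games regression test must be, the number of games needed to resolve a given Elo difference can be estimated from the per-game score variance. This is a back-of-the-envelope sketch (the function name and the draw-ratio default are my own assumptions, not anything used by fishtest):

```python
import math

def games_for_elo_resolution(target_elo, draw_ratio=0.4):
    """Roughly how many games are needed so that the standard error
    of the measured Elo difference (near a 50% score) equals target_elo."""
    # Per-game score variance: wins/losses sit +/-0.5 around the mean,
    # draws contribute nothing, so var ~= 0.25 * (1 - draw_ratio).
    var = 0.25 * (1.0 - draw_ratio)
    # Slope d(elo)/d(score) at score 0.5, from elo = -400*log10(1/s - 1).
    slope = 400.0 / (math.log(10) * 0.25)
    return math.ceil((slope * math.sqrt(var) / target_elo) ** 2)
```

Under these assumptions, resolving 10 Elo takes on the order of 700+ games, and halving the target Elo quadruples the game count, which is why lengthy LTC tests are a real concern.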
I have written a slightly modified version of Stockfish's SPSA tuner to add variant support. It should be more or less self-explanatory if you know how to use the SPSA tuner for official Stockfish.
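For anyone who has not used an SPSA tuner before, the core update is small enough to sketch. This toy Python version (the gain constants and schedules are generic SPSA defaults, not necessarily the tuner's actual values) perturbs all parameters simultaneously with random signs and estimates the gradient from just two loss evaluations per iteration:

```python
import random

def spsa_minimize(loss, theta, iterations=2000, a=0.2, c=0.1):
    """Minimal SPSA: estimate the gradient from two loss evaluations
    with a simultaneous random +/-ck perturbation of all parameters."""
    theta = list(theta)
    for k in range(1, iterations + 1):
        ak = a / k ** 0.602   # standard SPSA gain schedules
        ck = c / k ** 0.101
        delta = [random.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + ck * d for t, d in zip(theta, delta)]
        minus = [t - ck * d for t, d in zip(theta, delta)]
        g = (loss(plus) - loss(minus)) / (2.0 * ck)
        theta = [t - ak * g / d for t, d in zip(theta, delta)]
    return theta
```

In engine tuning the "loss" is of course a noisy match result rather than a smooth function, but the two-evaluations-per-step structure is what makes SPSA affordable for many parameters at once.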
Thanks; although I am unfamiliar with SPRT, I believe that is excellent guidance. I studied statistics both in high school and at university; I am simply unfamiliar with this particular heuristic/experiment.
Due to official-stockfish/Stockfish#603 and official-stockfish playing in competitions with "Move Overhead=1000", I think LTC (and possibly STC?) tests should be performed with "Move Overhead=1000" and a minimum increment of 1s. It baffles me that fishtest would use different conditions, although perhaps official-stockfish/Stockfish is less prone to timeouts than this fork?
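To illustrate the role of the Move Overhead option (this is not Stockfish's actual time manager, just a hypothetical sketch of the idea): the engine subtracts the expected per-move lag from its budget and reserves it from the remaining clock, so a laggy connection no longer causes flag falls.

```python
def time_for_move(remaining_ms, increment_ms, move_overhead_ms=1000, moves_to_go=40):
    """Toy time allocation: budget a slice of the remaining clock plus the
    increment, minus the expected lag, and never plan to dip into the
    reserved overhead."""
    budget = remaining_ms / moves_to_go + increment_ms - move_overhead_ms
    # Clamp: leave the overhead untouched, but always think at least a little.
    return max(10, min(budget, remaining_ms - move_overhead_ms))
```

With a 1s increment and a 1000ms overhead the two roughly cancel, which is why a minimum increment of 1s pairs naturally with "Move Overhead=1000".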
I have started experimenting with your modified SPSA tuner and submitted a PR addressing most of the confusion I encountered trying to use it.
The testing methodology discussion doesn't seem to be related to the topic. Can it be split into a new issue, so it's easier to find?
Thanks for the input; I will copy this discussion into #149.
@ianfab Do you have any comment on fishnet-091216?
CVCA conditions state "Engine update is accepted but will be used before a match" and I am interested in updating our submission.
My regression test result and the games on lichess are looking good, so I think @fsmosca can use it. Maybe someone could test the windows executables to make sure that they are also working properly.
Thanks. In the TalkChess forum I have requested our submission be updated to @niklasf fishnet-091216.
@Vinvin20 regarding #161 could you please test that fishnet-091216 works properly? (One such test could be a regression test -- simply run against fishnet-301116.)
After 300 games the improvement is around +85 Elo. Let's wait for 1300 more games ...
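For reference, Elo figures like the +85 above come from the logistic score model; a quick sketch (the helper names are mine):

```python
import math

def elo_from_score(score):
    """Elo difference implied by an average score s in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def score_from_elo(elo):
    """Expected score for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))
```

So +85 Elo corresponds to roughly a 62% score, i.e. about 186/300 points after 300 games.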
Stockfish qualifies for the final stage! http://ccva.challonge.com/1st_ccva_comp_champ_2016
It looks like the Final Stage will be (in alphabetical order):
and considering that @ianfab simulated Stockfish - Sunsetter games in #88 (and that in this event Stockfish defeated TJchess and Imortal defeated Sunsetter), the remaining opponent is Imortal!
Excellent !!
@ianfab @fsmosca Just now I started a regression test for http://github.com/niklasf/Stockfish/releases/tag/fishnet-231216 versus http://github.com/niklasf/Stockfish/releases/tag/fishnet-091216 .
I would like to verify whether the name Stockfish zh [date] that I use is correct. Perhaps Fishnet [date] is the correct name, so as not to confuse it with the Stockfish name, although I add zh to make it look different from Stockfish.
After the uci command, the engine reports:
id name Stockfish 221216 64 POPCNT
perhaps this could also be,
id name Fishnet 221216 64 POPCNT
Just let me know if you want to change the name, so I can update the participant list.
@ddugovic regarding the regression test, just make sure to also test it at a long TC, say 60m + 10s inc/move, even for just a couple of games, to make sure that it is stable at that TC, which is close to what I will use in the championship tour (TC 100m + 30s inc/move) in the match for Gold.
For updates I can only test it at TC 3m + 2s inc/move to check its stability.
Just some info: the opening suite that I will use in stage 2 is ccva-ch-64start.pgn. This file is available on the download page here, https://sites.google.com/site/zhassociation/download — look for the start positions directory.
Thanks, I've actually been struggling to produce a better descriptive name than Stockfish zh [date], since this fork is Stockfish-based and since fishnet refers to Lichess' online distributed AI platform. If I can think of a more descriptive & appropriate name, I'll advise. :-)
> If I can think of a more descriptive & appropriate name
"Stockfish multi-variant"?
or "Stockfish variants" ?
Or Variantfish compared to Stockfish ;-)
@fsmosca I see the semifinal has already started! Good luck to Imortal 3.0 in the Final Stage! :-)
For the Finals match please update our submission to https://github.com/niklasf/Stockfish/releases/tag/fishnet-231216 (built on 221216)
@ddugovic list is updated with Stockfish zh 22Dec2016 now.
Tournament update: http://talkchess.com/forum/viewtopic.php?topic_view=threads&p=700392&t=62169
Stage 2 semifinals have completed. Stockfish and Imortal will compete for the Gold & Silver, after Sunsetter and TJchess for the Bronze.
Next-scheduled match will be Sunsetter - TJchess. http://ccva.challonge.com/1st_ccva_comp_champ_2016
The final match has started so barring some unforeseen circumstance this issue is resolved.
http://talkchess.com/forum/viewtopic.php?p=695932#695932
If you have a strong build now, please tell us !