Good x-mas gift.
Regarding the ladder enhancement: do you have any hypotheses on how the change might affect behaviour in terms of "human understandable play style"? I think I see how the change would affect exploration, but it will still consume many playouts to keep track of ladders for each state / other moves that could be played (as far as I understand). My recent thoughts on this include considering whether the tree search could be changed to play out sequences that lead to transposed states "for free", either by somehow not counting the playouts/visits in the normal way, or by only doing a neural net evaluation at the state where the sequences converge. Hmm, I don't think that explains what I mean well... an example might be a state in which there is a ladder; if the search explores the ladder and then a ladder-breaking move (and its response), it would be nice to "skip over" the ladder reading and just consider the state in which the branch's initial moves "have an effect". The idea is that this would avoid playouts being spent on reading the same (e.g. ladder) sequences multiple times.
A related thing I looked into a while ago was changing LZ's use of the NN cache to "fully recover" transposed sequences without counting them as playouts, in https://github.com/leela-zero/leela-zero/pull/1492. I was attempting to effectively turn the search tree into a multiple-parent graph, again with the aim of saving some playouts (i.e. on transposed positions). My implementation also tried not to slow down the search, by stopping the transposed-sequence recovery at the existing leaf node rather than ending with an NN eval for every cache hit. It was buggy though.
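To make the transposition idea a bit more concrete, here is a rough Python sketch of what I have in mind (purely illustrative, with made-up names; it is not the LZ/SAI code, and it glosses over exactly the visit-counting and multi-parent backup problems that made my PR buggy):

```python
# Illustrative sketch only: a tiny MCTS-like expansion step where nodes are
# shared between different move orders that reach the same position (a DAG
# instead of a tree). All names and structures here are hypothetical.

class Node:
    def __init__(self, value):
        self.value = value      # NN value estimate for this position
        self.visits = 0
        self.children = {}      # move -> Node

# Transposition table: Zobrist-style hash of the position -> shared Node.
transpositions = {}

def expand(parent, move, position_after_move, nn_eval):
    """Create or reuse the child node for `move` from `parent`."""
    key = position_after_move.hash()          # hypothetical position hash
    node = transpositions.get(key)
    if node is None:
        # First time we reach this position: pay for one NN evaluation.
        node = Node(value=nn_eval(position_after_move))
        transpositions[key] = node
    # Otherwise the ladder (or any transposed sequence) has already been
    # evaluated via another move order, so we just link to the same node.
    parent.children[move] = node
    return node
```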
Unfortunately for me, I am not well enough versed in MCTS theory to really know what I'm doing... so I might as well ask if anyone has insight into whether those ideas might have any of the potential benefit I think they might?
@Hersmunch You definitely have a good point. I completely agree with the consideration in your first paragraph. I also spent time thinking about solutions along the lines of your second paragraph, and I don't think there is a good way to do that without opening the whole project up to horrible blind spots, because of peculiar situations we did not consider and include.
This is what I would like to do: with the current patch we hope to correct at least the basic ladder errors in common situations. Training on self-play games generated by this new code should improve ladder understanding a lot, so we expect at least not to be embarrassed by SAI ladder blunders anymore.
We already have some evidence of this (some clients adopted the ladder code many days ago), but if it turns out not to be convincing enough, I am willing to add some more ad hoc logic along the lines of your second paragraph. (Basically, I intend to force in MCTS the virtual "ladder-winning" opponent to always answer a ladder move with the proper ladder response.)
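To illustrate what that forcing could look like (a hypothetical sketch only, not the actual patch; `is_ladder_move`, `ladder_winner` and `ladder_response` stand in for a real ladder reader such as the one discussed below):

```python
# Hypothetical sketch of restricting MCTS move generation so that the
# ladder-winning side always answers a ladder move with the proper ladder
# response. The three solver callbacks are placeholders, not real SAI code.

def candidate_moves(position, last_move, legal_moves,
                    is_ladder_move, ladder_winner, ladder_response):
    """Return the moves MCTS is allowed to expand from this position."""
    if last_move is not None and is_ladder_move(position, last_move):
        if ladder_winner(position) == position.side_to_move:  # hypothetical field
            forced = ladder_response(position, last_move)
            if forced in legal_moves:
                # Prune everything else: the ladder winner is assumed to
                # always answer correctly, so the search doesn't waste
                # playouts on its alternatives.
                return [forced]
    return legal_moves
```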
This would never be enough to reach superhuman level in ladder problems though. If you have a complex position with (say) two colliding ladders, some sente ladder breaker moves available and maybe a couple of kos, then tweaking MCTS will never be enough.
So I'd like to use the beautiful ladder code by Ttl to compute additional feature planes with ladder solving, and then of course retrain a new network using these features (recomputed for all past games).
The latter is a long-term plan, though.
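For what it's worth, here is a minimal sketch of how such ladder feature planes could look, in the spirit of the classic AlphaGo ladder features. The plane layout and the solver interface are my own assumptions, not a decided design:

```python
import numpy as np

# Illustrative sketch: two extra binary input planes marking points where a
# move would be a working ladder capture or a working ladder escape. The
# solver calls are placeholders for a real ladder-reading routine such as
# Ttl's; `position.is_legal` is a hypothetical accessor.

BOARD_SIZE = 19

def ladder_planes(position, ladder_capture_works, ladder_escape_works):
    capture = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=np.float32)
    escape = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=np.float32)
    for x in range(BOARD_SIZE):
        for y in range(BOARD_SIZE):
            if not position.is_legal(x, y):
                continue
            if ladder_capture_works(position, (x, y)):
                capture[x, y] = 1.0
            if ladder_escape_works(position, (x, y)):
                escape[x, y] = 1.0
    # These would be stacked onto the existing input planes when building
    # the network input (and recomputed for all past games for retraining).
    return np.stack([capture, escape])
```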
Net 199 reads out a ladder across the board before it is played, in a game against net 139. Both were using 0.17.5. Net 139 doesn't identify that it is losing until 5 stones are in atari. The game was played with 100k visits.
(;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.43.3]KM[7]SZ[19]DT[2019-12-31]PB[SAI v0.17.5 17e4]PW[SAI v0.17.5 8491]SBKV[50.53];B[pd]SBKV[48.83];W[dc]SBKV[51.05];B[ce]SBKV[48.95];W[qc]SBKV[51.25];B[ed]SBKV[48.7];W[ec]SBKV[51.21];B[fd]SBKV[48.62];W[gc]SBKV[51.22];B[qd]SBKV[48.96];W[pc]SBKV[51.42];B[nc]SBKV[47.3];W[nb]SBKV[51.15];B[mb]SBKV[47.8];W[ob]SBKV[51.38];B[md]SBKV[47.6];W[qp]SBKV[52.45];B[rc]SBKV[48.3];W[rb]SBKV[52.84];B[rd]SBKV[48.25];W[sb]SBKV[52.84];B[dq]SBKV[47.89];W[pi]SBKV[54.07];B[co]SBKV[48.15];W[np]SBKV[54.61];B[qk]SBKV[48.45];W[gd]SBKV[55.63];B[ge]SBKV[48.64];W[he]SBKV[55.52];B[hf]SBKV[45.96];W[gf]SBKV[55.34];B[fe]SBKV[45.34];W[ie]SBKV[55.85];B[if]SBKV[44.8];W[je]SBKV[55.46];B[gg]SBKV[44.34];W[od]SBKV[57.49];B[oe]SBKV[34.95];W[ne]SBKV[63.86];B[nd]SBKV[32.57];W[oc]SBKV[66.19];B[nf]SBKV[25.51];W[me]SBKV[77.04];B[le]SBKV[25.54];W[mf]SBKV[88.51];B[mg]SBKV[19.1];W[lf]SBKV[92.91];B[kf]SBKV[19.03];W[lg]SBKV[94.62];B[lh]SBKV[13.83];W[kg]SBKV[95.26];B[jg]SBKV[71.39];W[kh]SBKV[95.48];B[ki]SBKV[83.52];W[jh]SBKV[95.74];B[ih]SBKV[85.8];W[ji]SBKV[96.12];B[jj]SBKV[86.72];W[ii]SBKV[96.5];B[hi]SBKV[87.54];W[ij]SBKV[96.95];B[ik]SBKV[88.62];W[hj]SBKV[97.3];B[gj]SBKV[89.49];W[hk]SBKV[97.72];B[hl]SBKV[90.32];W[gk]SBKV[98.02];B[fk]SBKV[91.23];W[gl]SBKV[98.5];B[gm]SBKV[92.26];W[fl]SBKV[98.8];B[el]SBKV[93.47];W[fm]SBKV[99.19];B[fn]SBKV[94.87];W[em]SBKV[99.4];B[dm]SBKV[95.44];W[en]SBKV[99.42];B[eo]SBKV[95.77];W[dn]SBKV[99.93];B[cn]SBKV[96.74];W[hh]SBKV[99.99];B[do]SBKV[100];W[pk])
Thanks @barrtgt for confirming our impression of definite improvements in SAI's ability to read ladders... This matches my [admittedly limited] tests, in which the ability to solve ladders improved with each generation, especially since the new ladder code tweak was introduced in the latest SAI version.
From my experience, though, the network still has some improving to do: as you can witness yourself, SAI still sometimes loses by ladder against LZ in comparison matches, so the learning process is not yet finished.
Another point is that often, once committed to a ladder, SAI still plays it out to the bitter end even after discovering that continuing is a losing proposition. Looking at the visits output one can see that this is due to the winning-probability value being somewhat off, not yet good enough to suggest stopping earlier. Ultimately, though, this last point should not be an issue: properly trained networks should not even start losing ladders, and a configuration with a potential ladder in the future should have a proper value reflecting its status...
I think it is good to have networks play out ladders in training occasionally, because it is usually the greatest weakness of AZ clones. https://arxiv.org/pdf/1902.04522.pdf shows how ladder strength can fluctuate over training. They provided 100 ladder SGFs they used for evaluation; maybe they could be of use for SAI?
The Leelaz weights take advantage of the new engine to avoid a lot of ladder mistakes, so when SAI plays matches against old LZ weights, they will surely perform better. For the latest 257th leelaz
Apparently leelaz's weights take much advantage of the new engine on ladder reading:
- LZ092's Elo goes from 9700 up to 9771
- LZ098's Elo goes from 9800 up to 9894
- LZ103's Elo goes from 9900 up to 9964
- LZ107's Elo goes from 10000 up to 10096
IMHO that is not what is happening. I think (just a guess, without checking the server source) that the LZ Elo is computed exactly the same way as the SAI Elo: after new matches are played, the Elo is recalculated for all the networks. If I remember correctly this has been happening for a long time, and the LZ nets' Elo keeps slowly rising (showing that, even despite all the reference and panel matches, the rating is still somewhat inflationary).
In fact, if you check the comparison matches, you'll note that nowadays SAI almost never runs a failing ladder (I did not notice any such game in the past 2 days, though there were a few in the first days after the ladder code commit), while LZ still fails ladders sometimes (check the short games, 93-150 moves, lost by LZ).
@sheeryjay You are right: both the LZ and SAI networks use the same settings in the 'comparison' games. The slow upward drift of the LZ networks' Elo is mainly due to the fact that at the beginning even the best SAI networks are too weak to properly measure (with the few matches played) the strength of the LZ ones; as SAI improves, the Elo of the LZ networks gets progressively better established. In practice one could say that the strength of an LZ network is not fully stabilised until it gets beaten consistently by the SAI ones: e.g. as of now I expect the Elo of LZ92 to be mostly stabilised, while the Elo of LZ107 is still fairly uncertain... Similarly, even the Elo of the latest SAI networks gets progressively refined over the following few generations...
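Just to illustrate the "recalculated for all the networks" point with a toy example (this is definitely not the server's actual code, only the standard Elo/Bradley-Terry idea): if all ratings are refit from the full game record, a new batch of matches can move the rating of every network, including ones that played no new games themselves.

```python
import math

# Toy illustration (not the server's algorithm): refit every rating from the
# complete game record with repeated Elo-style updates, anchored to one
# reference network so the scale does not drift as a whole.

def expected(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def refit(games, ratings, anchor, iterations=2000, lr=2.0):
    """games: list of (winner, loser) names; ratings: dict name -> Elo."""
    anchor_rating = ratings[anchor]
    for _ in range(iterations):
        for winner, loser in games:
            e = expected(ratings[winner], ratings[loser])
            ratings[winner] += lr * (1.0 - e)
            ratings[loser] -= lr * (1.0 - e)
        # Pin the anchor network back to its fixed rating.
        shift = ratings[anchor] - anchor_rating
        for name in ratings:
            ratings[name] -= shift
    return ratings
```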
We can use the validation tool to run a match of sai vs leelaz with the same net, such as:
```
validation -g 2 -k sai129_lz91 -n b30.gz -o "-g -v 100 -r 5 -w" -n b30.gz -o "-g -v 100 -r 5 -w" -- sai -- leelaz
```
In my test, the sai engine is a bit better than leelaz. The test uses the LZ 107 net.
```
b77.gz v b77.gz (701 games)
                   wins          black         white
b77.gz      383  54.64%    182  54.82%    201  54.47%
b77.gz      318  45.36%    150  45.18%    168  45.53%
                           332  47.36%    369  52.64%
701 games played.
Status: 2 LLR 2.9963 Lower Bound -2.94444 Upper Bound 2.94444
```
@l1t1 I see the parameter -n being equal to b30.gz both times... Was it intended?
@trinetra75 This command is to test whether the ladder function increases the Elo of LZ net 030. As you know, the LZ engine doesn't have the ladder function.
One naive question that I keep wondering about: what if making ladder mistakes and trying to understand them globally has invisible effects on long-term learning and strength?
I recently read an article about how an AI reading electrocardiograms could predict the risk of heart failure with much higher accuracy than human experts. The authors suggested the AI "sees" something invisible in the big data of the millions of electrocardiograms it trains on.
So maybe by forcing our limited understanding of ladders onto the AI's training, we may be making it "miss" the invisible things that it would (should?) have taken into consideration otherwise.
Also, on an unrelated note: is the idea of disabling batching to increase ladder performance planned for some future time in this run, or a possible next one?
@wonderingabout I agree that 'forcing' the network to try moves might have some adverse effects on the overall strength of the network. This is why the first approach to improving on ladders was a simple nudge to make it easier for the network to explore the ladder more deeply, but without any 'forcing'.
Indeed, at the beginning the change barely affected ladder reading, but after a few dozen generations SAI is now able to read quite deep ladders of all kinds appropriately, without any further change. So, since ladders are quite essential to the game in terms of the opportunities available in a given game state, I suppose that a proper ladder-reading ability did and will increase the overall network strength, given that SAI is still far from optimal, including in its ability to read ladders.
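For anyone curious what such a "nudge" might look like, here is a hypothetical sketch (not the actual SAI patch; the node fields, the ladder detector and the boost constant are all made up): ladder-continuation moves simply get a slightly larger prior term in the PUCT selection, so they are explored a bit sooner but never forced.

```python
import math

# Hypothetical sketch of a mild exploration nudge: ladder-continuation moves
# get a small boost to their prior in the PUCT formula. `is_ladder_continuation`
# stands in for a real ladder detector; the boost value is purely illustrative.

LADDER_PRIOR_BOOST = 2.0

def puct_score(child, parent_visits, c_puct=1.5, ladder_boost=1.0):
    q = child.value_sum / child.visits if child.visits > 0 else 0.0
    u = (c_puct * child.prior * ladder_boost
         * math.sqrt(parent_visits) / (1 + child.visits))
    return q + u

def select_child(parent, is_ladder_continuation):
    best, best_score = None, -float("inf")
    for move, child in parent.children.items():
        boost = LADDER_PRIOR_BOOST if is_ladder_continuation(parent, move) else 1.0
        score = puct_score(child, parent.visits, ladder_boost=boost)
        if score > best_score:
            best, best_score = (move, child), score
    return best
```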
@trinetra75 Thanks for the insights. It seems logical that, since ladders are a repetitive and predictable trap a go AI can fall into at any time, having an "escape" from this "fall" whenever it happens should give an advantage. Also, as you said, depending on how the implementation is done, it may have big or small side effects, possibly negative, but also possibly a positive effect on final strength (i.e. at the end of training, resulting in a final AI stronger than it would have been without this code "hack").
One last question: does SAI intend to revert to the LZ 0.16 settings (batch size 1, threads 2) at some point (not necessarily soon)? @Naphtalin showed on Discord that using batching on LZ significantly reduces ladder reading performance. Thanks.
We have never considered batch size and threads parameters until now, basically trusting LZ on something we do not master enough. Will think about this. Thank you.
You're welcome. It is not my idea, but again Naphtalin's observations, which he reported on Discord. I humbly suggest you discuss his test results with him directly.
3500 v seems to improve little; consider increasing the net size.
I am unfollowing this thread. In the future please open a new issue, or post in a relevant issue, to ask off-topic questions.
SAI's net is certainly far from saturation, since LZ107 is smaller and clearly stronger.
I'm always wondering why we need to wait until saturation of the 9-block net before changing to a 10-block net. It's like comparing running and cycling: do you have to be trained to run very fast to ensure you can ride a bicycle very fast? I doubt it, because we are talking about different muscles in your body. To me it is similar with neural networks: different network structures may require a different vision in order to achieve the winning goal. A 9-block network may have to do something tricky to squeeze out that extra little bit of potential to be stronger than another 9-block network, while a 10-block network doesn't have to do that at all.
There is no need to wait for complete saturation. But if we switch to a larger net too early, it will just slow down the self-play games without any benefit. And if the progress stalls before saturation, it would mean that there is an issue with the training that needs to be cleared up.
Anyway, we are training the next structure, which will be 12x256, and when it is ready we can compare, discuss and decide.
If you don't give SAI a komi to use, does it decide what is fair, or is it just 7.5? And in your paper you discuss using a winrate threshold for the λ and μ values to operate; is there any way to use that currently?
@barrtgt SAI uses the same default as LZ, that is komi 7.5
Thanks
What is the difference between --policy_temp and --softmax_temp?
The new release is out. There are several improvements, so I would recommend everyone to update ASAP.
1) Ladder-enhanced exploration, to improve ladder performance a bit -- hopefully enough to make the training able to learn ladders much better.
2) New random move sampling, to improve random exploration of played moves. Also improved comments with a bitfield.
3) Recent LZ commits included, up to this one.
4) The command `eval` now works better, even if it still gives a segfault in some cases. If you want to study how a SAI (or LZ) network reasons in a given position, use `eval [visits=1000] [filebasename]` and look for the two output files, `csv` and `sgf`.

Enjoy!