featurecat / lizzie

Lizzie - Leela Zero Interface
GNU General Public License v3.0
958 stars 227 forks

Intended Changes - Lizzie 0.7 and beyond #505

Open featurecat opened 5 years ago

featurecat commented 5 years ago

Hi all, it's been a while since Lizzie got updated. I am hoping to release Lizzie 0.7 within 2 weeks, or after Leela Zero 0.17 releases officially.

In addition to reviewing the current PRs, and in addition to the ones that have already been merged (I don't remember off the top of my head), here are the changes I intend, or hope, to add over the next week. No particular order:

Whatever I don't get to, we'll add in Lizzie 0.8. Did I miss any important features? Let me know.

kaorahi commented 5 years ago

Do you already have a modified Leela Zero with a ladder detector? If not, how about this quick solution? https://github.com/kaorahi/leela-zero/tree/ladder_updated

featurecat commented 5 years ago

I should give it a try. Have you tested it? Does it have any problems with ladders?

featurecat commented 5 years ago

How well does it handle ladder-breaker and ladder-maker moves?

kaorahi commented 5 years ago

Would you read leela-zero/leela-zero#1941? Once a ladder pattern is started, it forces LZ to continue ladder reading until the pattern is broken.

It is a small patch and its benefit is limited. I'm not sure whether it makes LZ stronger or weaker; I only expect it to ease the irritation of watching the middle part of a long ladder being read out.

featurecat commented 5 years ago

Unfortunately it seems to be failing my first test (screenshot omitted). That said, after clicking the ladder move it quickly realizes it's superior, more quickly than the default Leela Zero exe. I think worst case I can use this, but I will try to make a patch myself. I will definitely refer to your implementation.

Mondhund commented 5 years ago

The new networks are really harsh when it comes to winning percentages. A small mistake easily costs 40-50%. So instead of winning percentage, I'm more interested in how many points I'm behind or ahead. One hint is to use 'handicap-instead-of-winrate' in the config and multiply the value by 10, but I'm not sure how good an estimate this is. (E.g. towards the end of the game, it's a few points off the actual result). So it would be great if there's a way to integrate this into the gui - if the values make sense.

featurecat commented 5 years ago

@Mondhund, I agree this is currently an issue. I like the idea of using a log scale to better show the value of each mistake. I'm definitely worried about interpretability though. If I use something besides percentages it might confuse users, especially since we can't accurately estimate points yet...

Handicap instead of winrate does seem like a good solution now. I wonder if we can use some other AIs to estimate territory. But that doesn't always determine who is ahead... I'll consider making # of handicap ahead the default option. And if I improve it a little it might be as useful as you describe.

gcp commented 5 years ago

A simple way to get a point count is to force LZ to play out the game at say 1 playout per move. It is not fast on a CPU though, and the result of the game might flip. If you do this in a background process, you can refine the estimate by playing a few games and varying the playouts a bit (say first 1, then 3, 4, 5, etc).
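The refinement loop gcp describes can be sketched as follows. `play_out_game` is a stub standing in for a real GTP driver (in practice it would send `genmove` to a background leelaz process until both sides pass, then score the final position); the schedule of playout counts follows his "first 1, then 3, 4, 5" example, and the stub scores are invented.

```python
# Sketch of the score-estimation idea above: play the game out several
# times at low playout counts in the background and average the results.

def play_out_game(playouts):
    # Stub standing in for driving a leelaz process over GTP and
    # scoring the finished game. The scores below are made up.
    fake_scores = {1: 4.5, 3: 2.5, 4: 3.5, 5: 3.0}
    return fake_scores[playouts]

def refined_score_estimate(schedule=(1, 3, 4, 5)):
    # Each extra background game refines the running average of the
    # estimated final score.
    scores = [play_out_game(p) for p in schedule]
    return sum(scores) / len(scores)
```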

gcp commented 5 years ago

This raises a question of offering the version of Leelaz that investigates every move. I don't know the answer yet.

With the avoid and allow tags this should work with the regular release.

Mondhund commented 5 years ago

A simple way to get a point count is to force LZ to play out the game at say 1 playout per move. It is not fast on a CPU though, and the result of the game might flip. If you do this in a background process, you can refine the estimate by playing a few games and varying the playouts a bit (say first 1, then 3, 4, 5, etc).

I had the same idea and ran a number of games with 1 playout, but in many cases LZ loses stones in the process and resigns. So my impression was that this does not give a very good estimate. Perhaps more playouts would increase the accuracy, and maybe the scoring playouts can be run on the CPU in parallel so that they don't interfere with the GPU calculations.

My (I confess naive and uneducated) hope was that within LZ there is still a modified MCTS running that by itself gives a score that can be reused, but this seems not to be the case?

gcp commented 5 years ago

in many cases LZ loses stones in the progress and resigns

You can disable resignation. But I could imagine it plays many desperation moves when losing, and that causes the estimate to be off? Or do you mean that the side that is ahead blunders?

My (I confess naive and uneducated) hope was that within LZ there is still a modified MCTS running that by itself gives a score that can be reused, but this seems not to be the case?

There are no Monte Carlo simulations, only the neural network estimate. Unfortunately I think that the level of the neural network is so high now that even a strong Monte Carlo playout engine will cause too much deviation.

featurecat commented 5 years ago

But I could imagine it plays many desperation moves when losing, and that causes the estimate to be off?

That's what I would fear about it. I'd like to test it myself, of course. I think the accuracy of this might be pretty low unless there are many, many playouts/simulations. So you might want to use a smaller, faster network for playouts, but then the point estimate might not reflect the actual winrate, since the estimation network is weaker. Maybe I can train the 10x128 or 6x128 networks on some recent games for the territory estimator. I'm sure a lot of people want this.

With the avoid and allow tags this should work with the regular release.

True, I was thinking about this (https://github.com/leela-zero/leela-zero/issues/1758#issuecomment-416186612) by @AncalagonX. But you're right, the 0.17 release gives Lizzie an opportunity for more control over the search process.

Mondhund commented 5 years ago

in many cases LZ loses stones in the progress and resigns

You can disable resignation. But I could imagine it plays many desperation moves when losing, and that causes the estimate to be off? Or do you mean that the side that is ahead blunders?

It's both: the side that is ahead gives away points in the end, and the side that is behind loses points by playing stones that don't work. I ran a number of tests with 1, 2, and 10 playouts but never got a meaningful result.

There are no Monte Carlo simulations, only the neural network estimate. Unfortunately I think that the level of the neural network is so high now that even a strong Monte Carlo playout engine will cause too much deviation.

Thanks for the clarification. I agree, so it seems there's no easy way. A heuristic may help, but that's also hard to implement with winning percentages only. It looks like we need to train another NN to estimate the score. But how would we train it? Asking a group of pros to give their assessment after each move and using that as training data? Not realistic.

featurecat commented 5 years ago

I believe KataGo already estimates territory.

alreadydone commented 5 years ago

@lightvector:

I also implemented kata-analyze. Same as lz-analyze but it does not multiply the winrates and such by 10000 and round them, it just leaves them as floats, and it also reports the expected score.

https://github.com/lightvector/KataGo/issues/2#issuecomment-473481983
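The scaling difference lightvector describes is easy to illustrate: lz-analyze reports winrates as integers in units of 1/10000, while kata-analyze leaves them as plain floats. The sketch below parses only the `winrate` field; other kata-analyze field names (such as the score key) are not shown and would need checking against KataGo's own documentation.

```python
def parse_winrate(info_line, protocol="lz"):
    # lz-analyze:   "info move D4 visits 100 winrate 5123 ..."  (1/10000 units)
    # kata-analyze: "info move D4 visits 100 winrate 0.5123 ..." (plain float)
    fields = info_line.split()
    value = float(fields[fields.index("winrate") + 1])
    # Undo lz-analyze's x10000 scaling; kata-analyze needs no conversion.
    return value / 10000.0 if protocol == "lz" else value
```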

lightvector commented 5 years ago

Yep, predicting score is pretty straightforward to add to the AlphaZero process if you do it from the start (details here https://arxiv.org/pdf/1902.10565.pdf ).

KataGo can also predict ownership of every individual board location, rather than just the score, changing in real time as the search progresses (if you visualize it, it looks really cool). There is no current GTP command to get this info, but I could easily add it if desired (let me know). The cost is that it will double the memory usage and slightly slow down the program when that info is requested, since it now needs to request and store the ownership map from the neural net instead of only policy and value.

Note that KataGo is not as strong as Leela Zero since it has had much less compute to train it. I'm working on that - it turns out one or more of the hyperparameters (e.g. learning rate) are way off from where they should be, which was significantly slowing down the learning later in my training runs. Along with LCB and other improvements, I'm hoping to reduce that gap, and then might turn to more user-facing features, CPU and/or alternate GPU implementations so as to not require CUDA, and such.

I have not really tested the implementation of either lz-analyze or kata-analyze. I did not know of where to find documentation or a clear spec for lz-analyze, so it's possible that I've misunderstood the parameters or format, or flipped a sign in the output or something. Happy to work more on it if people are interested in using it already and notice any issues.

featurecat commented 5 years ago

Thanks for your comments and work on KataGo. I don't have time to officially support kata-analyze in Lizzie right now (obviously... I'm having trouble even releasing Lizzie 0.7 on time), but I'm especially excited about the territory estimates. Do you have a visualization available? Like a gif or video?

lightvector commented 5 years ago

Yep, here are videos from a visualization I threw together for a one-off demo I did. It shows the average of all the ownership maps in the search tree on 13x13 and 9x9 boards, weighting likely branches of play more, in real time as the search progresses. https://www.dropbox.com/s/svoybd565u6qfjq/go_ownership1.mp4?dl=0 https://www.dropbox.com/s/dmrs8rgzzukeoh7/go_ownership2.mp4?dl=0 https://www.dropbox.com/s/is7xsc1839guku4/go_ownership3.mp4?dl=0

This info currently isn't in kata-analyze (only the total score estimate), but I could add it if desired. It's a lot of extra output (boardsize^2 floating point values), and costly to compute, but maybe outputting it could be controlled by an argument to the GTP command.
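On the client side, consuming that output would mostly be a reshape: boardsize^2 floats back into a grid. This sketch assumes values in [-1, 1] with the sign indicating the predicted owner, and a top-to-bottom row ordering; both are assumptions and would have to follow whatever kata-analyze actually ends up emitting.

```python
def parse_ownership(values, boardsize=19):
    # values: a flat list of boardsize^2 floats, one per intersection.
    # The [-1, 1] range and top-to-bottom row order are assumptions,
    # not KataGo's documented layout.
    if len(values) != boardsize * boardsize:
        raise ValueError("expected boardsize^2 ownership values")
    return [values[r * boardsize:(r + 1) * boardsize]
            for r in range(boardsize)]
```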

Mondhund commented 5 years ago

well, perhaps there is a heuristic: https://explorebaduk.com/2019/04/08/how-many-does-a-2-point-mistake-cost-research/ This distribution probably changes as the game progresses, but it could be a starting point?

featurecat commented 5 years ago

Cool data. It's so tempting to make a points function f(winrate, move_number) based on some estimated data like this... But I know some strong players or pros will do actual counts and notice how inaccurate it is.
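Purely for illustration, such a function might invert a logistic winrate model and tighten the scale as the game progresses (the same winrate swing is worth fewer points late in the game). Every constant below is invented, which is exactly the inaccuracy featurecat worries strong players would notice.

```python
import math

def winrate_to_points(winrate, move_number, scale=10.0, game_length=400.0):
    # Hypothetical f(winrate, move_number): points ~ k * logit(winrate),
    # with k shrinking as the game progresses. All constants are made up
    # and would need fitting against real data like the linked research.
    k = scale * (1.0 - move_number / game_length)
    return k * math.log(winrate / (1.0 - winrate))
```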

lex312 commented 5 years ago

As a 3 dan (soon 4 dan) it would be very nice to have: -Better handling of invalid Leela Zero configurations, instead of just "Leela Zero is loading".

-Much better support for multiple gpu systems!

-Better support for newest gpu cards.

-Can we use CPU cores or CUDA cores for ladder detection instead of tensor cores?

-When analyzing, show only up to 10 best playable moves on the board at the same time, because I want to learn something and not to see 180 playable moves at the same time.

-Or let me decide how many possible moves I want to see at the same time, because maybe someone wants to focus only on the 3 best moves and another person only on the 5 best moves.

-When analyzing, cut the variation length, because I want something to learn from and not to see the next 50 to 100 variation moves; show me only 10 moves and that's fine.

-Or let me decide the maximum variation length I want to see. For beginners 5 moves could be fine, for better players maybe 7-8 variation moves, for pros maybe 15 to 20.

-Change the evaluation graph to a better version, like the chess graphs;) please google to understand.

-Improve colors of the heatmap.

-Show coordinates automatically at the beginning.

-Instead of winning advantages use points or let the users decide what they want to use.

-Points don't need to be perfect at first! For me as a dan player, and maybe for everyone else too, it is much more important to know which move gains more points than which has a better winning percentage. Example from what I've seen so far: one move had a 61% winning percentage and a 3 point advantage, while another move had a 60% winning percentage but a 15 point advantage. In practical games, the chance that the opponent kills my 3 points is much higher than the chance they take away all 15 points; maybe they take only 14 and I still win by 1 point.

-Maybe it's better to use CUDA cores or CPU cores for counting points instead of tensor cores.

-It will be best to see winning percentage and points advantage for every move.

-To see territory would be great.

lightvector commented 5 years ago

In case it's relevant: KataGo's kata-analyze command, as of tip of master, now can directly report the ownership map in real time during analysis (just like what was used to create those videos) for any client program that might want to visualize that information. As usual, it continues to also be able to directly predict score.

I'm getting close to laying the groundwork for a second long run now, which within the coming couple of months hopefully should go stronger and longer than the run released so far, closing some of the gap in strength between it and the other programs like LZ that have trained for much longer times on much more compute power. Maybe also add some better support for a wider array of GPUs after that.

lex312 commented 5 years ago

Something like a logfile is needed to scan/see bugs/problems and it's easy to paste that file here.

featurecat commented 5 years ago

superbnet - that is so true


featurecat commented 5 years ago

kaifahi I'll consider all your suggestions; they are very good. I will have enough time after May 21 to make 0.7 more polished!


lex312 commented 5 years ago

full Go board after 10 minutes

After 10 minutes of running the analysis from the starting position, I see only thousands of numbers, but where is the board? It's vanished :(

In other positions it takes only 1/10 of the time = 60 seconds to have numbers everywhere and I mean this happens on more or less empty boards:(

In other positions it takes only 5 to 10 seconds:(

Take a look at the picture and try to imagine some ideas or variations yourself!!!

It is completely impossible to do any analysis or other things :(

lex312 commented 5 years ago

To have an all playouts counter would be nice and for some things even important.

featurecat commented 5 years ago

You're right, I should remove the red fog that comes over the board in 0.7.


lex312 commented 5 years ago

We should also use Google and take a look at some pictures of other Go GUIs.

Take a look at those pictures and you will get many ideas about what we can use too, and how and what really needs to be done ;)

todbeibrot commented 5 years ago

It would be nice if it were possible to download more than one network and switch quickly, so it would be easier to get more than one opinion.

ez4u-L19 commented 5 years ago

Due to the change in logic implemented in LZ 0.17, it would be nice to be able to toggle between winrate and lcb as the primary statistic displayed for each move.

croosn commented 5 years ago

@featurecat have you managed to, at least partially, solve the ladder issue?

featurecat commented 5 years ago

I've been focusing on my university work for now, a little more than I anticipated when I failed to release Lizzie 0.7 on time! But in 8 hours I will take my last final exam :) After I finish taking care of some important things this week, I will work on merging the current PRs.

After that, I don't want to delay 0.7 too much. Implementing both the ladder fix AND play mode + UI improvements would delay it further, so I think I will release just the ladder fix for 0.7 and the rest of the features in either 0.8 or 1.0, whichever comes first. 1.0 will be a very special release that I hope to finish over the summer. I'll try to be more transparent in the future.

lp200 commented 5 years ago

I want a policy heatmap view function

wjx0912 commented 5 years ago

@featurecat when will version 0.7 be released?

bvandenbon commented 5 years ago

Due to the change in logic implemented in LZ 0.17, it would be nice to be able to toggle between winrate and lcb as the primary statistic displayed for each move.

I just updated https://www.zbaduk.com to version 0.17 as well (about a week ago). Even though it's upgraded, it still uses winrates and playouts just like it did in version 0.16.

Now, as you pointed out, in version 0.17 there are some new features and suddenly there are multiple interpretations or ways to define "the best move".

ZBaduk just looks at the number of playouts; the move with the most visits is marked as "the best one", regardless of winrates. Which is already a bit confusing to some people.

Starting from version 0.17, do you think we should just forget about winrate and playouts, and just visualize the "LCB" percentage?

So, upgrading to 0.17 is one thing, but the next question is: what do we do with the visualization of winning percentages and playouts? You could create multiple overlays and let the user decide which one he wants (make the users toggle views), but that also puts a lot of technical responsibility in the hands of a user who probably has no clue what he is doing. I have the feeling that we should protect users from these numbers and simplify the UI, but I'm not sure what to visualize.

lightvector commented 5 years ago

If that's what ZBaduk does, that's definitely wrong. Hardcoding "max visits" or LCB or any other criterion in the GUI as what determines the "best" move is clumsy; that should be a determination left up to the bot itself. Even in Leela Zero, max LCB is an over-simplification: if the max-LCB move has very, very few visits, it will actually not be considered best, as there is a minimum proportion required! And LCB also involves one or two configurable parameters, and different versions (or different bots, if Lizzie is eventually updated to support multiple engines) may have tuned different values. Maybe in the future bots will switch to a new criterion entirely for choosing the "best" move. This is certainly not a place for the GUI to be doing anything other than reporting whatever values the bot itself makes available.

My understanding is that Lizzie does get this correct right now, in that the blue-highlighted "best" move is determined neither by max visits nor by LCB computed within Lizzie itself, but rather by the "order" parameter returned in lz-analyze, by which LZ indicates the ordering of its moves' desirability by whatever internal ranking it uses.

lightvector commented 5 years ago

Some further thoughts: simply displaying only LCB would also be misleading, because of the directional bias. E.g. in an "even" game it says a move is 65% winrate for you, and the best move is some clear, simple move. Then you play that clear, simple move, and suddenly it says 40% for the opponent, meaning only 60% for you. How did you suddenly lose 5% by playing the bot's favorite move? Well, no, you didn't: LCB is subtracting off a few percent from the player to move.

You would absolutely also want to display the mean winrate as well, or otherwise something else that's unbiased and won't oscillate between moves (except insofar as the bot genuinely changes its opinion about a situation between those moves).
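A toy version of the bound makes the bias concrete. Leela Zero's real formula uses the observed eval variance and a Student-t quantile per move; the constants below are placeholders chosen only to show that the subtracted term shrinks as visits accumulate and always penalizes the side to move.

```python
import math

def lcb(mean_winrate, visits, z=2.0, variance=0.25):
    # Lower confidence bound: mean minus an uncertainty term that
    # shrinks with visits. z and variance are placeholder constants,
    # not Leela Zero's actual tuned values.
    return mean_winrate - z * math.sqrt(variance / visits)
```

With these placeholders, a move at 65% mean winrate and 400 visits displays as 60%: the 5% "loss" in lightvector's example is just this subtraction switching sides.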

As for user confusion, I think that's probably best addressed by an FAQ somewhere, like:

"Why is the blue-highlighted best move sometimes not the move with the highest winrate?" "Often, when a move with a high apparent winrate is not considered best, it's because it has received relatively few visits compared to the current best move. Despite superficially appearing good, the engine may be highly uncertain about its true value and/or the move may go heavily against the instincts of the bot. Often if the move were actually to be investigated more, the superficially high winrate would actually drop. Nearly all modern Go engines take into account the confidence level and number of visits invested to determine what move they think is most likely best".

This is pretty fundamental to how all the current bots work, and as long as the explanation is somewhere prominent, I think it's pretty easy for a typical Go player to understand. For example, in a real game, if you had to play right now, would you prefer the move you know is very good and strong, or the move that looked maybe better on paper but gave you bad vibes by going strongly against your shape instincts, and which you also haven't actually read enough to rule out a crushing opponent reply?

gcp commented 5 years ago

rather by the "order" parameter returned in lz-analyze, by which LZ indicates the ordering of its moves desirability by whatever internal ranking it uses.

Correct! That's definitely the right way to handle it, as it is robust to LZ's internals changing.

bvandenbon commented 5 years ago

rather by the "order" parameter returned in lz-analyze, by which LZ indicates the ordering of its moves desirability by whatever internal ranking it uses.

Correct! That's definitely the right way to handle it, as it is robust to LZ's internals changing.

Interesting, and very useful information.

Just for the record: I don't think Lizzie actually uses the order parameter at this stage. At first sight, Lizzie does not even parse the "order" parameter at all: See: MoveData.java
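For reference, a hypothetical fix would be to parse the order field out of each per-move info block and pick the move with the lowest order, rather than trusting whichever move is listed first. This is an illustrative sketch of the parsing, not MoveData.java's actual code:

```python
def best_move(analyze_output):
    # Split an lz-analyze line into per-move "info ..." blocks, read
    # each block's order field, and return the move the engine ranks
    # first (order 0). Fields after "pv" are variable-length, so they
    # are cut off before pairing keys with values.
    moves = []
    for chunk in analyze_output.split("info ")[1:]:
        fields = chunk.split()
        if "pv" in fields:
            fields = fields[:fields.index("pv")]
        entry = dict(zip(fields[::2], fields[1::2]))
        moves.append((int(entry["order"]), entry["move"]))
    return min(moves)[1]
```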

featurecat commented 5 years ago

Right, we actually just look at the first element returned, IIRC. To be more correct, in the future we should use the order parameter.

bvandenbon commented 5 years ago

@featurecat For long-running Lizzie sessions there will be an additional challenge: once there is an order field, it will become impossible to merge statistics.

(TL;DR: right now Lizzie caches statistics. Once moves disappear from the memory of LZ, Lizzie still has a copy of them. This leads to the scenario where LZ pushes statistics with fewer playouts than the cached version in Lizzie. Lizzie is selective when this happens and keeps the version with the most playouts. However, all of this would cause corruption if there were an "order" field.)

Just something to keep in mind, I guess.
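The caching behaviour bvandenbon describes amounts to a keep-the-copy-with-more-playouts merge, sketched below with hypothetical field names. The problem is visible in the sketch: visit counts from two snapshots are directly comparable, but order values are only meaningful within a single snapshot, so a merged table could mix stale and fresh rankings.

```python
def merge_stats(cached, fresh):
    # Keep whichever copy of each move's statistics has more visits.
    # Field names ("visits") are illustrative, not Lizzie's actual ones.
    # Note: an "order" field could not be merged this way, since orders
    # from different engine snapshots are not comparable.
    merged = dict(cached)
    for move, stats in fresh.items():
        if move not in merged or stats["visits"] > merged[move]["visits"]:
            merged[move] = stats
    return merged
```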

chilin99999 commented 5 years ago

Is there a roadmap that shows when Lizzie 0.7 will be released?

featurecat commented 5 years ago

I'll check in with @zsalch to see how he feels about the 0.7 release, it should be pretty soon.