Lizzie fails silently if no response to version command

mkmatlock commented 5 years ago

I am running leelaz over ssh to lizzie with the latest master lizzie branch. When lizzie starts leelaz, it immediately sends version and boardsize commands. At that time, leelaz is still initializing, and these commands never get acknowledged by the remote leelaz.

If I try to run analysis after initialization completes, the GTP console will stop responding to any commands.

However, if I manually type the version command after initialization, then lizzie will function normally (though extremely slowly as per issue #510).

I am running macOS 10.13.6 on a 2013 MacBook Pro with Java 1.8.0_25.

gcp commented 5 years ago

I think other GUI makers have pointed out LZ's behavior here is not very nice (it should not load the network immediately after starting), but it's not the easiest thing to fix.

mkmatlock commented 5 years ago

I think it can be fixed just by improving the version check code in lizzie. I disabled it in the 0.5 branch and now it runs fine. I'll try to fix it properly and submit a pull req once #510 is fixed and I can use 0.6.

bvandenbon commented 5 years ago

I can share some insights on how ZBaduk does it. It took me a long time to figure out how to make it reliable, but I've now run it on 5 different systems without issues.

ZBaduk uses a queue:

It schedules tasks in a queue that is handled by a single thread, so commands are processed one by one.
When that queue is created, the first task is always an INIT task.
When the user reviews a game, it adds GOTO_AND_PONDER tasks.

INIT:
1) The INIT task is the one that actually starts the Leela Zero process.
2) When that happens, it watches stderr until it sees a line that starts with "BLAS Core:". (There is a timeout for this of at most 60 seconds.)
3) Next, it sends the "name" command and waits for a response on stdout that starts with "= ".

If any of these steps fails, it kills the process and retries. (For example, it can fail when more than 6 instances of Leela Zero are running, when several are started at the same time, or when there is not enough memory.)
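
The INIT steps above can be sketched roughly as follows. This is not ZBaduk's or Lizzie's actual code, only a minimal Python illustration of the described handshake. The "BLAS Core:" marker, the "name" command, the "= " reply prefix and the 60 second limit come from the description above; the function names, the 10 second reply timeout and the retry count of 3 are assumptions made for the example.

```python
import queue
import subprocess
import threading
import time


def pump_lines(stream, sink):
    """Copy lines from a blocking text stream into a queue on a daemon thread."""
    def run():
        for line in stream:
            sink.put(line.rstrip("\n"))
    threading.Thread(target=run, daemon=True).start()


def wait_for_prefix(sink, prefix, timeout):
    """Return the first queued line starting with prefix, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        try:
            line = sink.get(timeout=remaining)
        except queue.Empty:
            return None
        if line.startswith(prefix):
            return line


def start_engine(argv, ready_timeout=60.0, reply_timeout=10.0, attempts=3):
    """Start the engine, wait until it is ready, then confirm it answers GTP.

    reply_timeout and attempts are illustrative values, not taken from ZBaduk.
    """
    for _ in range(attempts):
        proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                text=True, bufsize=1)
        out, err = queue.Queue(), queue.Queue()
        pump_lines(proc.stdout, out)
        pump_lines(proc.stderr, err)
        try:
            # Step 2: wait for the readiness marker on stderr.
            if wait_for_prefix(err, "BLAS Core:", ready_timeout) is not None:
                # Step 3: send "name" and wait for a successful GTP reply.
                proc.stdin.write("name\n")
                proc.stdin.flush()
                if wait_for_prefix(out, "= ", reply_timeout) is not None:
                    return proc, out
        except OSError:
            pass  # the engine died while we were writing to it
        # Any failed step: kill the process and retry.
        proc.kill()
        proc.wait()
    raise RuntimeError("engine did not become ready")
```

Reading stdout and stderr on separate threads keeps the timeouts enforceable, since a plain blocking `readline()` would hang for as long as the engine stays silent.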

When all that is done, the INIT task is completed and removed from the queue, and the next command is handled. If you try to navigate through a game before all this is done, problems are almost guaranteed. That's why queuing is so important.
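
As a rough illustration of the queuing idea described above (again, not ZBaduk's real implementation), a single worker thread can drain a FIFO queue whose first entry is always the INIT task. The class and method names here are hypothetical, and tasks are assumed to be plain callables.

```python
import queue
import threading


class EngineTaskQueue:
    """Run engine tasks one at a time on a single worker thread."""

    def __init__(self, init_task):
        self._tasks = queue.Queue()
        self._tasks.put(init_task)  # INIT is always the first task
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, task):
        """Queue a task (e.g. a GOTO_AND_PONDER step) behind earlier ones."""
        self._tasks.put(task)

    def close(self):
        """Stop the worker after all previously queued tasks have run."""
        self._tasks.put(None)
        self._worker.join()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:
                return
            task()  # the next task starts only after this one returns
```

Because every command goes through `submit`, anything the user triggers while the engine is still loading simply waits behind INIT instead of being sent to a process that is not listening yet.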

featurecat / lizzie

Lizzie fails silently if no response to version command #511