CSSLab / maia-chess

Maia is a human-like neural network chess engine trained on millions of human games.
https://maiachess.com
GNU General Public License v3.0

No output when training script is run #62

Open mkillah opened 8 months ago

mkillah commented 8 months ago

Hi, thank you for sharing this amazing project with the rest of the world. I have the honour of using maia for my experiments, but I have run into some problems that you may be able to help me with.

Problem:

I have managed to work through the steps for creating my own network, from point 1 to point 4. My data is split into training and validation PGNs, and I have double-checked the paths to the data and the config.

But I am stuck at point 4 (Run the training script move_prediction/train_maia.py PATH_TO_CONFIG).

When I run the script I get the messages below; nothing happens afterwards and no data is generated:

2024-02-22 21:38:59.828812: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
WARNING:tensorflow:From C:\Users\marko\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

2024-02-22 16:39:17 dataset:
  input_test: virtualPlayerNet/validation//
  input_train: virtualPlayerNet/training//
gpu: 0
model:
  filters: 64
  residual_blocks: 6
  se_ratio: 8
training:
  batch_size: 1024
  checkpoint_steps: 10000
  lr_boundaries:

2024-02-22 16:39:17 found [] chunk dirs
2024-02-22 16:39:17 found 0 chunks total
Not enough chunks 0

I think trainingdata-tool.exe is missing, so the script cannot convert the PGNs to .gz files. Does anyone have trainingdata-tool.exe?

It would be delightful if someone could give me a direction :)

Cheers

reidmcy commented 8 months ago

You don't have any chunk files in virtualPlayerNet/training//. The double slash makes it look like there's an issue with the path. Are you sure pgn_to_trainingdata.sh ran correctly? It should take some time to run and make a bunch of new files.
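A quick way to check this before launching training is to count the chunk files directly. The snippet below is only a minimal sketch, not the project's own code: it assumes the training input is a glob pattern over directories that each hold .gz chunk files directly, and both the function name count_chunks and the example pattern are hypothetical.

```python
import glob
import os


def count_chunks(dir_pattern):
    """Return (chunk_dirs, n_chunks) for a glob pattern over chunk directories.

    Assumption: every matching directory holds its .gz chunk files directly.
    """
    chunk_dirs = sorted(d for d in glob.glob(dir_pattern) if os.path.isdir(d))
    n_chunks = sum(len(glob.glob(os.path.join(d, "*.gz"))) for d in chunk_dirs)
    return chunk_dirs, n_chunks


if __name__ == "__main__":
    # Hypothetical pattern; substitute the input_train value from your config.
    dirs, total = count_chunks("path/to/training/*/")
    print(f"found {len(dirs)} chunk dirs, {total} chunks total")
```

If this reports zero directories, the problem is probably in the pattern or in the conversion step rather than in the training script.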

mkillah commented 8 months ago

For some reason the issue section did not render the asterisks in the path (/asterisk/asterisk) and shows // instead.

Anyway, I think that pgn_to_trainingdata.sh did not run correctly because trainingdata-tool.exe is missing. Consequently, the script did not convert the PGNs to .gz files, and the result is "Not enough chunks 0".

When I run move_prediction/pgn_to_trainingdata.sh PGN_FILE_PATH OUTPUT_PATH I get:

10000 games matched out of 10000.
removed directory '/c/Users/marko/PycharmProjects/maia-chess-virtualchessplayer/data/blocks'
Starting on training 10
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on training 4
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on training 5
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on training 6
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on training 7
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on training 8
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on training 9
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on validation 1
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on validation 2
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on validation 3
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Almost done
Done

In the repository I cannot find trainingdata-tool.exe, only trainingdata-tool.cpp.

Does anyone have trainingdata-tool.exe?

reidmcy commented 8 months ago

You need to compile it yourself and add it to your path.
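To confirm whether the shell can actually find the tools after that, a small check like the following may help. It is a sketch under the assumption that the two commands named in the error output (trainingdata-tool and pgrep) are the ones the conversion script calls; the helper name missing_tools is hypothetical.

```python
import shutil


def missing_tools(names):
    """Return the tool names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Tool names taken from the error messages earlier in this thread.
    missing = missing_tools(["trainingdata-tool", "pgrep"])
    if missing:
        print("not on PATH:", ", ".join(missing))
    else:
        print("all tools found")
```

Run it from the same shell you use for pgn_to_trainingdata.sh, since PATH can differ between shells.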

mkillah commented 8 months ago

I am having real trouble compiling it (total noob in C++). May I ask you to cut me some slack and provide me with the file?

reidmcy commented 8 months ago

Did you run the cmake commands listed in the README? That worked for me on a fresh Ubuntu server.

C++ doesn't make portable binaries by default, so you need to compile it yourself.

Training the model will require even more technical knowledge. This code release is not tested on other systems and is meant to be documentation of what we did, not a general method of doing training, so it will almost certainly require changes to run on your system.

CallOn84 commented 8 months ago

Thankfully, I have a copy of trainingdata-tool from before the links became deprecated. You can find it in my repository for my Leela and Leela-derived nets.

CallOn84 commented 8 months ago

Are you running on a Windows or Linux machine @mkillah?

mkillah commented 8 months ago

@reidmcy thank you for the explanation! I only have to wrap my mind around using C++, 'cause I do not have any experience with it.

Anyway, @CallOn84 thank you for providing me with the information and files!! I will give it a try now. I am running everything on a Windows machine.

CallOn84 commented 8 months ago

The good news for you is that I'm currently doing a training run of a Maia 2200 net on my PC, which is also a Windows machine. The issue you're facing comes from how Windows handles directory paths in CMD: instead of /path/to/file/*/, the path needs to be \path\to\file\*\. This should fix the no-chunk issue that you're getting.
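If you would rather not retype the paths by hand, the separator change can be sketched in Python. This is only an illustration of the rewrite described above, not code from the repository, and the helper name to_windows_pattern is hypothetical.

```python
from pathlib import PureWindowsPath


def to_windows_pattern(pattern):
    """Rewrite a forward-slash glob pattern with backslash separators.

    PureWindowsPath only manipulates the string; it never touches the disk,
    so this behaves the same on any operating system.
    """
    converted = str(PureWindowsPath(pattern))
    # PureWindowsPath drops a trailing separator, so restore it if present.
    if pattern.endswith(("/", "\\")) and not converted.endswith("\\"):
        converted += "\\"
    return converted


if __name__ == "__main__":
    print(to_windows_pattern("/path/to/file/*/"))  # \path\to\file\*\
```

The asterisk is left untouched, so the result can still be used as a glob pattern.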

You must also change two things before you start training a Maia net. Reid and his team used a TensorFlow version with experimental Keras code, which was later replaced by a more stable and improved version of Keras in TensorFlow 2.4+; I made a pull request about this. The good thing is that Reid and his team are revising the training model, which should solve this issue and use more up-to-date libraries.

Until then, if you want to train a Maia net now, you'll need to change two lines of code in the tfprocess.py Python file, as shown in the pull request I made: https://github.com/CSSLab/maia-chess/pull/57

If you have any questions or issues, let me know here or through Discord by sending a message to callon84.

Good luck!