CSSLab / maia-chess

Maia is a human-like neural network chess engine trained on millions of human games.
https://maiachess.com
GNU General Public License v3.0

Update tfprocess.py for TensorFlow 2.4+ #57

Open CallOn84 opened 1 year ago

CallOn84 commented 1 year ago

While trying to train my own model, I found an error in tfprocess.py that makes train_maia.py fail on newer TensorFlow versions (2.4+).

The reason is that the tf.keras.mixed_precision.experimental API was deprecated in favor of the tf.keras.mixed_precision API introduced in TensorFlow 2.4, and has since been removed.

To fix this, I changed two lines of code.

tf.keras.mixed_precision.experimental.set_policy('mixed_float16'), which can be found in Line 123, was changed to tf.keras.mixed_precision.set_global_policy('mixed_float16').

self.optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(self.optimizer, self.loss_scale), which can be found in Line 150, was changed to self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer).
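
One thing worth noting about the second change: the new call no longer passes self.loss_scale, and the non-experimental LossScaleOptimizer defaults to dynamic loss scaling. Below is a minimal sketch of how a fixed scale might be carried over, assuming the TF 2.4-era signature LossScaleOptimizer(inner_optimizer, dynamic=True, initial_scale=None, dynamic_growth_steps=None). The helper names loss_scale_kwargs and wrap_optimizer are hypothetical and not part of tfprocess.py, later Keras releases may use a different signature, and this has not been checked against the original training results.

```python
def loss_scale_kwargs(loss_scale):
    """Map an old-style loss_scale value to keyword arguments for the
    TF 2.4-era tf.keras.mixed_precision.LossScaleOptimizer.

    Assumption: as in the old experimental API, a number means a fixed
    scale and the string 'dynamic' requests dynamic scaling.
    """
    if loss_scale == "dynamic":
        return {"dynamic": True}
    # A fixed scale needs dynamic=False plus an explicit initial_scale.
    return {"dynamic": False, "initial_scale": float(loss_scale)}


def wrap_optimizer(optimizer, loss_scale):
    """Wrap `optimizer` with the non-experimental LossScaleOptimizer (hypothetical helper)."""
    # Imported lazily so loss_scale_kwargs can be used without TensorFlow installed.
    import tensorflow as tf
    return tf.keras.mixed_precision.LossScaleOptimizer(
        optimizer, **loss_scale_kwargs(loss_scale)
    )


print(loss_scale_kwargs(128))  # {'dynamic': False, 'initial_scale': 128.0}
```

Whether a fixed scale set this way trains identically to the 2.1.0 environment is exactly the kind of thing that would still need to be tested.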

reidmcy commented 1 year ago

Does this change maintain compatibility with TensorFlow 2.1.0? This codebase is meant for replicating our work, which was done with the environment given in maia_env.yml.

CallOn84 commented 1 year ago

> Does this change maintain compatibility with TensorFlow 2.1.0? This codebase is meant for replicating our work, which was done with the environment given in maia_env.yml.

The mixed-precision code in this version of tfprocess.py doesn't work with newer TensorFlow releases because, as I understand it, tf.keras.mixed_precision became stable and improved on the original tf.keras.mixed_precision.experimental, so Google decided to remove the experimental API.

I would suggest creating some conditional statement that would allow tfprocess.py to identify what version of TensorFlow the user is using and run accordingly.

Something like this:

# Restrict TensorFlow to the configured GPU and let it allocate memory on demand.
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_visible_devices(gpus[self.cfg['gpu']], 'GPU')
tf.config.experimental.set_memory_growth(gpus[self.cfg['gpu']], True)
# Compare (major, minor) as integers; as strings, '2.10' sorts before '2.4'.
tf_version = tuple(int(part) for part in tf.__version__.split('.')[:2])
if self.model_dtype == tf.float16:
    if tf_version >= (2, 4):
        tf.keras.mixed_precision.set_global_policy('mixed_float16')
    else:
        tf.keras.mixed_precision.experimental.set_policy('mixed_float16')
self.active_lr = 0.01
self.optimizer = tf.keras.optimizers.SGD(learning_rate=lambda: self.active_lr, momentum=0.9, nesterov=True)
self.orig_optimizer = self.optimizer
if self.loss_scale != 1:
    if tf_version >= (2, 4):
        # Note: with no extra arguments this uses dynamic loss scaling, not the fixed self.loss_scale.
        self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer)
    else:
        self.optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(self.optimizer, self.loss_scale)

There's probably a better way of coding this, but this is the only thing my small brain can come up with.
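
One caveat when gating on the TensorFlow version: comparing version strings lexicographically misorders releases with a two-digit minor version, so '2.10.0' sorts before '2.4'. Here is a minimal sketch of a numeric comparison. The helper names parse_major_minor and has_stable_mixed_precision are hypothetical, and the 2.4 cutoff is taken from the discussion above.

```python
import re


def parse_major_minor(version):
    """Return (major, minor) as integers from a version string such as '2.10.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_stable_mixed_precision(version):
    """True if the non-experimental tf.keras.mixed_precision API should be used."""
    return parse_major_minor(version) >= (2, 4)


# Plain string comparison gets TensorFlow 2.10 wrong:
print("2.10.0" >= "2.4")                     # False
# Comparing integer tuples orders the versions correctly:
print(has_stable_mixed_precision("2.10.0"))  # True
print(has_stable_mixed_precision("2.1.0"))   # False
```

In tfprocess.py the argument would be tf.__version__. Only the leading major.minor digits are read, so suffixes such as '-rc0' are ignored.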

reidmcy commented 1 year ago

Adding that code is saying the 2.4 code is equivalent to 2.1. But, as you said, keras.mixed_precision changed after 2.4. We would need to test the new implementation by rerunning the full training and testing. This is a research project; we need evidence for changes.

CallOn84 commented 1 year ago

> Adding that code is saying the 2.4 code is equivalent to 2.1. But, as you said, keras.mixed_precision changed after 2.4. We would need to test the new implementation by rerunning the full training and testing. This is a research project; we need evidence for changes.

I'm in the middle of training a Maia model that's targeting a rating of around 2500. Once I finish, I can send you the model for testing.

reidmcy commented 1 year ago

That's not a replication of our paper; this code is for the KDD 2020 paper.

CallOn84 commented 1 year ago

> That's not a replication of our paper; this code is for the KDD 2020 paper.

Can you clarify what you mean by the code being for the KDD 2020 paper? Which paper are you referring to?

reidmcy commented 1 year ago

Aligning Superhuman AI with Human Behavior: Chess as a Model System is the name of the paper

CallOn84 commented 1 year ago

> Aligning Superhuman AI with Human Behavior: Chess as a Model System is the name of the paper

Ah, right. I got confused when you said it's not a replication of your paper and referred to something else.

Currently, the switch to tf.keras.mixed_precision.set_global_policy('mixed_float16') and self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer) allows me to run train_maia.py using TensorFlow 2.10. I can provide you with the TensorBoard logs alongside the Maia net once training is fully complete.

ezhang7423 commented 11 months ago

Are there any updates on this?

CallOn84 commented 9 months ago

> Are there any updates on this?

To check whether this works in terms of the end product, I'm currently training a Maia 2200 net. Testing it against Maia 1900 should then show whether the changes to Keras have any positive or negative effect on Maia 2200.

reidmcy commented 9 months ago

@CallOn84 We have a new model that solves some of the training problems (and the old libraries), but not the data efficiency or expanding the Elo range much. So if you can wait a bit we should have more usable code to release.

CallOn84 commented 9 months ago

> @CallOn84 We have a new model that solves some of the training problems (and the old libraries), but not the data efficiency or expanding the Elo range much. So if you can wait a bit we should have more usable code to release.

Cool, will do.