IBM / tensorflow-large-model-support

Large Model Support in Tensorflow
Apache License 2.0

lms in non-callback mode? #43

Closed: den-run-ai closed this issue 4 years ago

den-run-ai commented 4 years ago

Is it possible to use LMS in tf.keras without using the callback? This is required when using an iterative loop to train on batches. It is very hard to intercept all calls to LMS in the appropriate places for all of the implemented callback methods.

smatzek commented 4 years ago

The version of LMS that used Keras callbacks was for TensorFlow 1.x. We did some initial testing and work with it on top of TensorFlow 2, but I don't think we ever supported it that way in WML CE. Since you mention a custom iterative training loop, it seems that you may be using TensorFlow 2.

Instead, we wrote a new memory-allocator-based LMS for TensorFlow. That is what the master branch of this repository contains, and it was included in the TensorFlow 2-based versions of WML CE. I would suggest using that version.
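As a rough illustration of the idea behind allocator-based swapping, here is a toy, pure-Python model. It is not IBM's implementation: tensors live in a fixed-size "device" pool, and when a new allocation does not fit, some resident tensors are moved to an unbounded "host" pool and brought back when they are accessed again. The class name `ToySwappingAllocator`, the abstract size units, and the least-recently-used eviction policy are all hypothetical assumptions; the thread does not say how the real allocator chooses what to swap.

```python
from collections import OrderedDict


class ToySwappingAllocator:
    """Hypothetical toy model of allocator-level swapping (not the real LMS).

    Tensors are tracked by name and abstract size. The 'device' pool has a
    fixed capacity; evicted tensors go to an unbounded 'host' pool.
    Eviction policy (an assumption): least recently used first.
    """

    def __init__(self, device_capacity):
        self.device_capacity = device_capacity
        self.device = OrderedDict()  # name -> size, least recently used first
        self.host = {}               # name -> size, currently swapped out
        self.swap_outs = 0
        self.swap_ins = 0

    def _used(self):
        return sum(self.device.values())

    def _make_room(self, size):
        if size > self.device_capacity:
            raise MemoryError(f"tensor of size {size} exceeds device capacity")
        # Swap out least-recently-used tensors until the new one fits.
        while self._used() + size > self.device_capacity:
            victim, victim_size = self.device.popitem(last=False)
            self.host[victim] = victim_size
            self.swap_outs += 1

    def allocate(self, name, size):
        self._make_room(size)
        self.device[name] = size

    def access(self, name):
        if name in self.device:
            self.device.move_to_end(name)  # mark as most recently used
            return "device"
        # Tensor was swapped out: bring it back, evicting others if needed.
        size = self.host.pop(name)
        self._make_room(size)
        self.device[name] = size
        self.swap_ins += 1
        return "swapped_in"


if __name__ == "__main__":
    alloc = ToySwappingAllocator(device_capacity=10)
    for name in ("act1", "act2", "act3"):  # forward pass keeps activations
        alloc.allocate(name, 4)
    for name in ("act3", "act2", "act1"):  # backward pass reads them in reverse
        print(name, alloc.access(name))
    # act3 device / act2 device / act1 swapped_in
```

Because the swapping decision in this sketch is made at allocation time rather than by rewriting a graph, nothing in it depends on a training-loop hook such as a Keras callback.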

smatzek commented 4 years ago

To directly answer your question about using LMS in tf.keras without the callback: I don't think it's possible. TFLMSv2 (the TF 1.x graph-based version of LMS) requires TensorFlow to run in graph mode. The callback is necessary so that tf.keras calls LMS to add the swapping nodes into the graph after it builds the backward-propagation nodes.
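To make the ordering argument concrete, here is a toy, pure-Python model of a graph rewrite in the spirit of TFLMSv2. It is not the TFLMSv2 API: `ToyGraph`, `build_forward`, `build_backward`, `add_swap_nodes`, and `toy_fit` are hypothetical names, and the rule "reroute every edge from an `act*` node to a `grad*` node through swap-out/swap-in nodes" is a simplification assumed for illustration. Running the rewrite before the backward nodes exist finds nothing to swap; running it from a callback that the toy framework fires after building the backward pass does.

```python
class ToyGraph:
    """Hypothetical toy dataflow graph: node name -> list of input node names."""

    def __init__(self):
        self.inputs = {}

    def add(self, name, inputs=()):
        self.inputs[name] = list(inputs)


def build_forward(graph):
    graph.add("x")
    graph.add("act1", ["x"])
    graph.add("act2", ["act1"])
    graph.add("act3", ["act2"])
    graph.add("loss", ["act3"])


def build_backward(graph):
    # Gradient nodes consume forward activations, which is what keeps
    # those activations in memory until the backward pass reaches them.
    graph.add("grad3", ["loss", "act2"])
    graph.add("grad2", ["grad3", "act1"])
    graph.add("grad1", ["grad2", "x"])


def add_swap_nodes(graph):
    """Reroute act* -> grad* edges through swap_out/swap_in nodes.

    Returns the number of edges rewritten (simplified toy rule).
    """
    rewritten = 0
    for name in list(graph.inputs):  # snapshot: we add nodes while looping
        if not name.startswith("grad"):
            continue
        new_inputs = []
        for src in graph.inputs[name]:
            if not src.startswith("act"):
                new_inputs.append(src)
                continue
            out_node, in_node = "swap_out_" + src, "swap_in_" + src
            if in_node not in graph.inputs:
                graph.add(out_node, [src])
                graph.add(in_node, [out_node])
            new_inputs.append(in_node)
            rewritten += 1
        graph.inputs[name] = new_inputs
    return rewritten


def toy_fit(graph, callbacks=()):
    build_forward(graph)
    build_backward(graph)
    # Hook point: callbacks run only once the backward nodes exist.
    for cb in callbacks:
        cb(graph)


if __name__ == "__main__":
    early = ToyGraph()
    build_forward(early)
    print("swaps before backward exists:", add_swap_nodes(early))  # 0
    g = ToyGraph()
    toy_fit(g, callbacks=[lambda graph: print("swaps from callback:", add_swap_nodes(graph))])  # 2
```

In this toy, the only place where both the forward activations and their backward consumers are visible is the hook that runs after `build_backward`, which mirrors why the graph-rewriting approach needs the framework to call it at that point.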