Using `model.set_weights()` yields incorrect behavior when MLC is enabled

faustomorales commented 3 years ago

EDIT: This issue is related to any use of model.set_weights() (see next comment).

When using tf.keras.callbacks.EarlyStopping with restore_best_weights=True, model inference after training does not seem to work properly. I would expect a bug like this to show up in mainline TensorFlow as well, but I have been unable to reproduce it with a plain install on Linux. Consider the following trivial example, in which we train a linear model to learn the function y = 1.5x - 1.

```python
import numpy as np
import tensorflow as tf

# 1,000 samples of y = 1.5x - 1, with x cycling through 0..9
X = np.tile(np.arange(10), reps=100)
y = 1.5 * X - 1
# 500 training, 250 validation, and 250 test samples
X_train, y_train, X_val, y_val, X_test, y_test = X[:500], y[:500], X[500:750], y[500:750], X[750:], y[750:]

print("tf:", tf.__version__)
for restore_best_weights in [True, False]:
    print("restore_best_weights:", restore_best_weights)
    model = tf.keras.models.Sequential([tf.keras.layers.Input((1, )), tf.keras.layers.Dense(1)])
    model.compile(loss="mse", optimizer="rmsprop")
    model.fit(
        X_train,
        y_train,
        validation_data=(X_val, y_val),
        callbacks=[
            tf.keras.callbacks.EarlyStopping(
                monitor="val_loss",
                min_delta=1e-3,
                patience=50,
                restore_best_weights=restore_best_weights
            ),
        ],
        epochs=500,
        verbose=0
    )
    print(
        "mse:", round(model.evaluate(X_test, y_test)),
        "weights:", [a.flatten()[0] for a in model.get_weights()]
    )
```

Here is the output using tensorflow_macos.

```
tf: 2.4.0-rc0
restore_best_weights: True
8/8 [==============================] - 0s 284us/step - loss: 387007348736.0000
mse: 387007348736 weights: [1.4947473, -0.967283]
restore_best_weights: False
8/8 [==============================] - 0s 289us/step - loss: 1.6617e-06
mse: 0 weights: [1.4997954, -1.0002267]
```

Here is the output using mainline TensorFlow on a Google Colab instance (using pip install tensorflow==2.4.0rc0 to make it apples-to-apples).

```
tf: 2.4.0-rc0
restore_best_weights: True
8/8 [==============================] - 0s 2ms/step - loss: 3.1445e-04
mse: 0 weights: [1.4946454, -0.9670778]
restore_best_weights: False
8/8 [==============================] - 0s 2ms/step - loss: 7.9514e-08
mse: 0 weights: [1.499929, -0.9998748]
```

This is pretty puzzling to me! The macOS version ends training with the right weights (approximately 1.5 and -1 for the kernel and bias, i.e. the slope and intercept) regardless of the value of restore_best_weights. But at inference time, the model seems to produce garbage. 🤔 Any ideas on what to investigate first?
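A quick sanity check is whether the reported loss is even consistent with the reported weights. The sketch below uses plain NumPy (no TensorFlow): it recovers the ideal kernel and bias with ordinary least squares, then computes the test-set MSE implied by the weights that tensorflow_macos printed for restore_best_weights=True. The helper name implied_mse is hypothetical, chosen only for this illustration.

```python
import numpy as np

# Same data and split as the reproducer above
X = np.tile(np.arange(10), reps=100)
y = 1.5 * X - 1
X_test, y_test = X[750:], y[750:]

# Ordinary least squares on the design matrix [x, 1] recovers the ideal kernel and bias
A = np.column_stack([X, np.ones_like(X)]).astype(float)
(kernel, bias), *_ = np.linalg.lstsq(A, y, rcond=None)
print("least-squares kernel/bias:", round(kernel, 6), round(bias, 6))


def implied_mse(kernel, bias, X, y):
    """MSE of the linear model y_hat = kernel * x + bias (hypothetical helper)."""
    return float(np.mean((kernel * X + bias - y) ** 2))


# Weights printed by tensorflow_macos with restore_best_weights=True
macos_mse = implied_mse(1.4947473, -0.967283, X_test, y_test)
print("MSE implied by the macOS weights:", macos_mse)
print("MSE reported by model.evaluate():", 387007348736.0)
```

The implied MSE comes out at roughly 3.1e-4, in line with the 3.1445e-04 that Colab reports for nearly identical weights and nowhere near 3.87e11, which suggests the stored weights are fine and the problem lies somewhere in the inference path.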

faustomorales commented 3 years ago

The plot thickens... Out of curiosity, I added model.set_weights(model.get_weights()) just before the call to model.evaluate(), hoping it would knock something loose in the model's internal state and fix things. Instead, it made model inference return garbage regardless of the value of restore_best_weights. This makes the issue even simpler to reproduce.

```python
import numpy as np
import tensorflow as tf

X = np.tile(np.arange(10), reps=100)
y = 1.5 * X - 1

print("tf:", tf.__version__)
model = tf.keras.models.Sequential([tf.keras.layers.Input((1, )), tf.keras.layers.Dense(1)])
model.compile(loss="mse", optimizer="rmsprop")
model.fit(X, y, epochs=200, verbose=0)
print("before set_weights, mse:", model.evaluate(X, y), "weights:", [a.flatten()[0] for a in model.get_weights()])
# Round-trip the weights; this should be a no-op
model.set_weights(model.get_weights())
print("after set_weights, mse:", model.evaluate(X, y), "weights:", [a.flatten()[0] for a in model.get_weights()])
```

Output using tensorflow_macos ...

```
tf: 2.4.0-rc0
32/32 [==============================] - 0s 215us/step - loss: 1.8945e-06
before set_weights, mse: 1.8945354440802475e-06 weights: [1.4997802, -1.000234]
32/32 [==============================] - 0s 231us/step - loss: inf
after set_weights, mse: inf weights: [1.4997802, -1.000234]
```

Output using Google Colab ...

```
tf: 2.4.0-rc0
32/32 [==============================] - 0s 874us/step - loss: 2.0693e-07
before set_weights, mse: 2.0693499891422107e-07 weights: [1.4999211, -1.0000395]
32/32 [==============================] - 0s 785us/step - loss: 2.0693e-07
after set_weights, mse: 2.0693499891422107e-07 weights: [1.4999211, -1.0000395]
```
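Recomputing the loss by hand from the weights printed by tensorflow_macos shows that the "before" loss is exactly what those weights predict, and since the printed weights do not change, the "after" loss should be identical too. A minimal NumPy sketch (implied_mse is a hypothetical helper defined here for illustration):

```python
import numpy as np

X = np.tile(np.arange(10), reps=100)
y = 1.5 * X - 1


def implied_mse(kernel, bias, X, y):
    """MSE of y_hat = kernel * x + bias over the given data (hypothetical helper)."""
    return float(np.mean((kernel * X + bias - y) ** 2))


# Kernel and bias printed by tensorflow_macos before and after set_weights()
before = (1.4997802, -1.000234)
after = (1.4997802, -1.000234)

mse_before = implied_mse(*before, X, y)
mse_after = implied_mse(*after, X, y)
print("implied MSE before:", mse_before)  # close to the reported 1.8945e-06
print("implied MSE after: ", mse_after)   # the same value, yet evaluate() reported inf
```

Because the printed kernel and bias are unchanged while the reported loss jumps from about 1.9e-6 to inf, the corruption is apparently in some internal state that set_weights() fails to update correctly, not in the values returned by get_weights().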

So it seems that any use of model.set_weights() (which is probably being used under the hood by restore_best_weights) results in a broken model state.
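To see why restore_best_weights=True would trip over a broken set_weights(), the restore logic can be sketched without TensorFlow. This is a simplified approximation of what an early-stopping callback does, not Keras's actual code; ToyModel and run_early_stopping are hypothetical names. The best weights are snapshotted with get_weights() whenever the monitored loss improves by more than min_delta, and written back with set_weights() once patience runs out, so a faulty set_weights() would hit exactly that final step.

```python
import copy


class ToyModel:
    """Hypothetical stand-in for a Keras model: it only holds a list of weights."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.set_weights_calls = 0

    def get_weights(self):
        return copy.deepcopy(self.weights)

    def set_weights(self, weights):
        self.set_weights_calls += 1
        self.weights = copy.deepcopy(weights)


def run_early_stopping(model, history, min_delta=1e-3, patience=2, restore_best_weights=True):
    """Simplified early stopping over a precomputed list of (val_loss, weights) pairs.

    Returns the index of the epoch at which training stopped.
    """
    best_loss = float("inf")
    best_weights = None
    wait = 0
    for epoch, (val_loss, weights) in enumerate(history):
        model.weights = list(weights)  # pretend this epoch's update produced these weights
        if val_loss < best_loss - min_delta:
            best_loss, wait = val_loss, 0
            if restore_best_weights:
                best_weights = model.get_weights()  # snapshot the best weights so far
        else:
            wait += 1
            if wait >= patience:
                if restore_best_weights and best_weights is not None:
                    model.set_weights(best_weights)  # the call that misbehaves with MLC enabled
                return epoch
    return len(history) - 1


history = [(5.0, [1.0, 0.0]), (0.5, [1.4, -0.9]), (0.6, [1.6, -1.1]), (0.7, [1.7, -1.2])]
model = ToyModel([0.0, 0.0])
stopped_at = run_early_stopping(model, history)
print("stopped at epoch", stopped_at, "weights", model.weights,
      "set_weights calls", model.set_weights_calls)
```

Feeding a precomputed list of losses keeps the sketch deterministic: training stops at epoch 3 and the weights from epoch 1 are restored through a single set_weights() call.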

faustomorales commented 3 years ago

The issue can be avoided (albeit defeating the purpose of the fork) using:

```python
import os
os.environ["TF_DISABLE_MLC"] = "1"
```

apple/tensorflow_macos issue #261