Open faustomorales opened 3 years ago
The plot thickens... I was curious, so I added model.set_weights(model.get_weights()) just before the call to model.evaluate(), hoping that would knock something loose in the model's internal state and fix things. Instead, it made model inference produce garbage regardless of the value of restore_best_weights. This makes reproducing the issue even simpler.
import numpy as np
import tensorflow as tf
X = np.tile(np.arange(10), reps=100)
y = 1.5 * X - 1
print("tf:", tf.__version__)
model = tf.keras.models.Sequential([tf.keras.layers.Input((1, )), tf.keras.layers.Dense(1)])
model.compile(loss="mse", optimizer="rmsprop")
model.fit(X, y, epochs=200, verbose=0)
print("before set_weights, mse:", model.evaluate(X, y), "weights:", [a.flatten()[0] for a in model.get_weights()])
model.set_weights(model.get_weights())
print("after set_weights, mse:", model.evaluate(X, y), "weights:", [a.flatten()[0] for a in model.get_weights()])
Output using tensorflow_macos
...
tf: 2.4.0-rc0
32/32 [==============================] - 0s 215us/step - loss: 1.8945e-06
before set_weights, mse: 1.8945354440802475e-06 weights: [1.4997802, -1.000234]
32/32 [==============================] - 0s 231us/step - loss: inf
after set_weights, mse: inf weights: [1.4997802, -1.000234]
Output using Google Colab ...
tf: 2.4.0-rc0
32/32 [==============================] - 0s 874us/step - loss: 2.0693e-07
before set_weights, mse: 2.0693499891422107e-07 weights: [1.4999211, -1.0000395]
32/32 [==============================] - 0s 785us/step - loss: 2.0693e-07
after set_weights, mse: 2.0693499891422107e-07 weights: [1.4999211, -1.0000395]
So it seems that any use of model.set_weights() (which is probably being used under the hood by restore_best_weights) results in a broken model state.
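One way to check that the weights themselves are fine, and that only the evaluation path is affected, is to recompute the MSE by hand. The sketch below is plain Python and hard-codes the kernel and bias printed in the tensorflow_macos log above; it is a sanity check under those assumptions, not a diagnosis of the fork.

```python
# Recompute the MSE by hand from the weights printed in the tensorflow_macos log.
# Plain Python, so TensorFlow's inference path is not involved at all.

# Same values as np.tile(np.arange(10), reps=100): 0..9 repeated 100 times.
X = [float(i % 10) for i in range(1000)]
y = [1.5 * x - 1 for x in X]

# Values copied from the "weights:" field of the log output.
kernel, bias = 1.4997802, -1.000234

# Forward pass of a single Dense(1) layer, then mean squared error.
preds = [kernel * x + bias for x in X]
manual_mse = sum((p - t) ** 2 for p, t in zip(preds, y)) / len(X)
print("manual mse:", manual_mse)
```

The result should land close to the 1.8945e-06 reported before set_weights, which suggests the inf comes from how the model is evaluated after set_weights rather than from the stored weights.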
The issue can be avoided (albeit defeating the purpose of the fork) using:
import os
os.environ["TF_DISABLE_MLC"] = "1"
EDIT: This issue is related to any use of model.set_weights() (see the follow-up comment).

When using tf.keras.callbacks.EarlyStopping with restore_best_weights=True, it seems that model inference after training does not work properly. This would seem like something that would be a problem in mainline TensorFlow, but I've tried to reproduce it with a plain install on Linux and have been unsuccessful. Consider the following trivial example where we train a linear model to learn the function y = 1.5x - 1.

Here is the output using tensorflow_macos.

Here is the output using mainline TensorFlow on a Google Colab instance (using pip install tensorflow==2.4.0rc0 to make it apples-to-apples).

This is pretty puzzling to me! The macOS version is getting the right weights (1.5 and -1 for the kernel and bias, respectively) at the end of training regardless of the value of restore_best_weights. But at inference time, we seem to be getting garbage out of the model. 🤔 Any ideas on what to investigate first?
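For context on why the EDIT points at set_weights: as far as I can tell, restore_best_weights works by snapshotting get_weights() at the best epoch and calling set_weights() with that snapshot when training stops. The sketch below mimics that pattern in plain Python. ToyModel, fake_epoch, train_with_restore, and the scripted losses are all hypothetical stand-ins for illustration, and the logic is a simplification, not the Keras implementation.

```python
class ToyModel:
    """Hypothetical stand-in for a Keras model: it only stores a list of floats."""

    def __init__(self, weights):
        self._weights = list(weights)

    def get_weights(self):
        return list(self._weights)  # return a copy, like a snapshot

    def set_weights(self, weights):
        self._weights = list(weights)

    def fake_epoch(self, new_weights):
        # Pretend one epoch of training produced these weights.
        self._weights = list(new_weights)


def train_with_restore(model, scripted_epochs, patience=2):
    """Simplified early stopping that restores the best weights at the end."""
    best_loss = float("inf")
    best_weights = None
    wait = 0
    for loss, weights in scripted_epochs:
        model.fake_epoch(weights)
        if loss < best_loss:
            # New best epoch: remember the loss and snapshot the weights.
            best_loss, best_weights, wait = loss, model.get_weights(), 0
        else:
            wait += 1
            if wait >= patience:
                break  # no improvement for `patience` epochs
    if best_weights is not None:
        # This is the step that goes through set_weights().
        model.set_weights(best_weights)
    return best_loss


model = ToyModel([0.0, 0.0])
scripted = [
    (4.0, [1.0, 0.0]),
    (0.5, [1.5, -1.0]),   # best epoch
    (0.7, [1.6, -0.9]),
    (0.9, [1.7, -0.8]),   # patience exhausted here
]
best = train_with_restore(model, scripted)
print("best loss:", best, "restored weights:", model.get_weights())
```

If the real callback follows this pattern, any bug triggered by set_weights() would also show up after EarlyStopping restores the best weights, which matches what the simpler reproduction shows.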