apple / tensorflow_macos

TensorFlow for macOS 11.0+ accelerated using Apple's ML Compute framework.

Major bug report: incorrect results from model.evaluate() and model.predict() #145

Open YUX opened 3 years ago

YUX commented 3 years ago

I tried a basic Keras Fashion-MNIST example on both an M1 and Colab, and I found that model.evaluate() and model.predict() on the M1 are just wrong.

A test-set accuracy of around 0.8477 is normal (Colab), but 0.1006 doesn't make any sense (M1).

Code:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.models import Sequential

# Load Fashion-MNIST and carve out a 5,000-sample validation split,
# scaling pixel values to [0, 1].
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

# Simple dense classifier over the flattened 28x28 images.
model = Sequential([
    Flatten(input_shape=[28, 28]),
    Dense(300, activation="relu"),
    Dense(100, activation="relu"),
    Dense(10, activation="softmax"),
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid))

model.evaluate(X_test, y_test)

# Class probabilities for the first three test images.
X_new = X_test[:3]
y_proba = model.predict(X_new)
y_proba

the M1 output:

[screenshot: M1 model.predict output]

the Colab output:

[screenshot: Colab model.predict output]

anna-tikhonova commented 3 years ago

Thank you very much for reporting this and providing a reproducible test case. We will investigate and report back.

icenando commented 3 years ago

@anna-tikhonova Hi Anna,

I'm experiencing the same issue. Please see the model.predict results below; on the left are my results, on the right Kaggle's:

[screenshot comparing local and Kaggle model.predict results]

Here is a link to my Kaggle notebook, so that you have the full code: https://www.kaggle.com/nandomachado/test-notebook

Also, I seeded random, np.random, and tf.random.set_seed. All results, including the weights, are identical between my computer and Kaggle up to the point where I call model.predict.
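For reference, the seeding routine described above might look like the sketch below. The seed value is an assumption (the commenter did not share theirs), and the TensorFlow call is shown commented out so the snippet runs without TF installed:

```python
import random

import numpy as np

SEED = 42  # hypothetical seed value

random.seed(SEED)
np.random.seed(SEED)
# With tensorflow-macos installed, the commenter also seeds TF:
# import tensorflow as tf
# tf.random.set_seed(SEED)

# Identical seeds should now yield identical draws across machines.
first_draw = np.random.rand(3)
np.random.seed(SEED)
second_draw = np.random.rand(3)
```

With all three seeds set, any divergence that appears only at model.predict points at the backend rather than at the input pipeline.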

The code is also the same: I simply uploaded it to Kaggle with no changes.

ashkan-76 commented 3 years ago

I have an identical problem. I ran a conv-net and the loss and val_loss look solid, but predict returns garbage. I think you're already aware of the LSTM issue.

xxlbaslxx commented 3 years ago

I'm here for the same problem. I ran the same code on my M1 and on Google Colab, but the results are totally different.

[screenshot of diverging results]

The confusion matrix in my Jupyter notebook looks too weird to be possible. P.S. I also use model.predict in my code.
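A confusion matrix built from model.predict output can be sanity-checked with plain NumPy, independent of the model. This is a minimal sketch with made-up labels; on a healthy model the mass concentrates on the diagonal, while the broken M1 predictions reported here pile into one or two columns:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count how often true class i was predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Predicted labels come from the argmax over the softmax probabilities,
# e.g. y_pred = model.predict(X_test).argmax(axis=1).
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 0, 1])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
```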

pelayocampa commented 3 years ago

Same here. I tested with this example: https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/es-419/tutorials/keras/classification.ipynb#scrollTo=E51yS7iCCaXO. The M1 doesn't predict a single piece of clothing correctly :)

maralski commented 3 years ago

The problem appears to be the use of validation_data in fit(); using it affects later predict() calls. Perform validation outside of fit() and the results should be correct.
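A minimal sketch of this workaround, using tiny synthetic data in place of Fashion-MNIST (the shapes, layer sizes, and epoch count are assumptions chosen only to show the call pattern):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential

# Tiny synthetic stand-in for Fashion-MNIST.
rng = np.random.default_rng(0)
X_train = rng.random((64, 28, 28)).astype("float32")
y_train = rng.integers(0, 10, size=64)
X_valid = rng.random((16, 28, 28)).astype("float32")
y_valid = rng.integers(0, 10, size=16)

model = Sequential([
    Flatten(input_shape=[28, 28]),
    Dense(32, activation="relu"),
    Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

# Workaround: do NOT pass validation_data here.
model.fit(X_train, y_train, epochs=1, verbose=0)

# Instead, validate as a separate, explicit step after training.
val_loss, val_acc = model.evaluate(X_valid, y_valid, verbose=0)
y_proba = model.predict(X_valid, verbose=0)
```

If the reported behavior holds, the separately computed val_loss/val_acc and y_proba should match what Colab produces for the same weights.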

Arfius commented 3 years ago

Are you using Rosetta 2?

ashkan-76 commented 3 years ago

No, all native M1.

Arfius commented 3 years ago

I have this problem when using Rosetta 2 as well. Thanks.

tiatordos commented 3 years ago

[screenshot of prediction output]

Worse, changing the number of items selected changes the prediction.