apple / tensorflow_macos

TensorFlow for macOS 11.0+ accelerated using Apple's ML Compute framework.

model.evaluate and model.predict conflict #252

Open parkesb opened 3 years ago

parkesb commented 3 years ago

I've hit a strange issue with an example from Laurence Moroney's "AI and Machine Learning for Coders...". Running the following code on an M1 MacBook Air:

import tensorflow as tf

mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

training_images = training_images / 255.0
test_images = test_images / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(training_images, training_labels, epochs=5)

model.evaluate(test_images, test_labels)

classifications = model.predict(test_images)
print(classifications[0])
print(test_labels[0])

I get the following output:

2021-05-04 16:55:00.006592: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-04 16:55:00.006877: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
Epoch 1/5
1875/1875 [==============================] - 1s 371us/step - loss: 0.6283 - accuracy: 0.7843
Epoch 2/5
1875/1875 [==============================] - 1s 362us/step - loss: 0.3812 - accuracy: 0.8641
Epoch 3/5
1875/1875 [==============================] - 1s 360us/step - loss: 0.3384 - accuracy: 0.8760
Epoch 4/5
1875/1875 [==============================] - 1s 358us/step - loss: 0.3089 - accuracy: 0.8882
Epoch 5/5
1875/1875 [==============================] - 1s 351us/step - loss: 0.2940 - accuracy: 0.8925
313/313 [==============================] - 0s 250us/step - loss: 0.3407 - accuracy: 0.8773
2021-05-04 16:55:03.721625: I tensorflow/compiler/tf2mlcompute/kernels/mlc_subgraph_op.cc:326] Compute: Failed in processing TensorFlow graph sequential/MLCSubgraphOp_2_0 with frame_id = 0 and iter_id = 0 with error: Internal: ExecuteMLCInferenceGraph: Failed to execute MLC inference graph. (error will be reported 5 times unless TF_MLC_LOGGING=1).
2021-05-04 16:55:03.722229: F tensorflow/core/framework/op_kernel.cc:983] Check failed: outputs_[index].tensor == nullptr (0x155827f80 vs. nullptr)

whereas on a 2017 Intel MBP I get:

2021-05-04 16:54:06.839207: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-04 16:54:06.839455: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-04 16:54:07.030005: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.6239 - accuracy: 0.7835
Epoch 2/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.3826 - accuracy: 0.8624
Epoch 3/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.3382 - accuracy: 0.8761
Epoch 4/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.3124 - accuracy: 0.8851
Epoch 5/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2951 - accuracy: 0.8907
[7.2896250e-06 1.5256417e-09 3.3264627e-07 3.5464927e-09 1.3898362e-07
 1.7050464e-02 1.0498255e-06 8.5028261e-03 7.9395040e-06 9.7443002e-01]
9

Also, if I remove either the model.predict or the model.evaluate call, the code produces correct output and no errors.

I'm using regular Python virtual environments on the MBP but Miniforge on the MacBook Air.

The TensorFlow package differences between the two environments are as follows:

44,46c67,68
< tensorboard==2.5.0
< tensorboard-data-server==0.6.0
---
> tensorboard==2.4.1
48,50c70
< tensorflow==2.4.1
< tensorflow-addons==0.12.1
< tensorflow-datasets==4.2.0
---
> tensorflow-addons-macos==0.1a3
52c72
< tensorflow-metadata==0.30.0
---
> tensorflow-macos==0.1a3
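The listing above is in `diff`'s normal format: a header such as `44,46c67,68` means lines 44 to 46 of the first file were changed into lines 67 to 68 of the second, with `<` marking lines from the first file and `>` lines from the second. Comparing the two environments by package name instead of by line can be easier to read. A minimal sketch, assuming each environment was dumped with `pip freeze` (the post does not say how the lists were produced); `normalize`, `parse_freeze` and `compare_envs` are hypothetical helpers, not part of pip:

```python
import re
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Normalize a distribution name per PEP 503 (case-insensitive; runs of -, _, . are equal)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_freeze(text: str) -> Dict[str, str]:
    """Map normalized package name -> version for every 'name==version' line."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and entries that are not exact pins (e.g. 'pkg @ file:///...').
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[normalize(name.strip())] = version.strip()
    return pins


def compare_envs(
    a: str, b: str
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Return (only_in_a, only_in_b, version_changed) for two pip-freeze listings."""
    left, right = parse_freeze(a), parse_freeze(b)
    only_a = sorted((n, left[n]) for n in left.keys() - right.keys())
    only_b = sorted((n, right[n]) for n in right.keys() - left.keys())
    changed = sorted(
        (n, left[n], right[n]) for n in left.keys() & right.keys() if left[n] != right[n]
    )
    return only_a, only_b, changed


if __name__ == "__main__":
    # A few pins taken from the diff above: '<' side as env_a, '>' side as env_b.
    env_a = "tensorboard==2.5.0\ntensorflow==2.4.1\ntensorflow-addons==0.12.1\n"
    env_b = "tensorboard==2.4.1\ntensorflow-macos==0.1a3\ntensorflow-addons-macos==0.1a3\n"
    only_a, only_b, changed = compare_envs(env_a, env_b)
    print("only in env_a:", only_a)
    print("only in env_b:", only_b)
    print("version differs:", changed)
```

Keying on the PEP 503 normalized name means `tensorflow_macos` and `tensorflow-macos` are treated as the same package, which a plain line diff would not do.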
lsw9803 commented 3 years ago

Hello, I have run into the same issue. Have you found a solution?

parkesb commented 3 years ago

No, but I haven't looked into it. It also fails if you switch the order (i.e., predict and then evaluate).

ongtw commented 3 years ago

Same problem on my M1 Mac:

(m1) $ python tf_m1_eval_predict_test.py 
loading data...
creating model...
model.fit()...
2021-05-11 16:23:49.023920: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-11 16:23:49.025618: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
Epoch 1/5
1875/1875 [==============================] - 1s 331us/step - loss: 0.6303 - accuracy: 0.7810
Epoch 2/5
1875/1875 [==============================] - 1s 326us/step - loss: 0.3850 - accuracy: 0.8613
Epoch 3/5
1875/1875 [==============================] - 1s 324us/step - loss: 0.3423 - accuracy: 0.8760
Epoch 4/5
1875/1875 [==============================] - 1s 323us/step - loss: 0.3148 - accuracy: 0.8864
Epoch 5/5
1875/1875 [==============================] - 1s 321us/step - loss: 0.2973 - accuracy: 0.8904
model.evaluate()...
313/313 [==============================] - 0s 234us/step - loss: 0.3342 - accuracy: 0.8781
model.predict()...
2021-05-11 16:23:52.478149: I tensorflow/compiler/tf2mlcompute/kernels/mlc_subgraph_op.cc:326] Compute: Failed in processing TensorFlow graph sequential/MLCSubgraphOp_2_0 with frame_id = 0 and iter_id = 0 with error: Internal: ExecuteMLCInferenceGraph: Failed to execute MLC inference graph. (error will be reported 5 times unless TF_MLC_LOGGING=1).
2021-05-11 16:23:52.480338: F tensorflow/core/framework/op_kernel.cc:983] Check failed: outputs_[index].tensor == nullptr (0x14b617f70 vs. nullptr)
Abort trap: 6

If I run only one of model.evaluate() or model.predict(), it works fine:

1875/1875 [==============================] - 1s 325us/step - loss: 0.2924 - accuracy: 0.8936
model.predict()...
classifications: [2.3991666e-05 1.9442258e-07 2.4191124e-06 3.0609449e-06 2.1939153e-05
 2.4558472e-02 5.9402762e-05 7.7641018e-02 2.1327533e-04 8.9747626e-01]
1875/1875 [==============================] - 1s 330us/step - loss: 0.2968 - accuracy: 0.8891
model.evaluate()...
313/313 [==============================] - 0s 235us/step - loss: 0.3537 - accuracy: 0.8694
test_labels: 9

It looks like there is a bug when both functions are called in sequence.
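Since either call works on its own, one possible stopgap (not verified against this bug beyond the reports above) is to call only `model.predict` and compute the test metrics from its output yourself. The minimal stdlib sketch below assumes the model ends in a softmax layer, as in the original script; `accuracy_and_loss` is a hypothetical helper, not a Keras API, and the clip value `eps` is chosen for illustration. The accuracy should generally match what `model.evaluate` reports for the same predictions, while the loss may differ slightly because of float32 rounding and clipping.

```python
import math
from typing import Sequence, Tuple


def accuracy_and_loss(
    probs: Sequence[Sequence[float]], labels: Sequence[int], eps: float = 1e-7
) -> Tuple[float, float]:
    """Return (accuracy, mean sparse categorical cross-entropy) from softmax outputs.

    probs:  one row of class probabilities per example, e.g. the rows of model.predict(...)
    labels: the integer class index of each example
    eps:    clip bound so log() never receives 0 (illustrative value)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("need at least one example")
    correct = 0
    total_loss = 0.0
    for row, label in zip(probs, labels):
        # Predicted class = index of the largest probability (argmax).
        predicted = max(range(len(row)), key=lambda i: row[i])
        correct += int(predicted == label)
        # Per-example cross-entropy is -log(probability assigned to the true class).
        p = min(max(float(row[label]), eps), 1.0 - eps)
        total_loss -= math.log(p)
    n = len(labels)
    return correct / n, total_loss / n


if __name__ == "__main__":
    # Toy data: first example is classified correctly, second is not.
    acc, loss = accuracy_and_loss([[0.1, 0.9], [0.8, 0.2]], [1, 1])
    print(f"accuracy={acc:.4f} loss={loss:.4f}")
```

With the script from the first post, the call would be `accuracy_and_loss(model.predict(test_images), test_labels)`, replacing the separate `model.evaluate` call.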

alessio-ca commented 3 years ago

I'm experiencing the same problem (MacBook Pro 13-inch, 2020, Quad-Core Intel Core i5). It occurs on both the CPU and the GPU, the latter selected with mlcompute.set_mlc_device(device_name="gpu").

devnev39 commented 3 years ago

Check this issue and its solution; it might help: https://github.com/apple/tensorflow_macos/issues/266#issue-895279506

edavidk7 commented 3 years ago

It seems the culprit here is the activation function specified on the output layer. Once this parameter is removed, the code works fine. Edit: a linear output layer works fine; sigmoid doesn't.
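Note that removing the output activation makes the last `Dense` layer linear, so the model outputs raw logits rather than probabilities, while `loss='sparse_categorical_crossentropy'` expects probabilities by default. The usual companion change is `loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`, plus applying `tf.nn.softmax` to the `model.predict` output when probabilities are needed; whether that combination avoids the crash is only as certain as the report above. The stdlib sketch below (all function names hypothetical) shows why the two setups compute the same loss: cross-entropy taken directly from logits via log-sum-exp equals cross-entropy of the softmax probabilities.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_from_probs(probs: Sequence[float], label: int) -> float:
    """Sparse categorical cross-entropy when the model already outputs probabilities."""
    return -math.log(probs[label])


def ce_from_logits(logits: Sequence[float], label: int) -> float:
    """Same loss computed from raw logits via log-sum-exp.

    Conceptually what from_logits=True asks the loss to do: -log softmax(z)[label]
    = logsumexp(z) - z[label], without ever forming the probabilities.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5]  # hypothetical outputs of a linear final Dense layer
    probs = softmax(logits)
    for label in range(len(logits)):
        print(label, ce_from_probs(probs, label), ce_from_logits(logits, label))
```

Working from logits also stays finite for large scores (for example `ce_from_logits([1000.0, 0.0], 1)` is 1000), where exponentiating the raw logits would overflow.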