[Open] Willian-Zhang opened this issue 3 years ago
These are the results on my MacBook Air 2020 M1 8G:
2020-11-20 23:47:18.141957: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-20 23:47:18.145970: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
2020-11-20 23:47:18.479186: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/10
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1614 - accuracy: 0.9519/Users/willian/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1613 - accuracy: 0.9519 - val_loss: 0.0449 - val_accuracy: 0.9853
Epoch 2/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0427 - accuracy: 0.9867 - val_loss: 0.0336 - val_accuracy: 0.9885
Epoch 3/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0264 - accuracy: 0.9914 - val_loss: 0.0333 - val_accuracy: 0.9885
Epoch 4/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0167 - accuracy: 0.9946 - val_loss: 0.0393 - val_accuracy: 0.9879
Epoch 5/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0128 - accuracy: 0.9956 - val_loss: 0.0333 - val_accuracy: 0.9890
Epoch 6/10
469/469 [==============================] - 24s 49ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0087 - accuracy: 0.9973 - val_loss: 0.0341 - val_accuracy: 0.9900
Epoch 7/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0079 - accuracy: 0.9975 - val_loss: 0.0379 - val_accuracy: 0.9887
Epoch 8/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0063 - accuracy: 0.9979 - val_loss: 0.0366 - val_accuracy: 0.9906
Epoch 9/10
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9982 - val_loss: 0.0512 - val_accuracy: 0.9859
Epoch 10/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9982 - val_loss: 0.0462 - val_accuracy: 0.9884
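As a quick sanity check on these numbers (my own arithmetic, not part of the log): the per-epoch wall time should be roughly steps × step time, with validation and per-epoch overhead on top.

```python
# Cross-check of the log above: epoch time ~= steps * step time.
# Figures taken from the M1 log (469 steps at ~48 ms/step, ~24 s/epoch reported).
steps = 469
step_time_s = 0.048  # 48 ms/step

estimated_epoch_s = steps * step_time_s
print(f"{estimated_epoch_s:.1f}s")  # 22.5s; the reported 24s adds validation and overhead
```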
Key results are:
On my Mac mini 2020 M1 16G:
Running this on my MacBook Pro (16-inch, 2019), 2.3 GHz 8-core Intel Core i9, AMD Radeon Pro 5500M 8 GB:
2020-11-20 17:42:23.136427: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-20 17:42:23.318515: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-20 17:42:24.014368: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1588 - accuracy: 0.9514/Users/jochen/projects/ds_tutorial/mac_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 57s 114ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1588 - accuracy: 0.9514 - val_loss: 0.0479 - val_accuracy: 0.9841
Epoch 2/12
469/469 [==============================] - 56s 116ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0442 - accuracy: 0.9863 - val_loss: 0.0348 - val_accuracy: 0.9880
Epoch 3/12
469/469 [==============================] - 56s 115ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0277 - accuracy: 0.9913 - val_loss: 0.0393 - val_accuracy: 0.9863
Epoch 4/12
469/469 [==============================] - 56s 115ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0189 - accuracy: 0.9940 - val_loss: 0.0387 - val_accuracy: 0.9876
Epoch 5/12
469/469 [==============================] - 56s 114ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0142 - accuracy: 0.9953 - val_loss: 0.0354 - val_accuracy: 0.9895
Epoch 6/12
469/469 [==============================] - 57s 117ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0092 - accuracy: 0.9970 - val_loss: 0.0407 - val_accuracy: 0.9881
...
real 11m31.063s
user 16m18.586s
sys 4m3.070s
My results with a MacBook Pro M1, 16 GB of RAM:
2020-11-20 21:18:55.599180: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-20 21:18:55.599898: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
2020-11-20 21:18:55.889178: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1508 - accuracy: 0.9560/Users/sergio/repos/tf-test/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1506 - accuracy: 0.9561 - val_loss: 0.0479 - val_accuracy: 0.9851
Epoch 2/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0421 - accuracy: 0.9868 - val_loss: 0.0383 - val_accuracy: 0.9870
Epoch 3/12
469/469 [==============================] - 23s 45ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0262 - accuracy: 0.9916 - val_loss: 0.0407 - val_accuracy: 0.9874
Epoch 4/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0177 - accuracy: 0.9944 - val_loss: 0.0353 - val_accuracy: 0.9868
Epoch 5/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0125 - accuracy: 0.9960 - val_loss: 0.0395 - val_accuracy: 0.9885
Epoch 6/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0094 - accuracy: 0.9971 - val_loss: 0.0393 - val_accuracy: 0.9898
Epoch 7/12
469/469 [==============================] - 23s 45ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0095 - accuracy: 0.9968 - val_loss: 0.0421 - val_accuracy: 0.9887
Epoch 8/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0066 - accuracy: 0.9978 - val_loss: 0.0437 - val_accuracy: 0.9892
Epoch 9/12
469/469 [==============================] - 25s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0056 - accuracy: 0.9982 - val_loss: 0.0437 - val_accuracy: 0.9897
Epoch 10/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9984 - val_loss: 0.0510 - val_accuracy: 0.9879
Epoch 11/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0041 - accuracy: 0.9986 - val_loss: 0.0401 - val_accuracy: 0.9912
Epoch 12/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0047 - accuracy: 0.9983 - val_loss: 0.0472 - val_accuracy: 0.9901
One thing to note is that there must be a bottleneck somewhere. I was monitoring the GPU usage in Activity Monitor and it never went above 60%.
@Willian-Zhang Thank you for providing a reproducible test case. We will take a look.
MacBook Pro, 13-inch, 2017, i5, 8 GB, Intel Iris 640
Apple-compiled TensorFlow:
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1539 - accuracy: 0.9537/Users/corgi/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 108s 206ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1539 - accuracy: 0.9537 - val_loss: 0.0472 - val_accuracy: 0.9849
Epoch 2/12
469/469 [==============================] - 101s 206ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0406 - accuracy: 0.9875 - val_loss: 0.0408 - val_accuracy: 0.9863
Epoch 3/12
469/469 [==============================] - 98s 201ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0261 - accuracy: 0.9922 - val_loss: 0.0427 - val_accuracy: 0.9873
Epoch 4/12
469/469 [==============================] - 100s 204ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0169 - accuracy: 0.9945 - val_loss: 0.0293 - val_accuracy: 0.9905
Epoch 5/12
469/469 [==============================] - 98s 202ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0120 - accuracy: 0.9963 - val_loss: 0.0332 - val_accuracy: 0.9902
Epoch 6/12
469/469 [==============================] - 98s 201ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0097 - accuracy: 0.9970 - val_loss: 0.0361 - val_accuracy: 0.9898
Epoch 7/12
469/469 [==============================] - 99s 203ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0088 - accuracy: 0.9971 - val_loss: 0.0409 - val_accuracy: 0.9880
Epoch 8/12
469/469 [==============================] - 99s 202ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9983 - val_loss: 0.0387 - val_accuracy: 0.9886
Epoch 9/12
469/469 [==============================] - 97s 200ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0056 - accuracy: 0.9981 - val_loss: 0.0411 - val_accuracy: 0.9888
Epoch 10/12
469/469 [==============================] - 99s 203ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9985 - val_loss: 0.0493 - val_accuracy: 0.9885
Epoch 11/12
469/469 [==============================] - 101s 206ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0047 - accuracy: 0.9983 - val_loss: 0.0446 - val_accuracy: 0.9892
Epoch 12/12
469/469 [==============================] - 100s 205ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0041 - accuracy: 0.9985 - val_loss: 0.0440 - val_accuracy: 0.9891
pip version (TF 2.3.1):
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1640 - accuracy: 0.9506WARNING:tensorflow:From /Users/corgi/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From /Users/corgi/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
469/469 [==============================] - 67s 143ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1640 - accuracy: 0.9506 - val_loss: 0.0571 - val_accuracy: 0.9810
Epoch 2/12
469/469 [==============================] - 63s 134ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0431 - accuracy: 0.9868 - val_loss: 0.0397 - val_accuracy: 0.9864
Epoch 3/12
469/469 [==============================] - 57s 122ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0266 - accuracy: 0.9916 - val_loss: 0.0361 - val_accuracy: 0.9890
Epoch 4/12
469/469 [==============================] - 57s 122ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0186 - accuracy: 0.9940 - val_loss: 0.0351 - val_accuracy: 0.9895
Epoch 5/12
469/469 [==============================] - 56s 120ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0130 - accuracy: 0.9959 - val_loss: 0.0396 - val_accuracy: 0.9886
Epoch 6/12
469/469 [==============================] - 57s 121ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0097 - accuracy: 0.9967 - val_loss: 0.0392 - val_accuracy: 0.9880
Epoch 7/12
469/469 [==============================] - 59s 125ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0083 - accuracy: 0.9970 - val_loss: 0.0376 - val_accuracy: 0.9895
Epoch 8/12
469/469 [==============================] - 59s 126ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0071 - accuracy: 0.9978 - val_loss: 0.0423 - val_accuracy: 0.9880
Epoch 9/12
469/469 [==============================] - 56s 119ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9982 - val_loss: 0.0357 - val_accuracy: 0.9895
Epoch 10/12
469/469 [==============================] - 57s 121ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9981 - val_loss: 0.0378 - val_accuracy: 0.9902
Epoch 11/12
469/469 [==============================] - 56s 119ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0029 - accuracy: 0.9990 - val_loss: 0.0383 - val_accuracy: 0.9910
Epoch 12/12
469/469 [==============================] - 58s 124ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9985 - val_loss: 0.0435 - val_accuracy: 0.9903
TF compiled with the FMA, AVX, AVX2, SSE4.1, SSE4.2 flags; wheel from https://github.com/lakshayg/tensorflow-build:
Epoch 1/12
469/469 [==============================] - ETA: 0s - loss: 0.1570 - accuracy: 0.95272020-11-21 01:23:29.984485: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
469/469 [==============================] - 64s 135ms/step - loss: 0.1570 - accuracy: 0.9527 - val_loss: 0.0511 - val_accuracy: 0.9836
Epoch 2/12
469/469 [==============================] - ETA: 0s - loss: 0.0425 - accuracy: 0.98662020-11-21 01:24:41.347821: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
469/469 [==============================] - 67s 142ms/step - loss: 0.0425 - accuracy: 0.9866 - val_loss: 0.0405 - val_accuracy: 0.9867
Epoch 3/12
469/469 [==============================] - ETA: 0s - loss: 0.0274 - accuracy: 0.99152020-11-21 01:25:55.016136: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
469/469 [==============================] - 68s 145ms/step - loss: 0.0274 - accuracy: 0.9915 - val_loss: 0.0339 - val_accuracy: 0.9886
...
Epoch 11/12
469/469 [==============================] - ETA: 0s - loss: 0.0034 - accuracy: 0.99892020-11-21 01:34:52.652276: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
469/469 [==============================] - 64s 137ms/step - loss: 0.0034 - accuracy: 0.9989 - val_loss: 0.0429 - val_accuracy: 0.9910
Epoch 12/12
469/469 [==============================] - 61s 129ms/step - loss: 0.0034 - accuracy: 0.9988 - val_loss: 0.0515 - val_accuracy: 0.9893
It's interesting to see that Apple's optimized version of TensorFlow is slower than the pip version. Looking at the warning
I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2 AVX AVX2 FMA
I think the performance loss comes either from Intel's oneAPI or from the x86 instruction-set support. I tried the compiled binaries built with FMA, AVX, AVX2, SSE4.1, and SSE4.2 to check whether instruction-set support is the cause, but that run throws a warning (due to exhausted data; why only this run?; batch size -> 118?). Anyhow, it would be nice if Apple provided more documentation about their own version of TF, and please let me know if I am the only one who has found tensorflow-macos slower than pip tensorflow (-> request for documentation / request for instruction-set support?).
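For reference, the cache warning in that run is about tf.data pipeline ordering: `dataset.cache().take(k)` caches the full dataset but only ever reads part of it, so the partial cache is discarded and the source is re-read every epoch, whereas `dataset.take(k).cache()` caches exactly what is read. The principle can be sketched without TensorFlow, using a hypothetical counting source (CountingSource and take are stand-ins, not TF APIs):

```python
# Sketch of the cache-ordering issue from the tf.data warning above,
# using plain-Python stand-ins (not TensorFlow APIs).

class CountingSource:
    """Pretends to be an expensive data source; counts raw reads."""
    def __init__(self, n):
        self.n = n
        self.reads = 0
    def __iter__(self):
        for i in range(self.n):
            self.reads += 1
            yield i

def take(source, k):
    """Pull exactly k items from the source."""
    it = iter(source)
    return [next(it) for _ in range(k)]

# take(k) first, THEN cache: the source is read only once.
src = CountingSource(1000)
cached = take(src, 100)
for _ in range(3):          # "repeat": later epochs hit the cache
    epoch = list(cached)
print(src.reads)            # 100 raw reads total, regardless of epoch count

# cache() first with a partial read: the cache never completes and is
# discarded, so every epoch re-reads the source (what the warning flags).
src2 = CountingSource(1000)
for _ in range(3):
    epoch = take(src2, 100)
print(src2.reads)           # 300: re-read every epoch
```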
Running Apple's Mac-optimized TensorFlow on a 2019 16" MacBook Pro with AMD Radeon Pro 5500M:
And here is the GPU performance after the first epochs have started.
I suspect the slack in the GPU is due to the comparatively low batch size relative to the GPU memory capacity. When I change to batch_size = 500, the results are as follows:
With the following GPU usage:
Note that each epoch now takes 27s, less than half the time with batch_size = 128. I think this illustrates that each combination of backend + GPU + specific data at hand has a batch size that optimizes speed; it's up to the analyst to find it (maybe by running one-epoch-only iterations to check speed at different settings).
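The one-epoch timing sweep suggested above can be sketched as a small harness. Here `run_one_epoch` is a hypothetical stand-in for whatever fits the model for a single epoch at a given batch size (e.g. a wrapper around `model.fit(..., epochs=1, batch_size=bs)`):

```python
import time

def sweep_batch_sizes(run_one_epoch, batch_sizes):
    """Time one training epoch per batch size; return {batch_size: seconds}.

    run_one_epoch(batch_size) is a user-supplied callable, e.g. a wrapper
    around model.fit(..., epochs=1, batch_size=batch_size).
    """
    timings = {}
    for bs in batch_sizes:
        start = time.perf_counter()
        run_one_epoch(bs)
        timings[bs] = time.perf_counter() - start
    return timings

def fastest(timings):
    """Batch size with the smallest epoch time."""
    return min(timings, key=timings.get)

# Toy usage with a do-nothing epoch, just to show the call shape:
timings = sweep_batch_sizes(lambda bs: None, [128, 512, 2048])

# With real measurements the pick is deterministic:
print(fastest({128: 23.0, 512: 13.0, 2048: 13.5}))  # 512
```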
To echo @dkgaraujo, I can run this at around 24s per epoch on a MacBook Pro 16" 2019 with Radeon Pro 5300M if I increase the batch size (e.g., batch_size = 1250). This is about 10s quicker per epoch than CPU, and comparable to the M1 benchmarks posted above.
With low batch sizes (e.g. 128), GPU performance is comparable to or slower than CPU.
@anhornsby with batch_size = 1250 (Train on 48 steps, validate on 8 steps) on MacBook Air 2020 M1 8G, I get:
on Mac mini 2020 m1 16G:
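As an aside, the step counts quoted in these logs follow directly from MNIST's 60,000 training / 10,000 test images divided by the batch size, rounded up (my arithmetic, matching how Keras counts steps):

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # Keras reports ceil(samples / batch_size) steps per epoch.
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(60_000, 128))   # 469 train steps, as in the logs above
print(steps_per_epoch(10_000, 128))   # 79 validation steps
print(steps_per_epoch(60_000, 1250))  # 48 train steps
print(steps_per_epoch(10_000, 1250))  # 8 validation steps
```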
Results on my Mac Mini 2020 M1 16G: GPU = 22s per epoch, CPU = 17s per epoch, Any = 28s per epoch (weird!).
Best results came from commenting out the code that disables eager execution and the code that selects the GPU. Just don't set these and I get the best results.
python3 cnn.py
Epoch 1/12
2020-11-21 17:27:02.971440: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-21 17:27:02.972299: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
469/469 [==============================] - 17s 34ms/step - loss: 0.3564 - accuracy: 0.8921 - val_loss: 0.0479 - val_accuracy: 0.9834
Epoch 2/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0488 - accuracy: 0.9857 - val_loss: 0.0395 - val_accuracy: 0.9868
Epoch 3/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0270 - accuracy: 0.9917 - val_loss: 0.0383 - val_accuracy: 0.9875
Epoch 4/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0182 - accuracy: 0.9946 - val_loss: 0.0347 - val_accuracy: 0.9889
Epoch 5/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0120 - accuracy: 0.9959 - val_loss: 0.0390 - val_accuracy: 0.9890
Epoch 6/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0097 - accuracy: 0.9972 - val_loss: 0.0359 - val_accuracy: 0.9891
Epoch 7/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0072 - accuracy: 0.9976 - val_loss: 0.0387 - val_accuracy: 0.9886
Epoch 8/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0047 - accuracy: 0.9986 - val_loss: 0.0341 - val_accuracy: 0.9911
Epoch 9/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0043 - accuracy: 0.9985 - val_loss: 0.0450 - val_accuracy: 0.9890
Epoch 10/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0076 - accuracy: 0.9974 - val_loss: 0.0460 - val_accuracy: 0.9882
Epoch 11/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0030 - accuracy: 0.9991 - val_loss: 0.0446 - val_accuracy: 0.9891
Epoch 12/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0049 - accuracy: 0.9983 - val_loss: 0.0518 - val_accuracy: 0.9881
Highlights are:
15s/epoch, 33ms/step (original batch size), 98.8% final accuracy
Some more results:
I ran the same code as before, but with batch_size = 2000. With the GPU I got 20s/epoch, compared to 85s/epoch on the CPU.
Commenting out the line that disables eager execution seems helpful: 20s per epoch with batch_size = 1500.
Interestingly, when I removed the line that disables eager execution, my system just ended up hanging. Did you change anything else other than commenting that out, @anhornsby?
@danielmbradley nope, same code as above, using the recommended virtualenv
MacBook Pro M1, 16 GB of RAM. Standard TF installation with venv, executed from terminal, no other significant processes running.
batch size 128: 23s/epoch, 45ms/step, 98.98% final accuracy, GPU ~55%
batch size 256: 15s/epoch, 59ms/step, 99.11% final accuracy, GPU ~65%
batch size 512: 13s/epoch, 98ms/step, 99.01% final accuracy, GPU ~75%
batch size 1024: 12s/epoch, 180ms/step, 98.99% final accuracy, GPU ~80%
batch size 1280: 12s/epoch, 227ms/step, 98.86% final accuracy, GPU ~83%
batch size 2048: 13s/epoch, 375ms/step, 98.76% final accuracy, GPU ~88%
batch size 4096: 15s/epoch, 890ms/step, 98.57% final accuracy, GPU up to 90%
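Converting those epoch times into throughput (60,000 MNIST training images per epoch; my arithmetic, ignoring validation) makes the sweet spot around batch sizes 1024-1280 explicit:

```python
# Samples/second per batch size, computed from the epoch times reported above.
epoch_times_s = {128: 23, 256: 15, 512: 13, 1024: 12, 1280: 12, 2048: 13, 4096: 15}
train_samples = 60_000

throughput = {bs: train_samples / t for bs, t in epoch_times_s.items()}
best = max(throughput, key=throughput.get)
print(best, round(throughput[best]))  # 1024 5000 (ties with 1280; max keeps the first)
```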
@anhornsby Interesting. There must be some difference in how eager execution is implemented between Intel Macs and M1 Macs; mine just completely falls over when that line is missing. I did find that increasing the batch size significantly increased processing speed though (oddly, the time printed in the terminal was wrong once it hit 22 seconds).
Just for fun, I wanted to try running this on a Windows 10 laptop with a mobile GTX 1060 (6 GB), i7-7700HQ, and 16 GB RAM:
batch_size = 128
Epoch 1/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1642 - accuracy: 0.9517 - val_loss: 0.0566 - val_accuracy: 0.9817
Epoch 2/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0436 - accuracy: 0.9865 - val_loss: 0.0368 - val_accuracy: 0.9879
Epoch 3/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0281 - accuracy: 0.9908 - val_loss: 0.0357 - val_accuracy: 0.9880
Epoch 4/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0179 - accuracy: 0.9941 - val_loss: 0.0335 - val_accuracy: 0.9893
Epoch 5/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0133 - accuracy: 0.9956 - val_loss: 0.0405 - val_accuracy: 0.9878
Epoch 6/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0095 - accuracy: 0.9968 - val_loss: 0.0305 - val_accuracy: 0.9912
Epoch 7/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0077 - accuracy: 0.9973 - val_loss: 0.0373 - val_accuracy: 0.9896
Epoch 8/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0079 - accuracy: 0.9972 - val_loss: 0.0443 - val_accuracy: 0.9877
Epoch 9/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0056 - accuracy: 0.9982 - val_loss: 0.0397 - val_accuracy: 0.9894
Epoch 10/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0035 - accuracy: 0.9989 - val_loss: 0.0487 - val_accuracy: 0.9885
Epoch 11/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0044 - accuracy: 0.9985 - val_loss: 0.0502 - val_accuracy: 0.9866
Epoch 12/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0043 - accuracy: 0.9984 - val_loss: 0.0426 - val_accuracy: 0.9896
Highlights are: 5s/epoch 11ms/step (original batch size) 98.96% final accuracy
batch_size = 1250
Epoch 1/12
48/48 [==============================] - 4s 78ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.5046 - accuracy: 0.8650 - val_loss: 0.1678 - val_accuracy: 0.9517
Epoch 2/12
48/48 [==============================] - 3s 71ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.1180 - accuracy: 0.9659 - val_loss: 0.0735 - val_accuracy: 0.9778
Epoch 3/12
48/48 [==============================] - 4s 76ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0654 - accuracy: 0.9811 - val_loss: 0.0520 - val_accuracy: 0.9828
Epoch 4/12
48/48 [==============================] - 3s 72ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0463 - accuracy: 0.9866 - val_loss: 0.0465 - val_accuracy: 0.9847
Epoch 5/12
48/48 [==============================] - 3s 72ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0388 - accuracy: 0.9882 - val_loss: 0.0448 - val_accuracy: 0.9852
Epoch 6/12
48/48 [==============================] - 4s 76ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0324 - accuracy: 0.9905 - val_loss: 0.0399 - val_accuracy: 0.9868
Epoch 7/12
48/48 [==============================] - 3s 71ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0257 - accuracy: 0.9924 - val_loss: 0.0373 - val_accuracy: 0.9885
Epoch 8/12
48/48 [==============================] - 4s 78ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0209 - accuracy: 0.9942 - val_loss: 0.0387 - val_accuracy: 0.9882
Epoch 9/12
48/48 [==============================] - 3s 72ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0174 - accuracy: 0.9950 - val_loss: 0.0368 - val_accuracy: 0.9883
Epoch 10/12
48/48 [==============================] - 4s 77ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0152 - accuracy: 0.9955 - val_loss: 0.0379 - val_accuracy: 0.9887
Epoch 11/12
48/48 [==============================] - 3s 72ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0124 - accuracy: 0.9964 - val_loss: 0.0397 - val_accuracy: 0.9880
Epoch 12/12
48/48 [==============================] - 3s 71ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0096 - accuracy: 0.9974 - val_loss: 0.0394 - val_accuracy: 0.9885
Highlights: 3-4 s/epoch, 71-78 ms/step, 98.85% final accuracy.
With batch_size = 4096:
Highlights: 3 s/epoch, 200 ms/step, 98.58% final accuracy.
MacBook Air 2020 M1 with 16 GB RAM - same results as others with an M1 MacBook
2020-11-24 21:24:52.855304: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-24 21:24:52.856412: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
2020-11-24 21:24:53.156975: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1565 - accuracy: 0.9534/Users/spacemonkey/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 26s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1563 - accuracy: 0.9535 - val_loss: 0.0468 - val_accuracy: 0.9847
Epoch 2/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0437 - accuracy: 0.9865 - val_loss: 0.0381 - val_accuracy: 0.9871
Epoch 3/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0277 - accuracy: 0.9912 - val_loss: 0.0390 - val_accuracy: 0.9879
Epoch 4/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0174 - accuracy: 0.9947 - val_loss: 0.0370 - val_accuracy: 0.9865
Epoch 5/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0123 - accuracy: 0.9961 - val_loss: 0.0399 - val_accuracy: 0.9873
Epoch 6/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0099 - accuracy: 0.9966 - val_loss: 0.0379 - val_accuracy: 0.9889
Epoch 7/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0086 - accuracy: 0.9971 - val_loss: 0.0417 - val_accuracy: 0.9878
Epoch 8/12
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0063 - accuracy: 0.9980 - val_loss: 0.0412 - val_accuracy: 0.9892
Epoch 9/12
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9984 - val_loss: 0.0411 - val_accuracy: 0.9904
Epoch 10/12
469/469 [==============================] - 25s 50ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9983 - val_loss: 0.0559 - val_accuracy: 0.9868
Epoch 11/12
469/469 [==============================] - 24s 49ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0037 - accuracy: 0.9988 - val_loss: 0.0417 - val_accuracy: 0.9897
Epoch 12/12
469/469 [==============================] - 25s 49ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0056 - accuracy: 0.9981 - val_loss: 0.0448 - val_accuracy: 0.9893
Windows, GeForce GTX 1080 Ti, Intel 5820K, using tensorflow-gpu version 2.3.1
I had to comment out these lines:
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')
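Instead of commenting those lines out by hand on every non-Mac machine, the import can be guarded so one script runs everywhere. A minimal sketch, assuming the `mlcompute` module path from the tensorflow_macos fork (stock TensorFlow builds simply don't have it, so the import fails and we fall through):

```python
def enable_mlcompute(device_name="gpu"):
    """Try to route training through Apple's ML Compute.

    Returns True when the tensorflow_macos fork's mlcompute module was
    found and configured, False on stock builds (Linux/Windows/CUDA),
    where TensorFlow's normal device placement applies instead.
    """
    try:
        # This module only exists in Apple's tensorflow_macos fork.
        from tensorflow.python.compiler.mlcompute import mlcompute
        mlcompute.set_mlc_device(device_name=device_name)
        return True
    except ImportError:
        return False
```

With a guard like this, the same benchmark script should run unchanged on both an M1 and a CUDA machine.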
Results: batch size = 128, 2 s/epoch, 5 ms/step, val_accuracy: 0.9870
Log:
2020-11-25 00:22:53.068167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 00:22:53.076410: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 00:22:53.093385: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 00:22:53.110747: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 00:22:53.119427: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 00:22:53.139974: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 00:22:53.149363: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 00:22:53.185160: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 00:22:53.188810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 00:22:53.192451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 00:22:53.202681: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 00:22:53.208791: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 00:22:53.212913: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 00:22:53.218933: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 00:22:53.223650: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 00:22:53.229800: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 00:22:53.233966: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 00:22:53.239978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 00:22:53.907505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 00:22:53.911316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-11-25 00:22:53.914206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-11-25 00:22:53.919438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8678 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2020-11-25 00:22:53.930196: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x14efa539c30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-25 00:22:53.935222: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-11-25 00:22:54.154381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 00:22:54.162717: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 00:22:54.169072: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 00:22:54.173108: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 00:22:54.179071: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 00:22:54.183115: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 00:22:54.189077: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 00:22:54.193209: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 00:22:54.199286: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 00:22:54.202657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 00:22:54.208740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-11-25 00:22:54.211523: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-11-25 00:22:54.214418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8678 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
Train on 469 steps, validate on 79 steps
Epoch 1/12
2020-11-25 00:22:55.528873: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 00:22:57.078128: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 00:22:57.840244: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1612 - accuracy: 0.9516WARNING:tensorflow:From D:\Code\CondaEnvs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From D:\Code\CondaEnvs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1612 - accuracy: 0.9516 - val_loss: 0.0503 - val_accuracy: 0.9850
Epoch 2/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0447 - accuracy: 0.9866 - val_loss: 0.0382 - val_accuracy: 0.9880
Epoch 3/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0295 - accuracy: 0.9905 - val_loss: 0.0416 - val_accuracy: 0.9851
Epoch 4/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0205 - accuracy: 0.9932 - val_loss: 0.0342 - val_accuracy: 0.9889
Epoch 5/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0138 - accuracy: 0.9955 - val_loss: 0.0373 - val_accuracy: 0.9885
Epoch 6/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0103 - accuracy: 0.9967 - val_loss: 0.0395 - val_accuracy: 0.9881
Epoch 7/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0087 - accuracy: 0.9970 - val_loss: 0.0372 - val_accuracy: 0.9887
Epoch 8/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0072 - accuracy: 0.9977 - val_loss: 0.0389 - val_accuracy: 0.9897
Epoch 9/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0066 - accuracy: 0.9980 - val_loss: 0.0419 - val_accuracy: 0.9895
Epoch 10/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9984 - val_loss: 0.0439 - val_accuracy: 0.9891
Epoch 11/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0030 - accuracy: 0.9989 - val_loss: 0.0503 - val_accuracy: 0.9889
Epoch 12/12
469/469 [==============================] - 2s 4ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0044 - accuracy: 0.9984 - val_loss: 0.0605 - val_accuracy: 0.9870
I ran again with batch size = 512 since I have a lot of memory on this GPU.
Results: 1 s/epoch, 12-13 ms/step, val_accuracy: 0.9905
Log:
2020-11-25 10:32:51.405457: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 10:32:54.188682: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-11-25 10:32:54.219663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 10:32:54.219952: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 10:32:54.224344: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 10:32:54.228633: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 10:32:54.230251: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 10:32:54.235019: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 10:32:54.237543: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 10:32:54.247529: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 10:32:54.247745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 10:32:54.248102: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-25 10:32:54.257534: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x17097fc7d50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-25 10:32:54.257730: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-11-25 10:32:54.258020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 10:32:54.258307: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 10:32:54.258451: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 10:32:54.258593: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 10:32:54.258735: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 10:32:54.258879: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 10:32:54.259021: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 10:32:54.259161: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 10:32:54.259371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 10:32:54.873392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 10:32:54.873552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-11-25 10:32:54.873646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-11-25 10:32:54.873964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8678 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2020-11-25 10:32:54.876885: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x170bb22d130 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-25 10:32:54.877077: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-11-25 10:32:55.083716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 10:32:55.084009: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 10:32:55.084150: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 10:32:55.084287: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 10:32:55.084425: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 10:32:55.084567: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 10:32:55.084708: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 10:32:55.084846: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 10:32:55.085016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 10:32:55.085171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 10:32:55.085319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-11-25 10:32:55.085408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-11-25 10:32:55.085599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8678 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
Train on 118 steps, validate on 20 steps
Epoch 1/12
2020-11-25 10:32:56.377688: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 10:32:57.797291: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 10:32:58.516367: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
118/118 [==============================] - ETA: 0s - batch: 58.5000 - size: 1.0000 - loss: 0.3163 - accuracy: 0.9059WARNING:tensorflow:From D:\Code\CondaEnvs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From D:\Code\CondaEnvs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
118/118 [==============================] - 2s 14ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.3163 - accuracy: 0.9059 - val_loss: 0.0891 - val_accuracy: 0.9738
Epoch 2/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0695 - accuracy: 0.9798 - val_loss: 0.0621 - val_accuracy: 0.9800
Epoch 3/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0455 - accuracy: 0.9863 - val_loss: 0.0431 - val_accuracy: 0.9861
Epoch 4/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0331 - accuracy: 0.9897 - val_loss: 0.0386 - val_accuracy: 0.9876
Epoch 5/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0259 - accuracy: 0.9922 - val_loss: 0.0322 - val_accuracy: 0.9890
Epoch 6/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0203 - accuracy: 0.9937 - val_loss: 0.0329 - val_accuracy: 0.9895
Epoch 7/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0165 - accuracy: 0.9952 - val_loss: 0.0364 - val_accuracy: 0.9880
Epoch 8/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0126 - accuracy: 0.9960 - val_loss: 0.0303 - val_accuracy: 0.9909
Epoch 9/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0089 - accuracy: 0.9976 - val_loss: 0.0364 - val_accuracy: 0.9893
Epoch 10/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0082 - accuracy: 0.9976 - val_loss: 0.0357 - val_accuracy: 0.9900
Epoch 11/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9987 - val_loss: 0.0395 - val_accuracy: 0.9892
Epoch 12/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0034 - accuracy: 0.9993 - val_loss: 0.0377 - val_accuracy: 0.9905
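The differing step counts in these logs (469 at batch 128, 118 at batch 512) are just the MNIST split sizes divided by the batch size, rounded up because Keras runs one partial batch for the remainder. A quick check, assuming the standard 60,000/10,000 MNIST train/test split:

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # Keras processes the leftover samples as a final partial batch,
    # hence the ceiling rather than integer division.
    return math.ceil(num_samples / batch_size)

MNIST_TRAIN, MNIST_TEST = 60_000, 10_000

print(steps_per_epoch(MNIST_TRAIN, 128))  # 469, "Train on 469 steps"
print(steps_per_epoch(MNIST_TEST, 128))   # 79,  "validate on 79 steps"
print(steps_per_epoch(MNIST_TRAIN, 512))  # 118, as in this batch-512 run
```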
Question please:
Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.
Tested on Ubuntu 20.04.1, RTX 3070, TensorFlow container 20.11-tf2-py3
batch | s / epoch | ms / step | acc. | gpu-util (%) |
---|---|---|---|---|
128 | 2 | 3-4 | 0.9884 | 73-75 |
256 | 1 | 5-6 | 0.9881 | 82 |
512 | 1 | 10-11 | 0.9881 | 87 |
1024 | 1 | 19-20 | 0.9889 | 92 |
1280 | 1 | 24-30 | 0.9880 | 94 |
2048 | 1 | 37-40 | 0.9883 | 95 |
4096 | 9->1 | 620->65 | 0.9872 | 97 |
batch size = 4096 took longer on the first 3 epochs: 9, 5, and 2 seconds respectively (620, 363, and 90 ms/step)
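The s/epoch numbers in the table can be captured programmatically instead of read off the progress bar. A sketch of the timing logic only; in an actual run this class would subclass tf.keras.callbacks.Callback and be passed via model.fit(..., callbacks=[timer]), but the plain class below shows the bookkeeping Keras would drive:

```python
import time

class EpochTimer:
    """Records wall-clock seconds per epoch via Keras-style hooks."""

    def __init__(self):
        self.epoch_seconds = []
        self._start = None

    def on_epoch_begin(self, epoch, logs=None):
        # Keras calls this at the start of each epoch.
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        # Keras calls this at the end; store the elapsed wall time.
        self.epoch_seconds.append(time.perf_counter() - self._start)
```

Reporting min/max of `epoch_seconds` (skipping epoch 0, which includes graph building and, on CUDA, kernel autotuning) would make numbers like "9->1 s/epoch" directly comparable across the posts here.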
Tested on a MacBook Pro (13-inch, M1, 2020) with 8 GB RAM
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1600 - accuracy: 0.9523/Users/sidagrawal/MachineLearning/env/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 106s 220ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1600 - accuracy: 0.9523 - val_loss: 0.0538 - val_accuracy: 0.9827
Epoch 2/12
469/469 [==============================] - 104s 219ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0447 - accuracy: 0.9863 - val_loss: 0.0388 - val_accuracy: 0.9874
Epoch 3/12
469/469 [==============================] - 103s 217ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0271 - accuracy: 0.9917 - val_loss: 0.0362 - val_accuracy: 0.9879
Epoch 4/12
469/469 [==============================] - 104s 218ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0170 - accuracy: 0.9950 - val_loss: 0.0300 - val_accuracy: 0.9897
Epoch 5/12
469/469 [==============================] - 104s 219ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0133 - accuracy: 0.9959 - val_loss: 0.0369 - val_accuracy: 0.9892
Epoch 6/12
469/469 [==============================] - 104s 219ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0089 - accuracy: 0.9971 - val_loss: 0.0393 - val_accuracy: 0.9890
Epoch 7/12
469/469 [==============================] - 105s 219ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0068 - accuracy: 0.9977 - val_loss: 0.0474 - val_accuracy: 0.9867
Epoch 8/12
469/469 [==============================] - 105s 221ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0070 - accuracy: 0.9977 - val_loss: 0.0374 - val_accuracy: 0.9896
Epoch 9/12
469/469 [==============================] - 104s 218ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9983 - val_loss: 0.0376 - val_accuracy: 0.9898
Epoch 10/12
469/469 [==============================] - 103s 216ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0049 - accuracy: 0.9983 - val_loss: 0.0493 - val_accuracy: 0.9888
Epoch 11/12
469/469 [==============================] - 104s 218ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9985 - val_loss: 0.0389 - val_accuracy: 0.9896
Epoch 12/12
469/469 [==============================] - 105s 220ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0030 - accuracy: 0.9990 - val_loss: 0.0424 - val_accuracy: 0.9904
No changes to the script: 105 s/epoch, 220 ms/step, 99.04% final accuracy.
Not sure why my numbers aren't comparable to the other M1 numbers.
tested on ubuntu 20.04.1, rtx 3070, tensorflow container
20.11-tf2-py3
batch s / epoch ms / step acc. gpu-util (%)
128 2 3-4 0.9884 73-75
256 1 5-6 0.9881 82
512 1 10-11 0.9881 87
1024 1 19-20 0.9889 92
1280 1 24-30 0.9880 94
2048 1 37-40 0.9883 95
4096 9->1 620->65 0.9872 97
batch size=4096 took longer on first 3 epochs, taking 9, 5, 2 seconds per each epoch (620, 363, 90 ms per step per each epoch)
A 3070 running TensorFlow, how did you do it? I thought you needed CUDA 11 on a 3070 and that there were problems with CUDA 11 and the nightly builds. I guess the difference is Windows vs. Ubuntu.
One thing I hope is that, with support for Apple's ML Compute, this fork "just works" as the M line of Apple chips evolves, running on faster/better Apple Silicon going forward rather than needing an endless series of patches. The CUDA/cuDNN install dance on Windows never fails to thwart me.
Question please:
Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.
I believe the Neural Engine is designed to accelerate inference/prediction with trained CoreML models; as far as I can tell, it isn't used in training. There doesn't seem to be any API for it other than CoreML.
Question please: Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.
I believe the Neural Engine is designed to accelerate trained CoreML models inference/prediction, as far as I can tell it's not used in training? There doesn't seem to be any API to use it other than CoreML.
Oh, I didn't think of that. Do you have any source on this?
Question please: Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.
I believe the Neural Engine is designed to accelerate trained CoreML models inference/prediction, as far as I can tell it's not used in training? There doesn't seem to be any API to use it other than CoreML.
Oh, I didn't think of that. Do you have any source on this?
I'm not sure how true that is. I've never had any issue with speed when making predictions on non-ML-specific hardware; it's always been the training that's slow.
Question please: Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.
I believe the Neural Engine is designed to accelerate trained CoreML models inference/prediction, as far as I can tell it's not used in training? There doesn't seem to be any API to use it other than CoreML.
Oh, I didn't think of that. Do you have any source on this?
I'm not sure how true that is? I've never had any issue with speed when making predictions on non-ML specific hardware, it's always been the training that's been slow
Information on the Neural Engine isn't great. CoreML is definitely a way to run trained models on-device. This repo talks about what we know about the Neural Engine.
The impressive speedup of super-resolution scaling in Pixelmator 2 is credited to the Neural Engine on M1 Macs.
It's notable that the writeup on this branch of TensorFlow talks about using ML Compute to speed up training on the CPU and GPU, but doesn't mention the Neural Engine itself. It would be great if we could use it to train! Perhaps that's coming some day?
MacBook Pro, 16 GB RAM, 500 GB HD, same script but without disabling eager execution
Epoch 1/12
2020-11-27 00:02:50.544598: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-27 00:02:50.545510: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
469/469 [==============================] - 18s 35ms/step - loss: 0.3663 - accuracy: 0.8887 - val_loss: 0.0470 - val_accuracy: 0.9846
Epoch 2/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0449 - accuracy: 0.9865 - val_loss: 0.0438 - val_accuracy: 0.9844
Epoch 3/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0281 - accuracy: 0.9907 - val_loss: 0.0314 - val_accuracy: 0.9885
Epoch 4/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0177 - accuracy: 0.9949 - val_loss: 0.0361 - val_accuracy: 0.9884
Epoch 5/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0108 - accuracy: 0.9965 - val_loss: 0.0310 - val_accuracy: 0.9903
Epoch 6/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0081 - accuracy: 0.9976 - val_loss: 0.0311 - val_accuracy: 0.9905
Epoch 7/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0069 - accuracy: 0.9977 - val_loss: 0.0441 - val_accuracy: 0.9880
Epoch 8/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0051 - accuracy: 0.9982 - val_loss: 0.0352 - val_accuracy: 0.9902
Epoch 9/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0056 - accuracy: 0.9981 - val_loss: 0.0371 - val_accuracy: 0.9901
Epoch 10/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0035 - accuracy: 0.9987 - val_loss: 0.0349 - val_accuracy: 0.9905
Epoch 11/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0035 - accuracy: 0.9990 - val_loss: 0.0381 - val_accuracy: 0.9895
Epoch 12/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0044 - accuracy: 0.9987 - val_loss: 0.0401 - val_accuracy: 0.9901
Tested on Ubuntu 20.04.1, RTX 3070, TensorFlow container `20.11-tf2-py3`.
batch | s/epoch | ms/step | acc. | GPU util (%)
---|---|---|---|---
128 | 2 | 3-4 | 0.9884 | 73-75
256 | 1 | 5-6 | 0.9881 | 82
512 | 1 | 10-11 | 0.9881 | 87
1024 | 1 | 19-20 | 0.9889 | 92
1280 | 1 | 24-30 | 0.9880 | 94
2048 | 1 | 37-40 | 0.9883 | 95
4096 | 9->1 | 620->65 | 0.9872 | 97
Batch size 4096 took longer on the first three epochs: 9, 5, and 2 seconds per epoch (620, 363, and 90 ms per step, respectively).
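The step counts in these runs follow directly from MNIST's 60,000 training images: Keras runs `ceil(60000 / batch_size)` optimizer steps per epoch, which is why the logs above show 469 steps at the default batch size of 128. A quick sanity check in plain Python (no TensorFlow needed):

```python
import math

MNIST_TRAIN_IMAGES = 60000  # standard MNIST training-set size


def steps_per_epoch(batch_size: int) -> int:
    """Number of optimizer steps Keras runs per epoch for a given batch size."""
    return math.ceil(MNIST_TRAIN_IMAGES / batch_size)


for bs in (128, 256, 512, 1024, 2048, 4096):
    print(bs, steps_per_epoch(bs))
```

At batch 4096 an epoch is only 15 steps, so per-step warm-up cost dominates the first epochs, consistent with the 620 → 65 ms/step figures in the table.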
A 3070 running TensorFlow: how did you do it? I thought you needed CUDA 11 on a 3070, and that there were problems with CUDA 11 and the nightly. I guess the difference is Windows vs. Ubuntu.
Just install a CUDA 11.1-compatible driver (455 at the time of writing) and use the aforementioned container; the container takes care of the troublesome dependency problems. Check this for details.
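For anyone reproducing this, a minimal sketch of running the NGC container mentioned above. It assumes Docker 19.03+ with the NVIDIA container toolkit installed; the host directory and script name (`mnist_benchmark.py`) are placeholders for wherever your copy of the benchmark lives:

```shell
# Pull the NGC TensorFlow 2 image (tag matches the one used above).
docker pull nvcr.io/nvidia/tensorflow:20.11-tf2-py3

# Run the benchmark with all GPUs exposed to the container.
# /path/to/workdir and mnist_benchmark.py are placeholders.
docker run --gpus all -it --rm \
  -v /path/to/workdir:/workspace/bench \
  nvcr.io/nvidia/tensorflow:20.11-tf2-py3 \
  python /workspace/bench/mnist_benchmark.py
```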
Tested on MacBook Air (13-inch, Early 2015, 1.6GHz Intel Core i5, Intel HD Graphics 6000) with 8GB RAM
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9540/Users/user/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 222s 461ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9540 - val_loss: 0.0448 - val_accuracy: 0.9861
Epoch 2/12
469/469 [==============================] - 231s 482ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0439 - accuracy: 0.9866 - val_loss: 0.0357 - val_accuracy: 0.9876
Epoch 3/12
469/469 [==============================] - 241s 503ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0265 - accuracy: 0.9915 - val_loss: 0.0342 - val_accuracy: 0.9890
Epoch 4/12
469/469 [==============================] - 277s 576ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0191 - accuracy: 0.9942 - val_loss: 0.0307 - val_accuracy: 0.9893
Epoch 5/12
469/469 [==============================] - 248s 512ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0117 - accuracy: 0.9964 - val_loss: 0.0329 - val_accuracy: 0.9897
Epoch 6/12
469/469 [==============================] - 230s 478ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0107 - accuracy: 0.9966 - val_loss: 0.0353 - val_accuracy: 0.9888
Epoch 7/12
469/469 [==============================] - 232s 482ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0079 - accuracy: 0.9973 - val_loss: 0.0533 - val_accuracy: 0.9864
Epoch 8/12
469/469 [==============================] - 268s 561ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0061 - accuracy: 0.9979 - val_loss: 0.0429 - val_accuracy: 0.9885
Epoch 9/12
469/469 [==============================] - 235s 485ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0053 - accuracy: 0.9982 - val_loss: 0.0363 - val_accuracy: 0.9899
Epoch 10/12
469/469 [==============================] - 253s 528ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0053 - accuracy: 0.9982 - val_loss: 0.0348 - val_accuracy: 0.9909
Epoch 11/12
469/469 [==============================] - 248s 507ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9984 - val_loss: 0.0405 - val_accuracy: 0.9905
Epoch 12/12
469/469 [==============================] - 248s 515ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0123 - accuracy: 0.9960 - val_loss: 0.0381 - val_accuracy: 0.9886
No changes to the script: 248 s/epoch, 515 ms/step, 98.86% final validation accuracy.
MacBook Air 2020 M1 with 8 GB, connected to power. No real difference from the other M1 results.
2020-11-27 15:16:01.395210: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-27 15:16:01.398078: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
2020-11-27 15:16:01.702008: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1600 - accuracy: 0.9520/Users/savathos/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 25s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1598 - accuracy: 0.9520 - val_loss: 0.0498 - val_accuracy: 0.9834
Epoch 2/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0424 - accuracy: 0.9868 - val_loss: 0.0392 - val_accuracy: 0.9868
Epoch 3/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0270 - accuracy: 0.9918 - val_loss: 0.0382 - val_accuracy: 0.9872
Epoch 4/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0177 - accuracy: 0.9944 - val_loss: 0.0397 - val_accuracy: 0.9879
Epoch 5/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0116 - accuracy: 0.9962 - val_loss: 0.0449 - val_accuracy: 0.9870
Epoch 6/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0101 - accuracy: 0.9968 - val_loss: 0.0383 - val_accuracy: 0.9885
Epoch 7/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0068 - accuracy: 0.9979 - val_loss: 0.0441 - val_accuracy: 0.9865
Epoch 8/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0073 - accuracy: 0.9976 - val_loss: 0.0529 - val_accuracy: 0.9869
Epoch 9/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9980 - val_loss: 0.0451 - val_accuracy: 0.9884
Epoch 10/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0042 - accuracy: 0.9987 - val_loss: 0.0542 - val_accuracy: 0.9874
Epoch 11/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9984 - val_loss: 0.0505 - val_accuracy: 0.9877
Epoch 12/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0035 - accuracy: 0.9989 - val_loss: 0.0492 - val_accuracy: 0.9871
No changes to the script: 24 s/epoch, 47 ms/step, 99.89% final training accuracy (98.71% validation).
Device: MacBook Pro (13-inch, 2019), 2.4 GHz Quad-Core Intel Core i5, 8GB RAM, Radeon RX 5700 XT 8 GB
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - 63s 128ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1607 - accuracy: 0.9530 - val_loss: 0.0528 - val_accuracy: 0.9827
Epoch 2/12
469/469 [==============================] - 62s 129ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0439 - accuracy: 0.9863 - val_loss: 0.0375 - val_accuracy: 0.9874
Epoch 3/12
469/469 [==============================] - 62s 129ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0257 - accuracy: 0.9917 - val_loss: 0.0369 - val_accuracy: 0.9881
Epoch 4/12
469/469 [==============================] - 61s 126ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0188 - accuracy: 0.9937 - val_loss: 0.0327 - val_accuracy: 0.9899
Epoch 5/12
469/469 [==============================] - 61s 127ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0116 - accuracy: 0.9964 - val_loss: 0.0441 - val_accuracy: 0.9864
Epoch 6/12
469/469 [==============================] - 61s 126ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0092 - accuracy: 0.9970 - val_loss: 0.0341 - val_accuracy: 0.9903
Epoch 7/12
469/469 [==============================] - 61s 125ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0078 - accuracy: 0.9973 - val_loss: 0.0338 - val_accuracy: 0.9897
Epoch 8/12
469/469 [==============================] - 61s 127ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0065 - accuracy: 0.9979 - val_loss: 0.0392 - val_accuracy: 0.9888
Epoch 9/12
469/469 [==============================] - 61s 125ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9980 - val_loss: 0.0404 - val_accuracy: 0.9895
Epoch 10/12
469/469 [==============================] - 61s 126ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9985 - val_loss: 0.0464 - val_accuracy: 0.9887
Epoch 11/12
469/469 [==============================] - 61s 127ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0047 - accuracy: 0.9986 - val_loss: 0.0473 - val_accuracy: 0.9890
Epoch 12/12
469/469 [==============================] - 63s 128ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0032 - accuracy: 0.9988 - val_loss: 0.0453 - val_accuracy: 0.9897
Summary:
Desktop Ryzen 2400G, 16 GB, Windows (Conda): worth a try.
2020-11-27 16:43:00.273670: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
Epoch 1/12
469/Unknown - 62s 132ms/step - loss: 0.1622 - accuracy: 0.95162020-11-27 16:44:06.022659: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
2020-11-27 16:44:09.471968: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
469/469 [==============================] - 66s 140ms/step - loss: 0.1622 - accuracy: 0.9516 - val_loss: 0.0600 - val_accuracy: 0.9799
Epoch 2/12
468/469 [============================>.] - ETA: 0s - loss: 0.0428 - accuracy: 0.98692020-11-27 16:45:17.461835: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
469/469 [==============================] - 68s 145ms/step - loss: 0.0429 - accuracy: 0.9869 - val_loss: 0.0379 - val_accuracy: 0.9882
Epoch 3/12
468/469 [============================>.] - ETA: 0s - loss: 0.0276 - accuracy: 0.99152020-11-27 16:46:21.553304: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0277 - accuracy: 0.9915 - val_loss: 0.0349 - val_accuracy: 0.9882
Epoch 4/12
468/469 [============================>.] - ETA: 0s - loss: 0.0183 - accuracy: 0.99452020-11-27 16:47:25.641510: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0183 - accuracy: 0.9945 - val_loss: 0.0359 - val_accuracy: 0.9894
Epoch 5/12
468/469 [============================>.] - ETA: 0s - loss: 0.0146 - accuracy: 0.99512020-11-27 16:48:29.695354: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0146 - accuracy: 0.9951 - val_loss: 0.0367 - val_accuracy: 0.9890
Epoch 6/12
468/469 [============================>.] - ETA: 0s - loss: 0.0089 - accuracy: 0.99702020-11-27 16:49:33.919164: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0088 - accuracy: 0.9970 - val_loss: 0.0360 - val_accuracy: 0.9895
Epoch 7/12
468/469 [============================>.] - ETA: 0s - loss: 0.0084 - accuracy: 0.99752020-11-27 16:50:38.218212: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0084 - accuracy: 0.9975 - val_loss: 0.0499 - val_accuracy: 0.9873
Epoch 8/12
468/469 [============================>.] - ETA: 0s - loss: 0.0066 - accuracy: 0.99792020-11-27 16:51:42.458833: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0066 - accuracy: 0.9979 - val_loss: 0.0402 - val_accuracy: 0.9896
Epoch 9/12
468/469 [============================>.] - ETA: 0s - loss: 0.0067 - accuracy: 0.99762020-11-27 16:52:46.661109: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0067 - accuracy: 0.9976 - val_loss: 0.0412 - val_accuracy: 0.9893
Epoch 10/12
468/469 [============================>.] - ETA: 0s - loss: 0.0041 - accuracy: 0.99872020-11-27 16:53:52.020888: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
469/469 [==============================] - 65s 139ms/step - loss: 0.0041 - accuracy: 0.9987 - val_loss: 0.0374 - val_accuracy: 0.9901
Epoch 11/12
468/469 [============================>.] - ETA: 0s - loss: 0.0034 - accuracy: 0.99892020-11-27 16:54:55.984763: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 136ms/step - loss: 0.0034 - accuracy: 0.9989 - val_loss: 0.0458 - val_accuracy: 0.9904
Epoch 12/12
468/469 [============================>.] - ETA: 0s - loss: 0.0035 - accuracy: 0.99892020-11-27 16:55:59.786269: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 136ms/step - loss: 0.0035 - accuracy: 0.9989 - val_loss: 0.0515 - val_accuracy: 0.9876
Device: Mac Pro Late 2013 (3.7 GHz Quad-Core Intel Xeon E5, 2x AMD FirePro D300 2 GB, 64GB). Looks like neither of the GPUs is being used here: max GPU utilization is ~6% and CPU idle is ~60%.
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9538
469/469 [==============================] - 170s 355ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9538 - val_loss: 0.0556 - val_accuracy: 0.9806
Epoch 2/12
469/469 [==============================] - 172s 361ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0437 - accuracy: 0.9866 - val_loss: 0.0365 - val_accuracy: 0.9881
Epoch 3/12
469/469 [==============================] - 185s 389ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0269 - accuracy: 0.9916 - val_loss: 0.0356 - val_accuracy: 0.9887
Epoch 4/12
469/469 [==============================] - 182s 383ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0177 - accuracy: 0.9946 - val_loss: 0.0375 - val_accuracy: 0.9885
Epoch 5/12
469/469 [==============================] - 171s 359ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0131 - accuracy: 0.9959 - val_loss: 0.0405 - val_accuracy: 0.9883
Epoch 6/12
469/469 [==============================] - 171s 358ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0101 - accuracy: 0.9968 - val_loss: 0.0355 - val_accuracy: 0.9899
Epoch 7/12
469/469 [==============================] - 171s 358ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0071 - accuracy: 0.9977 - val_loss: 0.0387 - val_accuracy: 0.9892
Epoch 8/12
469/469 [==============================] - 170s 355ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0061 - accuracy: 0.9981 - val_loss: 0.0394 - val_accuracy: 0.9897
Epoch 9/12
469/469 [==============================] - 172s 361ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9981 - val_loss: 0.0404 - val_accuracy: 0.9902
Epoch 10/12
469/469 [==============================] - 169s 354ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9982 - val_loss: 0.0481 - val_accuracy: 0.9882
Epoch 11/12
469/469 [==============================] - 169s 354ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9980 - val_loss: 0.0403 - val_accuracy: 0.9892
Epoch 12/12
469/469 [==============================] - 166s 348ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0036 - accuracy: 0.9990 - val_loss: 0.0532 - val_accuracy: 0.9883
In the screenshot below, GPU slot 2 is connected to the display and slot 1 is the spare.
Surprisingly, when I ran the code from issue #39 it switched to using the idle GPU with ~80% utilization. Seems like set_mlc_device ignores my GPU recommendation when model size is small.
Mac Pro Late 2013 (3.5 GHz 6-Core Intel Xeon E5, 2x AMD FirePro D500 3 GB, 32GB).
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1605 - accuracy: 0.9520
469/469 [==============================] - 44s 78ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1605 - accuracy: 0.9520 - val_loss: 0.0501 - val_accuracy: 0.9839
Epoch 2/12
469/469 [==============================] - 39s 77ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0460 - accuracy: 0.9859 - val_loss: 0.0373 - val_accuracy: 0.9880
Epoch 3/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0270 - accuracy: 0.9919 - val_loss: 0.0383 - val_accuracy: 0.9866
Epoch 4/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0198 - accuracy: 0.9937 - val_loss: 0.0334 - val_accuracy: 0.9896
Epoch 5/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0138 - accuracy: 0.9955 - val_loss: 0.0409 - val_accuracy: 0.9876
Epoch 6/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0107 - accuracy: 0.9965 - val_loss: 0.0381 - val_accuracy: 0.9886
Epoch 7/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0090 - accuracy: 0.9970 - val_loss: 0.0408 - val_accuracy: 0.9883
Epoch 8/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0062 - accuracy: 0.9979 - val_loss: 0.0363 - val_accuracy: 0.9896
Epoch 9/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0062 - accuracy: 0.9979 - val_loss: 0.0385 - val_accuracy: 0.9908
Epoch 10/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9986 - val_loss: 0.0523 - val_accuracy: 0.9885
Epoch 11/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9983 - val_loss: 0.0537 - val_accuracy: 0.9876
Epoch 12/12
469/469 [==============================] - 38s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0050 - accuracy: 0.9983 - val_loss: 0.0439 - val_accuracy: 0.9893
Can anyone explain how to install TensorFlow on a MacBook M1 2020? I am getting the error `zsh: illegal hardware instruction python` inside the virtual environment (tensorflow_macos_venv) when I try to import TensorFlow. I am using the terminal without Rosetta 2.
Thank you @Willian-Zhang for creating this!
I used it (code unchanged from above) to benchmark a few of my Macs + a GPU-powered Google Colab instance:
| | MacBook Air (M1) | MacBook Pro 13-inch (M1) | MacBook Pro 16-inch (Intel) | Google Colab T4 GPU^ |
|---|---|---|---|---|
| tensorflow_macos benchmark | 23-24s/epoch | 25-26s/epoch | 20-21s/epoch | 9s/epoch |
Specs:
| | MacBook Air (M1) | MacBook Pro 13-inch (M1) | MacBook Pro 16-inch (Intel) |
|---|---|---|---|
| CPU | 8-core M1 | 8-core M1 | 2.4GHz 8-core Intel Core i9 |
| GPU | 7-core M1 | 8-core M1 | AMD Radeon Pro 5500M with 8GB of GDDR6 memory |
| Neural engine | 16-core M1 | 16-core M1 | NA |
| Memory (RAM) | 16GB | 16GB | 64GB |
| Storage | 256GB | 512GB | 2TB |
Very interesting to see the M1 MacBook Air performing on par with, or better than, the M1 MacBook Pro.
The 16-inch I used is almost top-spec too (barely a year old)... incredible how performant Apple's new M1 chip is.
I also ran a few more tests on each machine.
See the results from the above on my blog. I also made a video running through each of them on YouTube.
i5-8400T, 16 GB 2400 MHz.
Since this machine has no GPU, I just commented out these two lines:

```python
# from tensorflow.python.compiler.mlcompute import mlcompute
# mlcompute.set_mlc_device(device_name='gpu')
```
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1616 - accuracy: 0.9532/Users/thinkmac/opt/miniconda3/envs/tf-test/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 40s 81ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1616 - accuracy: 0.9532 - val_loss: 0.0551 - val_accuracy: 0.9816
Epoch 2/12
469/469 [==============================] - 40s 82ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0440 - accuracy: 0.9864 - val_loss: 0.0459 - val_accuracy: 0.9848
Epoch 3/12
469/469 [==============================] - 37s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0280 - accuracy: 0.9909 - val_loss: 0.0359 - val_accuracy: 0.9890
Epoch 4/12
469/469 [==============================] - 37s 75ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0198 - accuracy: 0.9937 - val_loss: 0.0332 - val_accuracy: 0.9894
Epoch 5/12
469/469 [==============================] - 36s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0132 - accuracy: 0.9958 - val_loss: 0.0427 - val_accuracy: 0.9872
Epoch 6/12
469/469 [==============================] - 36s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0102 - accuracy: 0.9969 - val_loss: 0.0420 - val_accuracy: 0.9877
Epoch 7/12
469/469 [==============================] - 37s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0077 - accuracy: 0.9974 - val_loss: 0.0525 - val_accuracy: 0.9843
Epoch 8/12
469/469 [==============================] - 38s 75ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0071 - accuracy: 0.9975 - val_loss: 0.0381 - val_accuracy: 0.9896
Epoch 9/12
469/469 [==============================] - 36s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9985 - val_loss: 0.0438 - val_accuracy: 0.9879
Epoch 10/12
469/469 [==============================] - 36s 73ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0070 - accuracy: 0.9975 - val_loss: 0.0470 - val_accuracy: 0.9880
Epoch 11/12
469/469 [==============================] - 36s 73ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0039 - accuracy: 0.9986 - val_loss: 0.0423 - val_accuracy: 0.9896
Epoch 12/12
469/469 [==============================] - 36s 73ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0041 - accuracy: 0.9987 - val_loss: 0.0349 - val_accuracy: 0.9914
Result:
MacBook Pro (16-inch, 2019) CPU: 2.3 GHz 8-Core Intel Core i9 GPU: AMD Radeon Pro 5500M 4 GB
2020-12-28 17:50:35.421277: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-28 17:50:35.544447: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-12-28 17:50:36.201512: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1651 - accuracy: 0.9515/Users/gaspardshen/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 20s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1651 - accuracy: 0.9515 - val_loss: 0.0520 - val_accuracy: 0.9835
Epoch 2/12
469/469 [==============================] - 19s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0436 - accuracy: 0.9864 - val_loss: 0.0337 - val_accuracy: 0.9889
Epoch 3/12
469/469 [==============================] - 19s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0275 - accuracy: 0.9918 - val_loss: 0.0360 - val_accuracy: 0.9877
Epoch 4/12
469/469 [==============================] - 19s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0190 - accuracy: 0.9940 - val_loss: 0.0364 - val_accuracy: 0.9885
Epoch 5/12
469/469 [==============================] - 19s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0132 - accuracy: 0.9957 - val_loss: 0.0422 - val_accuracy: 0.9864
Epoch 6/12
469/469 [==============================] - 20s 38ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0101 - accuracy: 0.9965 - val_loss: 0.0375 - val_accuracy: 0.9892
Epoch 7/12
469/469 [==============================] - 21s 39ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0081 - accuracy: 0.9973 - val_loss: 0.0405 - val_accuracy: 0.9895
Epoch 8/12
469/469 [==============================] - 21s 39ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0077 - accuracy: 0.9976 - val_loss: 0.0397 - val_accuracy: 0.9889
Epoch 9/12
469/469 [==============================] - 21s 39ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0049 - accuracy: 0.9984 - val_loss: 0.0492 - val_accuracy: 0.9872
Epoch 10/12
469/469 [==============================] - 20s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0069 - accuracy: 0.9975 - val_loss: 0.0365 - val_accuracy: 0.9894
Epoch 11/12
469/469 [==============================] - 20s 38ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9985 - val_loss: 0.0374 - val_accuracy: 0.9907
Epoch 12/12
469/469 [==============================] - 20s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0026 - accuracy: 0.9992 - val_loss: 0.0390 - val_accuracy: 0.9909
python cnn_benchmark.py 268.30s user 212.62s system 194% cpu 4:07.11 total
Results:
Tested on a 2.2 GHz Quad-Core Intel Core i7, Intel Iris Pro Graphics, 2014 15-inch MacBook Pro. I also observed that the Mac-optimized version seems slower than the non-optimized version (similar to the results of @rnogy).
macOS-optimized TensorFlow: I set `mlcompute.set_mlc_device(device_name='any')`. I had to comment out `disable_eager_execution()`, otherwise I would get a `segmentation fault` error. Results:
2020-12-29 16:49:46.272987: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch 1/12
2020-12-29 16:49:46.886017: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
469/469 [==============================] - 79s 160ms/step - loss: 0.3364 - accuracy: 0.8962 - val_loss: 0.0550 - val_accuracy: 0.9823
Epoch 2/12
469/469 [==============================] - 73s 156ms/step - loss: 0.0420 - accuracy: 0.9871 - val_loss: 0.0393 - val_accuracy: 0.9865
Epoch 3/12
469/469 [==============================] - 78s 166ms/step - loss: 0.0239 - accuracy: 0.9934 - val_loss: 0.0320 - val_accuracy: 0.9896
Epoch 4/12
469/469 [==============================] - 80s 170ms/step - loss: 0.0172 - accuracy: 0.9945 - val_loss: 0.0423 - val_accuracy: 0.9871
Epoch 5/12
469/469 [==============================] - 75s 160ms/step - loss: 0.0112 - accuracy: 0.9967 - val_loss: 0.0421 - val_accuracy: 0.9860
Epoch 6/12
469/469 [==============================] - 75s 159ms/step - loss: 0.0080 - accuracy: 0.9976 - val_loss: 0.0451 - val_accuracy: 0.9878
Epoch 7/12
469/469 [==============================] - 74s 157ms/step - loss: 0.0071 - accuracy: 0.9979 - val_loss: 0.0392 - val_accuracy: 0.9885
Epoch 8/12
469/469 [==============================] - 83s 177ms/step - loss: 0.0069 - accuracy: 0.9976 - val_loss: 0.0433 - val_accuracy: 0.9882
Epoch 9/12
469/469 [==============================] - 78s 166ms/step - loss: 0.0053 - accuracy: 0.9984 - val_loss: 0.0399 - val_accuracy: 0.9907
Epoch 10/12
469/469 [==============================] - 78s 165ms/step - loss: 0.0050 - accuracy: 0.9983 - val_loss: 0.0412 - val_accuracy: 0.9901
Epoch 11/12
469/469 [==============================] - 75s 160ms/step - loss: 0.0035 - accuracy: 0.9990 - val_loss: 0.0461 - val_accuracy: 0.9897
Epoch 12/12
469/469 [==============================] - 75s 160ms/step - loss: 0.0025 - accuracy: 0.9992 - val_loss: 0.0466 - val_accuracy: 0.9889
Non-macOS-optimized TensorFlow (`pip install tensorflow` in a conda env). Results:
2020-12-29 17:12:34.872512: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-12-29 17:12:34.872759: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-29 17:12:35.023844: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-12-29 17:12:35.676140: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1557 - accuracy: 0.9538/Users/fengxma/opt/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 69s 140ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1557 - accuracy: 0.9538 - val_loss: 0.0425 - val_accuracy: 0.9848
Epoch 2/12
469/469 [==============================] - 68s 138ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0406 - accuracy: 0.9879 - val_loss: 0.0446 - val_accuracy: 0.9869
Epoch 3/12
469/469 [==============================] - 65s 132ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0250 - accuracy: 0.9925 - val_loss: 0.0372 - val_accuracy: 0.9871
Epoch 4/12
469/469 [==============================] - 65s 134ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0176 - accuracy: 0.9944 - val_loss: 0.0325 - val_accuracy: 0.9887
Epoch 5/12
469/469 [==============================] - 63s 129ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0119 - accuracy: 0.9963 - val_loss: 0.0349 - val_accuracy: 0.9901
Epoch 6/12
469/469 [==============================] - 66s 135ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0102 - accuracy: 0.9968 - val_loss: 0.0350 - val_accuracy: 0.9888
Epoch 7/12
469/469 [==============================] - 63s 129ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0069 - accuracy: 0.9978 - val_loss: 0.0375 - val_accuracy: 0.9904
Epoch 8/12
469/469 [==============================] - 64s 130ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0051 - accuracy: 0.9985 - val_loss: 0.0459 - val_accuracy: 0.9871
Epoch 9/12
469/469 [==============================] - 64s 131ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0072 - accuracy: 0.9975 - val_loss: 0.0347 - val_accuracy: 0.9905
Epoch 10/12
469/469 [==============================] - 66s 135ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9979 - val_loss: 0.0439 - val_accuracy: 0.9881
Epoch 11/12
469/469 [==============================] - 65s 132ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0036 - accuracy: 0.9988 - val_loss: 0.0476 - val_accuracy: 0.9885
Epoch 12/12
469/469 [==============================] - 64s 131ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0029 - accuracy: 0.9990 - val_loss: 0.0448 - val_accuracy: 0.9897
2020-12-30 14:50:04.896932: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-30 14:50:05.037206: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-12-30 14:50:06.878061: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1611 - accuracy: 0.9521
469/469 [==============================] - 23s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1610 - accuracy: 0.9521 - val_loss: 0.0496 - val_accuracy: 0.9846
Epoch 2/12
469/469 [==============================] - 23s 45ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0453 - accuracy: 0.9860 - val_loss: 0.0501 - val_accuracy: 0.9833
Epoch 3/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0284 - accuracy: 0.9910 - val_loss: 0.0380 - val_accuracy: 0.9868
Epoch 4/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0198 - accuracy: 0.9942 - val_loss: 0.0343 - val_accuracy: 0.9888
Epoch 5/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0135 - accuracy: 0.9957 - val_loss: 0.0318 - val_accuracy: 0.9904
Epoch 6/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0104 - accuracy: 0.9967 - val_loss: 0.0337 - val_accuracy: 0.9896
Epoch 7/12
469/469 [==============================] - 22s 42ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0080 - accuracy: 0.9974 - val_loss: 0.0363 - val_accuracy: 0.9895
Epoch 8/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0074 - accuracy: 0.9973 - val_loss: 0.0470 - val_accuracy: 0.9878
Epoch 9/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0065 - accuracy: 0.9976 - val_loss: 0.0436 - val_accuracy: 0.9887
Epoch 10/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0049 - accuracy: 0.9982 - val_loss: 0.0492 - val_accuracy: 0.9881
Epoch 11/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9983 - val_loss: 0.0429 - val_accuracy: 0.9896
Epoch 12/12
469/469 [==============================] - 22s 43ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0037 - accuracy: 0.9989 - val_loss: 0.0454 - val_accuracy: 0.9893
There was quite a bit of fan noise.
GPU name: Tesla T4 (16 GB vRAM), CPU: Intel(R) Xeon(R) CPU @ 2.20GHz, RAM: 16 GB, Precision: Float 32
Epoch 12/12
469/469 [==============================] - 8s 8ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0038 - accuracy: 0.9987 - val_loss: 0.0305 - val_accuracy: 0.9919
CPU times: user 2min, sys: 54.6 s, total: 2min 55s
Wall time: 2min 2s
GPU name: Tesla T4 (16 GB vRAM), CPU: Intel(R) Xeon(R) CPU @ 2.20GHz, RAM: 16 GB, Precision: Float 16
Epoch 12/12
469/469 [==============================] - 9s 8ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9982 - val_loss: 0.0422 - val_accuracy: 0.9894
CPU times: user 2min 5s, sys: 55.8 s, total: 3min 1s
Wall time: 2min 5s
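The Float 32 and Float 16 wall times on the Tesla T4 are nearly identical, which suggests this small model is not compute-bound, so fp16 gains little here. A minimal sketch for turning the Jupyter-style `Wall time` strings above into a comparable ratio (the helper name `wall_seconds` is my own):

```python
import re

def wall_seconds(s: str) -> float:
    """Parse a Jupyter-style duration like '2min 2s' or '54.6 s' into seconds."""
    total = 0.0
    mins = re.search(r"(\d+)\s*min", s)
    if mins:
        total += 60 * int(mins.group(1))
    secs = re.search(r"(\d+(?:\.\d+)?)\s*s\b", s)
    if secs:
        total += float(secs.group(1))
    return total

fp32 = wall_seconds("2min 2s")  # Tesla T4, Float 32 run above
fp16 = wall_seconds("2min 5s")  # Tesla T4, Float 16 run above
print(f"fp16 / fp32 wall-time ratio: {fp16 / fp32:.3f}")
```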
Seeing all these amazing results, you might not want to bother with this machine from 2016. 😅 Anyway, here is the information I got.
System: MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports)
Operating System: macOS Big Sur version 11.1
Processor: 2.9 GHz Dual-Core Intel Core i5
Memory: 8 GB 2133 MHz LPDDR3
Graphics: Intel Iris Graphics 550 1536 MB
2021-01-16 19:13:22.511385: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-16 19:13:23.496719: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 2.2993 - accuracy: 0.1251/Users/rahulbhalley/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 146s 288ms/step - batch: 234.0000 - size: 1.0000 - loss: 2.2993 - accuracy: 0.1251 - val_loss: 2.3012 - val_accuracy: 0.1135
Epoch 2/12
469/469 [==============================] - 140s 291ms/step - batch: 234.0000 - size: 1.0000 - loss: 1.8151 - accuracy: 0.3670 - val_loss: 0.6209 - val_accuracy: 0.8441
Epoch 3/12
469/469 [==============================] - 140s 289ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.4052 - accuracy: 0.8984 - val_loss: 0.2491 - val_accuracy: 0.9445
Epoch 4/12
469/469 [==============================] - 158s 330ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1970 - accuracy: 0.9510 - val_loss: 0.1449 - val_accuracy: 0.9649
Epoch 5/12
469/469 [==============================] - 145s 301ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1394 - accuracy: 0.9653 - val_loss: 0.1099 - val_accuracy: 0.9695
Epoch 6/12
469/469 [==============================] - 152s 312ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1117 - accuracy: 0.9715 - val_loss: 0.0927 - val_accuracy: 0.9739
Epoch 7/12
469/469 [==============================] - 146s 300ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0933 - accuracy: 0.9766 - val_loss: 0.0828 - val_accuracy: 0.9787
Epoch 8/12
469/469 [==============================] - 180s 374ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0810 - accuracy: 0.9796 - val_loss: 0.0765 - val_accuracy: 0.9793
Epoch 9/12
469/469 [==============================] - 165s 342ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0717 - accuracy: 0.9817 - val_loss: 0.0718 - val_accuracy: 0.9811
Epoch 10/12
469/469 [==============================] - 140s 287ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0630 - accuracy: 0.9845 - val_loss: 0.0586 - val_accuracy: 0.9818
Epoch 11/12
469/469 [==============================] - 229s 480ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0570 - accuracy: 0.9859 - val_loss: 0.0727 - val_accuracy: 0.9817
Epoch 12/12
469/469 [==============================] - 146s 302ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0506 - accuracy: 0.9874 - val_loss: 0.0559 - val_accuracy: 0.9838
Key results:
iMac Pro 2017, 3 GHz 10-Core Intel Xeon W, 32 GB 2666 MHz DDR4, Radeon Pro Vega 64 16 GB
On GPU:
2021-01-23 15:21:50.079691: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-23 15:21:50.183928: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-23 15:21:52.322549: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1696 - accuracy: 0.9476/Users/dima/dev/learn/2021-01-23-apple-tensorflow/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 17s 27ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1696 - accuracy: 0.9476 - val_loss: 0.0472 - val_accuracy: 0.9850
Epoch 2/12
469/469 [==============================] - 14s 27ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0447 - accuracy: 0.9866 - val_loss: 0.0391 - val_accuracy: 0.9874
...
Epoch 11/12
469/469 [==============================] - 15s 28ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0042 - accuracy: 0.9985 - val_loss: 0.0474 - val_accuracy: 0.9891
Epoch 12/12
469/469 [==============================] - 15s 28ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0051 - accuracy: 0.9983 - val_loss: 0.0446 - val_accuracy: 0.9892
On CPU:
2021-01-23 15:25:55.524865: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-23 15:25:55.617573: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-23 15:25:56.065950: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1579 - accuracy: 0.9530/Users/dima/dev/learn/2021-01-23-apple-tensorflow/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 45s 93ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1579 - accuracy: 0.9530 - val_loss: 0.0545 - val_accuracy: 0.9820
Epoch 2/12
469/469 [==============================] - 45s 93ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0457 - accuracy: 0.9858 - val_loss: 0.0446 - val_accuracy: 0.9856
...
Epoch 11/12
469/469 [==============================] - 47s 96ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9978 - val_loss: 0.0409 - val_accuracy: 0.9894
Epoch 12/12
469/469 [==============================] - 47s 96ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0030 - accuracy: 0.9990 - val_loss: 0.0409 - val_accuracy: 0.9890
On pip-provided TensorFlow 2.4 (after removing the two mlcompute lines from the script) it is twice as fast:
2021-01-23 15:42:17.869355: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-23 15:42:17.869529: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-23 15:42:17.960406: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-23 15:42:18.386414: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1506 - accuracy: 0.9546/Users/dima/dev/learn/2021-01-23-apple-tensorflow/venv-tf-pip/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1506 - accuracy: 0.9546 - val_loss: 0.0459 - val_accuracy: 0.9854
Epoch 2/12
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0423 - accuracy: 0.9869 - val_loss: 0.0390 - val_accuracy: 0.9870
...
Epoch 11/12
469/469 [==============================] - 26s 51ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0043 - accuracy: 0.9985 - val_loss: 0.0505 - val_accuracy: 0.9896
Epoch 12/12
469/469 [==============================] - 25s 50ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0037 - accuracy: 0.9989 - val_loss: 0.0450 - val_accuracy: 0.9900
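For reference, the "two mlcompute lines" removed when switching to stock TF 2.4 are the device-pinning calls from Apple's tensorflow_macos fork. A hedged sketch (the wrapper function is my own; the `mlcompute` module exists only in Apple's fork, so the import is guarded):

```python
def select_mlc_device(device_name: str = "gpu") -> bool:
    """Pin training to an ML Compute device on Apple's tensorflow_macos fork.

    Returns False on stock TensorFlow, where the mlcompute module is absent.
    """
    try:
        # These are the two lines typically deleted when running on stock TF 2.4.
        from tensorflow.python.compiler.mlcompute import mlcompute
    except ImportError:
        return False
    mlcompute.set_mlc_device(device_name=device_name)  # "cpu", "gpu", or "any"
    return True

select_mlc_device("gpu")
```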
This code runs hot! I think I just toasted the GPU on my 16'' MBP by running this benchmark. Make sure your warranty hasn't expired before experimenting.
MacBook Air (M1, 2020) 7 Core GPU
Train on 469 steps, validate on 79 steps
Epoch 1/12
467/469 [============================>.] - ETA: 0s - batch: 233.0000 - size: 1.0000 - loss: 0.1596 - accuracy: 0.9516/Users/leon/miniforge3/envs/tf-env/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 13s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1594 - accuracy: 0.9517 - val_loss: 0.0578 - val_accuracy: 0.9819
Epoch 2/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0430 - accuracy: 0.9871 - val_loss: 0.0362 - val_accuracy: 0.9879
Epoch 3/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0269 - accuracy: 0.9913 - val_loss: 0.0375 - val_accuracy: 0.9870
Epoch 4/12
469/469 [==============================] - 12s 23ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0181 - accuracy: 0.9941 - val_loss: 0.0393 - val_accuracy: 0.9878
Epoch 5/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0127 - accuracy: 0.9956 - val_loss: 0.0347 - val_accuracy: 0.9890
Epoch 6/12
469/469 [==============================] - 12s 23ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0098 - accuracy: 0.9967 - val_loss: 0.0356 - val_accuracy: 0.9890
Epoch 7/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0087 - accuracy: 0.9970 - val_loss: 0.0341 - val_accuracy: 0.9896
Epoch 8/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0049 - accuracy: 0.9984 - val_loss: 0.0402 - val_accuracy: 0.9893
Epoch 9/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0061 - accuracy: 0.9978 - val_loss: 0.0480 - val_accuracy: 0.9884
Epoch 10/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0058 - accuracy: 0.9980 - val_loss: 0.0435 - val_accuracy: 0.9877
Epoch 11/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0043 - accuracy: 0.9986 - val_loss: 0.0410 - val_accuracy: 0.9913
Epoch 12/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0030 - accuracy: 0.9989 - val_loss: 0.0492 - val_accuracy: 0.9889
Process finished with exit code 0
This was one of the factors that helped me choose between two laptops priced the same: 1) an MSI GF65 (i7 10th gen, 6 GB RTX 2060) and 2) an Apple MacBook Air M1 base model.
I ran the benchmark on both devices at the store and was surprised by how capable the Apple M1 is; even though it couldn't beat the MSI, it gave a more respectable result than the similarly priced HP.
Anyway, in the end I bought the MSI, as it gave me more options.
So here are my results. Specs: i7 10th gen; GPU: RTX 2060 (6 GB), which was only about 40% utilized.
Epoch 1/12
469/469 [==============================] - 7s 9ms/step - loss: 0.3589 - accuracy: 0.8936 - val_loss: 0.0471 - val_accuracy: 0.9855
Epoch 2/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0429 - accuracy: 0.9871 - val_loss: 0.0355 - val_accuracy: 0.9879
Epoch 3/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0258 - accuracy: 0.9918 - val_loss: 0.0318 - val_accuracy: 0.9894
Epoch 4/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0163 - accuracy: 0.9943 - val_loss: 0.0275 - val_accuracy: 0.9913
Epoch 5/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0117 - accuracy: 0.9962 - val_loss: 0.0349 - val_accuracy: 0.9894
Epoch 6/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0096 - accuracy: 0.9966 - val_loss: 0.0389 - val_accuracy: 0.9883
Epoch 7/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0078 - accuracy: 0.9973 - val_loss: 0.0510 - val_accuracy: 0.9869
Epoch 8/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0081 - accuracy: 0.9971 - val_loss: 0.0389 - val_accuracy: 0.9903
Epoch 9/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0033 - accuracy: 0.9989 - val_loss: 0.0456 - val_accuracy: 0.9895
Epoch 10/12
469/469 [==============================] - 4s 9ms/step - loss: 0.0053 - accuracy: 0.9983 - val_loss: 0.0410 - val_accuracy: 0.9903
Epoch 11/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0035 - accuracy: 0.9988 - val_loss: 0.0558 - val_accuracy: 0.9875
Epoch 12/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0018 - accuracy: 0.9995 - val_loss: 0.0459 - val_accuracy: 0.9898
Each epoch: ~4 s, each step: ~8 ms. accuracy: 0.9995, val_accuracy: 0.9898.
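The ms/step figures reported across this thread can be converted to a rough throughput number for comparison. A minimal sketch (the helper `images_per_sec` is my own; batch size 128 is inferred from 60,000 MNIST images / 469 steps, and the step times are rounded from the logs above):

```python
BATCH_SIZE = 128  # inferred: 60,000 MNIST training images / 469 steps ≈ 128

def images_per_sec(ms_per_step: float) -> float:
    """Approximate training throughput from a reported ms/step figure."""
    return BATCH_SIZE / (ms_per_step / 1000.0)

# Step times rounded from the logs in this thread.
reported_ms_per_step = {
    "RTX 2060 (6 GB)": 8,
    "MacBook Air M1 (GPU)": 24,
    "2016 MacBook Pro (i5 CPU)": 300,
}

for machine, ms in reported_ms_per_step.items():
    print(f"{machine}: ~{images_per_sec(ms):,.0f} images/sec")
```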
Tested on a MacBook Pro (13-inch, M1, 2020) with 8 GB RAM
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1554 - accuracy: 0.9533
469/469 [==============================] - 14s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9534 - val_loss: 0.0524 - val_accuracy: 0.9836
Epoch 2/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0447 - accuracy: 0.9865 - val_loss: 0.0402 - val_accuracy: 0.9863
Epoch 3/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0263 - accuracy: 0.9919 - val_loss: 0.0316 - val_accuracy: 0.9901
Epoch 4/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0176 - accuracy: 0.9941 - val_loss: 0.0319 - val_accuracy: 0.9885
Epoch 5/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0115 - accuracy: 0.9961 - val_loss: 0.0370 - val_accuracy: 0.9890
Epoch 6/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0103 - accuracy: 0.9965 - val_loss: 0.0376 - val_accuracy: 0.9893
Epoch 7/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0079 - accuracy: 0.9973 - val_loss: 0.0345 - val_accuracy: 0.9892
Epoch 8/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9982 - val_loss: 0.0340 - val_accuracy: 0.9900
Epoch 9/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0063 - accuracy: 0.9976 - val_loss: 0.0442 - val_accuracy: 0.9888
Epoch 10/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0040 - accuracy: 0.9987 - val_loss: 0.0374 - val_accuracy: 0.9895
Epoch 11/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0044 - accuracy: 0.9984 - val_loss: 0.0370 - val_accuracy: 0.9906
Epoch 12/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0034 - accuracy: 0.9988 - val_loss: 0.0478 - val_accuracy: 0.9883
CPU times: user 2min 6s, sys: 30.9 s, total: 2min 37s
Wall time: 3min 2s
The following code implements Yann LeCun's (@ylecun) original CNN architecture, with `Dropout` commented out due to an issue. Packages required to run: