apple / tensorflow_macos

TensorFlow for macOS 11.0+ accelerated using Apple's ML Compute framework.

Benchmark: CNN proposal #25

Open Willian-Zhang opened 3 years ago

Willian-Zhang commented 3 years ago

The following code implements @ylecun (Yann LeCun)'s original CNN architecture, with the Dropout layers commented out due to an issue.

import tensorflow.compat.v2 as tf
import tensorflow_datasets as tfds

tf.enable_v2_behavior()

from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()

from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')  # 'cpu', 'gpu', or 'any'

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

batch_size = 128

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(batch_size)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(batch_size)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(32, kernel_size=(3, 3),
                 activation='relu'),
  tf.keras.layers.Conv2D(64, kernel_size=(3, 3),
                 activation='relu'),
  tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
#   tf.keras.layers.Dropout(0.25),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
#   tf.keras.layers.Dropout(0.5),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy'],
)

model.fit(
    ds_train,
    epochs=12,
    validation_data=ds_test,
)

Packages required to run:

pip install tensorflow_datasets
Willian-Zhang commented 3 years ago

These are my results on a MacBook Air 2020 M1 8GB:

2020-11-20 23:47:18.141957: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-20 23:47:18.145970: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
2020-11-20 23:47:18.479186: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/10
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1614 - accuracy: 0.9519/Users/willian/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1613 - accuracy: 0.9519 - val_loss: 0.0449 - val_accuracy: 0.9853
Epoch 2/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0427 - accuracy: 0.9867 - val_loss: 0.0336 - val_accuracy: 0.9885
Epoch 3/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0264 - accuracy: 0.9914 - val_loss: 0.0333 - val_accuracy: 0.9885
Epoch 4/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0167 - accuracy: 0.9946 - val_loss: 0.0393 - val_accuracy: 0.9879
Epoch 5/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0128 - accuracy: 0.9956 - val_loss: 0.0333 - val_accuracy: 0.9890
Epoch 6/10
469/469 [==============================] - 24s 49ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0087 - accuracy: 0.9973 - val_loss: 0.0341 - val_accuracy: 0.9900
Epoch 7/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0079 - accuracy: 0.9975 - val_loss: 0.0379 - val_accuracy: 0.9887
Epoch 8/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0063 - accuracy: 0.9979 - val_loss: 0.0366 - val_accuracy: 0.9906
Epoch 9/10
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9982 - val_loss: 0.0512 - val_accuracy: 0.9859
Epoch 10/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9982 - val_loss: 0.0462 - val_accuracy: 0.9884

Key results are:

On my Mac mini 2020 M1 16GB:

ephes commented 3 years ago

Ran this on my MacBook Pro (16-inch, 2019), 2.3 GHz 8-core Intel Core i9, AMD Radeon Pro 5500M 8GB:

2020-11-20 17:42:23.136427: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-20 17:42:23.318515: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-20 17:42:24.014368: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1588 - accuracy: 0.9514/Users/jochen/projects/ds_tutorial/mac_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 57s 114ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1588 - accuracy: 0.9514 - val_loss: 0.0479 - val_accuracy: 0.9841
Epoch 2/12
469/469 [==============================] - 56s 116ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0442 - accuracy: 0.9863 - val_loss: 0.0348 - val_accuracy: 0.9880
Epoch 3/12
469/469 [==============================] - 56s 115ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0277 - accuracy: 0.9913 - val_loss: 0.0393 - val_accuracy: 0.9863
Epoch 4/12
469/469 [==============================] - 56s 115ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0189 - accuracy: 0.9940 - val_loss: 0.0387 - val_accuracy: 0.9876
Epoch 5/12
469/469 [==============================] - 56s 114ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0142 - accuracy: 0.9953 - val_loss: 0.0354 - val_accuracy: 0.9895
Epoch 6/12
469/469 [==============================] - 57s 117ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0092 - accuracy: 0.9970 - val_loss: 0.0407 - val_accuracy: 0.9881
...
real    11m31.063s
user    16m18.586s
sys 4m3.070s
tranchis commented 3 years ago

My results with a MacBook Pro M1, 16GB of RAM:

2020-11-20 21:18:55.599180: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-20 21:18:55.599898: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
2020-11-20 21:18:55.889178: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1508 - accuracy: 0.9560/Users/sergio/repos/tf-test/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1506 - accuracy: 0.9561 - val_loss: 0.0479 - val_accuracy: 0.9851
Epoch 2/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0421 - accuracy: 0.9868 - val_loss: 0.0383 - val_accuracy: 0.9870
Epoch 3/12
469/469 [==============================] - 23s 45ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0262 - accuracy: 0.9916 - val_loss: 0.0407 - val_accuracy: 0.9874
Epoch 4/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0177 - accuracy: 0.9944 - val_loss: 0.0353 - val_accuracy: 0.9868
Epoch 5/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0125 - accuracy: 0.9960 - val_loss: 0.0395 - val_accuracy: 0.9885
Epoch 6/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0094 - accuracy: 0.9971 - val_loss: 0.0393 - val_accuracy: 0.9898
Epoch 7/12
469/469 [==============================] - 23s 45ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0095 - accuracy: 0.9968 - val_loss: 0.0421 - val_accuracy: 0.9887
Epoch 8/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0066 - accuracy: 0.9978 - val_loss: 0.0437 - val_accuracy: 0.9892
Epoch 9/12
469/469 [==============================] - 25s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0056 - accuracy: 0.9982 - val_loss: 0.0437 - val_accuracy: 0.9897
Epoch 10/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9984 - val_loss: 0.0510 - val_accuracy: 0.9879
Epoch 11/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0041 - accuracy: 0.9986 - val_loss: 0.0401 - val_accuracy: 0.9912
Epoch 12/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0047 - accuracy: 0.9983 - val_loss: 0.0472 - val_accuracy: 0.9901

One thing to note is that there must be a bottleneck somewhere. I was monitoring the GPU usage in Activity Monitor and it never went above 60%.

anna-tikhonova commented 3 years ago

@Willian-Zhang Thank you for providing a reproducible test case. We will take a look.

rnogy commented 3 years ago

MacBook Pro (13-inch, 2017), i5, 8GB, Intel Iris 640

Apple-compiled TensorFlow:

Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1539 - accuracy: 0.9537/Users/corgi/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 108s 206ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1539 - accuracy: 0.9537 - val_loss: 0.0472 - val_accuracy: 0.9849
Epoch 2/12
469/469 [==============================] - 101s 206ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0406 - accuracy: 0.9875 - val_loss: 0.0408 - val_accuracy: 0.9863
Epoch 3/12
469/469 [==============================] - 98s 201ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0261 - accuracy: 0.9922 - val_loss: 0.0427 - val_accuracy: 0.9873
Epoch 4/12
469/469 [==============================] - 100s 204ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0169 - accuracy: 0.9945 - val_loss: 0.0293 - val_accuracy: 0.9905
Epoch 5/12
469/469 [==============================] - 98s 202ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0120 - accuracy: 0.9963 - val_loss: 0.0332 - val_accuracy: 0.9902
Epoch 6/12
469/469 [==============================] - 98s 201ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0097 - accuracy: 0.9970 - val_loss: 0.0361 - val_accuracy: 0.9898
Epoch 7/12
469/469 [==============================] - 99s 203ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0088 - accuracy: 0.9971 - val_loss: 0.0409 - val_accuracy: 0.9880
Epoch 8/12
469/469 [==============================] - 99s 202ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9983 - val_loss: 0.0387 - val_accuracy: 0.9886
Epoch 9/12
469/469 [==============================] - 97s 200ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0056 - accuracy: 0.9981 - val_loss: 0.0411 - val_accuracy: 0.9888
Epoch 10/12
469/469 [==============================] - 99s 203ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9985 - val_loss: 0.0493 - val_accuracy: 0.9885
Epoch 11/12
469/469 [==============================] - 101s 206ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0047 - accuracy: 0.9983 - val_loss: 0.0446 - val_accuracy: 0.9892
Epoch 12/12
469/469 [==============================] - 100s 205ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0041 - accuracy: 0.9985 - val_loss: 0.0440 - val_accuracy: 0.9891

pip version (TF 2.3.1):

Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1640 - accuracy: 0.9506WARNING:tensorflow:From /Users/corgi/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From /Users/corgi/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
469/469 [==============================] - 67s 143ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1640 - accuracy: 0.9506 - val_loss: 0.0571 - val_accuracy: 0.9810
Epoch 2/12
469/469 [==============================] - 63s 134ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0431 - accuracy: 0.9868 - val_loss: 0.0397 - val_accuracy: 0.9864
Epoch 3/12
469/469 [==============================] - 57s 122ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0266 - accuracy: 0.9916 - val_loss: 0.0361 - val_accuracy: 0.9890
Epoch 4/12
469/469 [==============================] - 57s 122ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0186 - accuracy: 0.9940 - val_loss: 0.0351 - val_accuracy: 0.9895
Epoch 5/12
469/469 [==============================] - 56s 120ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0130 - accuracy: 0.9959 - val_loss: 0.0396 - val_accuracy: 0.9886
Epoch 6/12
469/469 [==============================] - 57s 121ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0097 - accuracy: 0.9967 - val_loss: 0.0392 - val_accuracy: 0.9880
Epoch 7/12
469/469 [==============================] - 59s 125ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0083 - accuracy: 0.9970 - val_loss: 0.0376 - val_accuracy: 0.9895
Epoch 8/12
469/469 [==============================] - 59s 126ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0071 - accuracy: 0.9978 - val_loss: 0.0423 - val_accuracy: 0.9880
Epoch 9/12
469/469 [==============================] - 56s 119ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9982 - val_loss: 0.0357 - val_accuracy: 0.9895
Epoch 10/12
469/469 [==============================] - 57s 121ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9981 - val_loss: 0.0378 - val_accuracy: 0.9902
Epoch 11/12
469/469 [==============================] - 56s 119ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0029 - accuracy: 0.9990 - val_loss: 0.0383 - val_accuracy: 0.9910
Epoch 12/12
469/469 [==============================] - 58s 124ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9985 - val_loss: 0.0435 - val_accuracy: 0.9903

TF compiled with the FMA, AVX, AVX2, SSE4.1, SSE4.2 flags (wheel from https://github.com/lakshayg/tensorflow-build):

Epoch 1/12
469/469 [==============================] - ETA: 0s - loss: 0.1570 - accuracy: 0.95272020-11-21 01:23:29.984485: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
469/469 [==============================] - 64s 135ms/step - loss: 0.1570 - accuracy: 0.9527 - val_loss: 0.0511 - val_accuracy: 0.9836
Epoch 2/12
469/469 [==============================] - ETA: 0s - loss: 0.0425 - accuracy: 0.98662020-11-21 01:24:41.347821: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
469/469 [==============================] - 67s 142ms/step - loss: 0.0425 - accuracy: 0.9866 - val_loss: 0.0405 - val_accuracy: 0.9867
Epoch 3/12
469/469 [==============================] - ETA: 0s - loss: 0.0274 - accuracy: 0.99152020-11-21 01:25:55.016136: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
469/469 [==============================] - 68s 145ms/step - loss: 0.0274 - accuracy: 0.9915 - val_loss: 0.0339 - val_accuracy: 0.9886
...
Epoch 11/12
469/469 [==============================] - ETA: 0s - loss: 0.0034 - accuracy: 0.99892020-11-21 01:34:52.652276: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
469/469 [==============================] - 64s 137ms/step - loss: 0.0034 - accuracy: 0.9989 - val_loss: 0.0429 - val_accuracy: 0.9910
Epoch 12/12
469/469 [==============================] - 61s 129ms/step - loss: 0.0034 - accuracy: 0.9988 - val_loss: 0.0515 - val_accuracy: 0.9893

It's interesting to see that Apple's optimized version of TensorFlow is slower than the pip version. Looking at the warning

I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.2 AVX AVX2 FMA

I think the performance loss has to do either with Intel's oneAPI or with the instruction sets supported on x86. I tried the compiled binaries that support FMA, AVX, AVX2, SSE4.1, and SSE4.2 to see whether instruction support was the cause, but that run throws a warning (due to exhausted data; why only this run?; batch size -> 118?). Anyhow, it would be nice if Apple provided more documentation about their version of TF, and please let me know if I am the only one finding that tensorflow-macos is slower than pip tensorflow (-> request for documentation / request for feature (instruction set) support?).

dkgaraujo commented 3 years ago

Running Apple's Mac-optimized on a 2019 16' MacBook Pro with AMD Radeon Pro 5500M:

[Screenshot: 2020-11-21 at 12:33:40]

And here is the GPU performance after the first epochs have started.

[Screenshot: 2020-11-21 at 12:17:38]

I suspect the slack in the GPU is due to the comparatively low batch size compared to the GPU memory capacity. When I change batch_size = 500, the results are as follows:

[Screenshot: 2020-11-21 at 12:52:04]

With the following GPU usage:

[Screenshot: 2020-11-21 at 12:51:50]

Note that each epoch now takes 27s, less than half the time it took with batch_size=128. I think this illustrates that each combination of backend + GPU + specific data at hand has a batch-size value that optimizes speed; it's up to the analyst to find it (maybe running one-epoch-only iterations to check speed at different settings).
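The one-epoch sweep suggested above can be sketched framework-agnostically. This is a minimal sketch, not the benchmark code used in this thread: `run_one_epoch` is a hypothetical stand-in for a single training call (e.g. a wrapper around `model.fit(ds.batch(bs), epochs=1)`), and the lambda in the usage example is a dummy.

```python
import time

def sweep_batch_sizes(run_one_epoch, batch_sizes):
    """Time one training epoch per batch size and report the fastest setting.

    run_one_epoch: callable taking a batch size and training for one epoch.
    Returns (fastest batch size, {batch size: seconds}).
    """
    timings = {}
    for bs in batch_sizes:
        start = time.perf_counter()
        run_one_epoch(bs)
        timings[bs] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings

# Usage with a dummy epoch runner (replace with a real training call):
if __name__ == "__main__":
    best, timings = sweep_batch_sizes(lambda bs: None, [128, 512, 1024, 2048])
    print(best, timings)
```

With a real `run_one_epoch`, the first call should probably be discarded or repeated, since dataset caching and graph compilation make the first epoch slower than the rest.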

anhornsby commented 3 years ago

To echo @dkgaraujo, I can run this at around 24s per epoch on a Macbook Pro 16" 2019 with Radeon Pro 5300M if I increase the batch size (e.g., batch_size = 1250). This is about 10s quicker per epoch compared to CPU and comparable to the M1 benchmarks posted above.

With low batch sizes (e.g. 128), GPU performance is comparable or slower vs CPU.

Willian-Zhang commented 3 years ago

@anhornsby with batch_size = 1250 (Train on 48 steps, validate on 8 steps)

on MacBook Air 2020 m1 8G, I get:

on Mac mini 2020 m1 16G:

robin7g commented 3 years ago

Results on my Mac Mini 2020 m1 16G.

GPU = 22s per epoch, CPU = 17s per epoch, Any = 28s per epoch (weird!)

Best results came from commenting out the code that disables eager execution and the code that selects the GPU; just don't set these and I get the best results.

python3 cnn.py
Epoch 1/12
2020-11-21 17:27:02.971440: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-21 17:27:02.972299: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
469/469 [==============================] - 17s 34ms/step - loss: 0.3564 - accuracy: 0.8921 - val_loss: 0.0479 - val_accuracy: 0.9834
Epoch 2/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0488 - accuracy: 0.9857 - val_loss: 0.0395 - val_accuracy: 0.9868
Epoch 3/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0270 - accuracy: 0.9917 - val_loss: 0.0383 - val_accuracy: 0.9875
Epoch 4/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0182 - accuracy: 0.9946 - val_loss: 0.0347 - val_accuracy: 0.9889
Epoch 5/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0120 - accuracy: 0.9959 - val_loss: 0.0390 - val_accuracy: 0.9890
Epoch 6/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0097 - accuracy: 0.9972 - val_loss: 0.0359 - val_accuracy: 0.9891
Epoch 7/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0072 - accuracy: 0.9976 - val_loss: 0.0387 - val_accuracy: 0.9886
Epoch 8/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0047 - accuracy: 0.9986 - val_loss: 0.0341 - val_accuracy: 0.9911
Epoch 9/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0043 - accuracy: 0.9985 - val_loss: 0.0450 - val_accuracy: 0.9890
Epoch 10/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0076 - accuracy: 0.9974 - val_loss: 0.0460 - val_accuracy: 0.9882
Epoch 11/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0030 - accuracy: 0.9991 - val_loss: 0.0446 - val_accuracy: 0.9891
Epoch 12/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0049 - accuracy: 0.9983 - val_loss: 0.0518 - val_accuracy: 0.9881

Highlights are:

15s/epoch 33ms/step (original batch size) 98.8% final accuracy

dkgaraujo commented 3 years ago


Some more results:

I ran the same code as before, but with batch_size = 2000. With the GPU I had 20s/epoch, compared to the CPU with 85s/epoch.

anhornsby commented 3 years ago

Results on my Mac Mini 2020 m1 16G. GPU = 22s per epoch, CPU = 17s per epoch, Any = 28s per epoch (weird!) Best results were from commenting out the code that disables eager execution and the code that selects the GPU.

Commenting out the line that disables eager execution seems helpful. 20s per epoch with batch_size = 1500.

danielmbradley commented 3 years ago

Results on my Mac Mini 2020 m1 16G. GPU = 22s per epoch, CPU = 17s per epoch, Any = 28s per epoch (weird!) Best results were from commenting out the code that disables eager execution and the code that selects the GPU.

Commenting out the line that disables eager execution seems helpful. 20s per epoch with batch_size = 1500.

Interestingly, when I removed the line that disables eager execution, my system just ended up hanging. Did you change anything else besides commenting that out, @anhornsby?

anhornsby commented 3 years ago

@danielmbradley nope, same code as above, using the recommended virtualenv

DVS70 commented 3 years ago

MacBook Pro M1, 16GB of RAM; standard TF installation with venv, executed from the terminal, no other significant processes running.

batch size 128: 23s/epoch, 45ms/step, 98.98% final accuracy, GPU ~55%
batch size 256: 15s/epoch, 59ms/step, 99.11% final accuracy, GPU ~65%
batch size 512: 13s/epoch, 98ms/step, 99.01% final accuracy, GPU ~75%
batch size 1024: 12s/epoch, 180ms/step, 98.99% final accuracy, GPU ~80%
batch size 1280: 12s/epoch, 227ms/step, 98.86% final accuracy, GPU ~83%
batch size 2048: 13s/epoch, 375ms/step, 98.76% final accuracy, GPU ~88%
batch size 4096: 15s/epoch, 890ms/step, 98.57% final accuracy, GPU up to 90%
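Converted to throughput (MNIST has 60,000 training images per epoch), these epoch times show where the sweet spot sits. A small illustrative sketch using only the numbers reported in this comment:

```python
# Epoch times (seconds) per batch size, as reported above on an M1 MacBook Pro.
epoch_seconds = {128: 23, 256: 15, 512: 13, 1024: 12, 1280: 12, 2048: 13, 4096: 15}
TRAIN_IMAGES = 60_000  # MNIST training set size

# Throughput in images per second for each batch size.
throughput = {bs: TRAIN_IMAGES / s for bs, s in epoch_seconds.items()}
for bs, ips in sorted(throughput.items()):
    print(f"batch {bs:5d}: {ips:7.0f} images/s")

# Fastest epoch time; batch sizes 1024 and 1280 tie at 12 s/epoch (5000 images/s).
best = min(epoch_seconds, key=epoch_seconds.get)
```

Throughput rises with batch size until the GPU is saturated, then falls again, which matches the GPU-utilization percentages listed above.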

danielmbradley commented 3 years ago

@anhornsby Interesting, there must be some difference in the way eager execution is implemented between Intel Macs and M1 Macs; mine completely falls over when that line is missing. I did find that increasing the batch size significantly increased processing speed, though (oddly, the time printed in the terminal was wrong once it hit 22 seconds).

VictorownzuA11 commented 3 years ago

Just for fun I wanted to try running this on a Windows 10 Laptop with a mobile 1060 (6G) and i7-7700HQ, 16GB RAM:

batch_size = 128

Epoch 1/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1642 - accuracy: 0.9517 - val_loss: 0.0566 - val_accuracy: 0.9817
Epoch 2/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0436 - accuracy: 0.9865 - val_loss: 0.0368 - val_accuracy: 0.9879
Epoch 3/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0281 - accuracy: 0.9908 - val_loss: 0.0357 - val_accuracy: 0.9880
Epoch 4/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0179 - accuracy: 0.9941 - val_loss: 0.0335 - val_accuracy: 0.9893
Epoch 5/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0133 - accuracy: 0.9956 - val_loss: 0.0405 - val_accuracy: 0.9878
Epoch 6/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0095 - accuracy: 0.9968 - val_loss: 0.0305 - val_accuracy: 0.9912
Epoch 7/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0077 - accuracy: 0.9973 - val_loss: 0.0373 - val_accuracy: 0.9896
Epoch 8/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0079 - accuracy: 0.9972 - val_loss: 0.0443 - val_accuracy: 0.9877
Epoch 9/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0056 - accuracy: 0.9982 - val_loss: 0.0397 - val_accuracy: 0.9894
Epoch 10/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0035 - accuracy: 0.9989 - val_loss: 0.0487 - val_accuracy: 0.9885
Epoch 11/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0044 - accuracy: 0.9985 - val_loss: 0.0502 - val_accuracy: 0.9866
Epoch 12/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0043 - accuracy: 0.9984 - val_loss: 0.0426 - val_accuracy: 0.9896

Highlights: 5 s/epoch, 11 ms/step (original batch size), 98.96% final accuracy

batch_size = 1250

48/48 [==============================] - 4s 78ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.5046 - accuracy: 0.8650 - val_loss: 0.1678 - val_accuracy: 0.9517
Epoch 2/12
48/48 [==============================] - 3s 71ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.1180 - accuracy: 0.9659 - val_loss: 0.0735 - val_accuracy: 0.9778
Epoch 3/12
48/48 [==============================] - 4s 76ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0654 - accuracy: 0.9811 - val_loss: 0.0520 - val_accuracy: 0.9828
Epoch 4/12
48/48 [==============================] - 3s 72ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0463 - accuracy: 0.9866 - val_loss: 0.0465 - val_accuracy: 0.9847
Epoch 5/12
48/48 [==============================] - 3s 72ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0388 - accuracy: 0.9882 - val_loss: 0.0448 - val_accuracy: 0.9852
Epoch 6/12
48/48 [==============================] - 4s 76ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0324 - accuracy: 0.9905 - val_loss: 0.0399 - val_accuracy: 0.9868
Epoch 7/12
48/48 [==============================] - 3s 71ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0257 - accuracy: 0.9924 - val_loss: 0.0373 - val_accuracy: 0.9885
Epoch 8/12
48/48 [==============================] - 4s 78ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0209 - accuracy: 0.9942 - val_loss: 0.0387 - val_accuracy: 0.9882
Epoch 9/12
48/48 [==============================] - 3s 72ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0174 - accuracy: 0.9950 - val_loss: 0.0368 - val_accuracy: 0.9883
Epoch 10/12
48/48 [==============================] - 4s 77ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0152 - accuracy: 0.9955 - val_loss: 0.0379 - val_accuracy: 0.9887
Epoch 11/12
48/48 [==============================] - 3s 72ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0124 - accuracy: 0.9964 - val_loss: 0.0397 - val_accuracy: 0.9880
Epoch 12/12
48/48 [==============================] - 3s 71ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0096 - accuracy: 0.9974 - val_loss: 0.0394 - val_accuracy: 0.9885

Highlights: 3-4 s/epoch, 71-78 ms/step, 98.85% final accuracy

batch_size = 4096

Highlights: 3 s/epoch, 200 ms/step, 98.58% final accuracy
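
For context, the step counts and epoch times in these logs follow directly from MNIST's 60,000 training images. A quick pure-Python sanity check (no TensorFlow required; the ms/step figures are the ones reported above):

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches Keras reports per epoch (the last batch may be partial)."""
    return math.ceil(num_examples / batch_size)

TRAIN_EXAMPLES = 60_000  # size of the MNIST training split

# (batch_size, observed ms/step from the runs above)
for batch_size, ms_per_step in [(128, 11), (1250, 72), (4096, 200)]:
    steps = steps_per_epoch(TRAIN_EXAMPLES, batch_size)
    est_epoch_s = steps * ms_per_step / 1000
    print(f"batch={batch_size}: {steps} steps/epoch, ~{est_epoch_s:.1f} s/epoch")
```

This reproduces the 469- and 48-step progress bars in the logs and the roughly 5 s and 3-4 s epoch times quoted in the highlights.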

SpaceMonkeyForever commented 3 years ago

MacBook Air (2020, M1) with 16 GB RAM; results are in line with the other M1 MacBooks.

2020-11-24 21:24:52.855304: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-24 21:24:52.856412: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
2020-11-24 21:24:53.156975: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1565 - accuracy: 0.9534/Users/spacemonkey/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 26s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1563 - accuracy: 0.9535 - val_loss: 0.0468 - val_accuracy: 0.9847
Epoch 2/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0437 - accuracy: 0.9865 - val_loss: 0.0381 - val_accuracy: 0.9871
Epoch 3/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0277 - accuracy: 0.9912 - val_loss: 0.0390 - val_accuracy: 0.9879
Epoch 4/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0174 - accuracy: 0.9947 - val_loss: 0.0370 - val_accuracy: 0.9865
Epoch 5/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0123 - accuracy: 0.9961 - val_loss: 0.0399 - val_accuracy: 0.9873
Epoch 6/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0099 - accuracy: 0.9966 - val_loss: 0.0379 - val_accuracy: 0.9889
Epoch 7/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0086 - accuracy: 0.9971 - val_loss: 0.0417 - val_accuracy: 0.9878
Epoch 8/12
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0063 - accuracy: 0.9980 - val_loss: 0.0412 - val_accuracy: 0.9892
Epoch 9/12
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9984 - val_loss: 0.0411 - val_accuracy: 0.9904
Epoch 10/12
469/469 [==============================] - 25s 50ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9983 - val_loss: 0.0559 - val_accuracy: 0.9868
Epoch 11/12
469/469 [==============================] - 24s 49ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0037 - accuracy: 0.9988 - val_loss: 0.0417 - val_accuracy: 0.9897
Epoch 12/12
469/469 [==============================] - 25s 49ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0056 - accuracy: 0.9981 - val_loss: 0.0448 - val_accuracy: 0.9893
SpaceMonkeyForever commented 3 years ago

Windows, GeForce GTX 1080 Ti, Intel Core i7-5820K, using tensorflow-gpu 2.3.1.

I had to comment out these lines:

from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')
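
Rather than commenting these lines out by hand on non-Mac platforms, a guarded import keeps one script portable across both builds. A minimal sketch, assuming only that the mlcompute module exists solely in Apple's tensorflow_macos fork (on stock TensorFlow the except branch runs and default device placement applies):

```python
# Guarded import: mlcompute exists only in Apple's tensorflow_macos fork.
try:
    from tensorflow.python.compiler.mlcompute import mlcompute
    mlcompute.set_mlc_device(device_name='gpu')  # pin training to the Apple GPU
    using_mlcompute = True
except ImportError:
    # Stock TensorFlow (e.g. tensorflow-gpu on Windows/Linux):
    # fall back to TensorFlow's default device placement.
    using_mlcompute = False
```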

Results: batch size = 128, 2 s/epoch, 5 ms/step, val_accuracy: 0.9870

Log:

2020-11-25 00:22:53.068167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 00:22:53.076410: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 00:22:53.093385: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 00:22:53.110747: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 00:22:53.119427: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 00:22:53.139974: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 00:22:53.149363: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 00:22:53.185160: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 00:22:53.188810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 00:22:53.192451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 00:22:53.202681: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 00:22:53.208791: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 00:22:53.212913: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 00:22:53.218933: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 00:22:53.223650: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 00:22:53.229800: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 00:22:53.233966: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 00:22:53.239978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 00:22:53.907505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 00:22:53.911316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2020-11-25 00:22:53.914206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2020-11-25 00:22:53.919438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8678 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2020-11-25 00:22:53.930196: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x14efa539c30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-25 00:22:53.935222: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-11-25 00:22:54.154381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 00:22:54.162717: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 00:22:54.169072: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 00:22:54.173108: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 00:22:54.179071: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 00:22:54.183115: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 00:22:54.189077: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 00:22:54.193209: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 00:22:54.199286: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 00:22:54.202657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 00:22:54.208740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2020-11-25 00:22:54.211523: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2020-11-25 00:22:54.214418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8678 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
Train on 469 steps, validate on 79 steps
Epoch 1/12
2020-11-25 00:22:55.528873: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 00:22:57.078128: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 00:22:57.840244: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1612 - accuracy: 0.9516WARNING:tensorflow:From D:\Code\CondaEnvs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From D:\Code\CondaEnvs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1612 - accuracy: 0.9516 - val_loss: 0.0503 - val_accuracy: 0.9850
Epoch 2/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0447 - accuracy: 0.9866 - val_loss: 0.0382 - val_accuracy: 0.9880
Epoch 3/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0295 - accuracy: 0.9905 - val_loss: 0.0416 - val_accuracy: 0.9851
Epoch 4/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0205 - accuracy: 0.9932 - val_loss: 0.0342 - val_accuracy: 0.9889
Epoch 5/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0138 - accuracy: 0.9955 - val_loss: 0.0373 - val_accuracy: 0.9885
Epoch 6/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0103 - accuracy: 0.9967 - val_loss: 0.0395 - val_accuracy: 0.9881
Epoch 7/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0087 - accuracy: 0.9970 - val_loss: 0.0372 - val_accuracy: 0.9887
Epoch 8/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0072 - accuracy: 0.9977 - val_loss: 0.0389 - val_accuracy: 0.9897
Epoch 9/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0066 - accuracy: 0.9980 - val_loss: 0.0419 - val_accuracy: 0.9895
Epoch 10/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9984 - val_loss: 0.0439 - val_accuracy: 0.9891
Epoch 11/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0030 - accuracy: 0.9989 - val_loss: 0.0503 - val_accuracy: 0.9889
Epoch 12/12
469/469 [==============================] - 2s 4ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0044 - accuracy: 0.9984 - val_loss: 0.0605 - val_accuracy: 0.9870

I ran again with batch size = 512 since I have a lot of memory on this GPU.

Results: 1 s/epoch, 12 ms/step, val_accuracy: 0.9905

Log:

2020-11-25 10:32:51.405457: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 10:32:54.188682: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-11-25 10:32:54.219663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 10:32:54.219952: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 10:32:54.224344: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 10:32:54.228633: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 10:32:54.230251: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 10:32:54.235019: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 10:32:54.237543: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 10:32:54.247529: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 10:32:54.247745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 10:32:54.248102: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-25 10:32:54.257534: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x17097fc7d50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-25 10:32:54.257730: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-11-25 10:32:54.258020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 10:32:54.258307: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 10:32:54.258451: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 10:32:54.258593: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 10:32:54.258735: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 10:32:54.258879: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 10:32:54.259021: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 10:32:54.259161: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 10:32:54.259371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 10:32:54.873392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 10:32:54.873552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-11-25 10:32:54.873646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-11-25 10:32:54.873964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8678 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2020-11-25 10:32:54.876885: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x170bb22d130 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-25 10:32:54.877077: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-11-25 10:32:55.083716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 10:32:55.084009: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 10:32:55.084150: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 10:32:55.084287: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 10:32:55.084425: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 10:32:55.084567: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 10:32:55.084708: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 10:32:55.084846: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 10:32:55.085016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 10:32:55.085171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 10:32:55.085319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-11-25 10:32:55.085408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-11-25 10:32:55.085599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8678 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
Train on 118 steps, validate on 20 steps
Epoch 1/12
2020-11-25 10:32:56.377688: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 10:32:57.797291: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 10:32:58.516367: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
118/118 [==============================] - ETA: 0s - batch: 58.5000 - size: 1.0000 - loss: 0.3163 - accuracy: 0.9059WARNING:tensorflow:From D:\Code\CondaEnvs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From D:\Code\CondaEnvs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
118/118 [==============================] - 2s 14ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.3163 - accuracy: 0.9059 - val_loss: 0.0891 - val_accuracy: 0.9738
Epoch 2/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0695 - accuracy: 0.9798 - val_loss: 0.0621 - val_accuracy: 0.9800
Epoch 3/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0455 - accuracy: 0.9863 - val_loss: 0.0431 - val_accuracy: 0.9861
Epoch 4/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0331 - accuracy: 0.9897 - val_loss: 0.0386 - val_accuracy: 0.9876
Epoch 5/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0259 - accuracy: 0.9922 - val_loss: 0.0322 - val_accuracy: 0.9890
Epoch 6/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0203 - accuracy: 0.9937 - val_loss: 0.0329 - val_accuracy: 0.9895
Epoch 7/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0165 - accuracy: 0.9952 - val_loss: 0.0364 - val_accuracy: 0.9880
Epoch 8/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0126 - accuracy: 0.9960 - val_loss: 0.0303 - val_accuracy: 0.9909
Epoch 9/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0089 - accuracy: 0.9976 - val_loss: 0.0364 - val_accuracy: 0.9893
Epoch 10/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0082 - accuracy: 0.9976 - val_loss: 0.0357 - val_accuracy: 0.9900
Epoch 11/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9987 - val_loss: 0.0395 - val_accuracy: 0.9892
Epoch 12/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0034 - accuracy: 0.9993 - val_loss: 0.0377 - val_accuracy: 0.9905
SpaceMonkeyForever commented 3 years ago

Question please:

Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.

lightb0x commented 3 years ago

Tested on Ubuntu 20.04.1, RTX 3070, TensorFlow container 20.11-tf2-py3.

| batch | s/epoch | ms/step | acc.   | GPU util. (%) |
|------:|--------:|--------:|-------:|--------------:|
|   128 |       2 |     3-4 | 0.9884 |         73-75 |
|   256 |       1 |     5-6 | 0.9881 |            82 |
|   512 |       1 |   10-11 | 0.9881 |            87 |
|  1024 |       1 |   19-20 | 0.9889 |            92 |
|  1280 |       1 |   24-30 | 0.9880 |            94 |
|  2048 |       1 |   37-40 | 0.9883 |            95 |
|  4096 |    9->1 | 620->65 | 0.9872 |            97 |

Batch size 4096 was slower for the first three epochs, taking 9, 5, and 2 seconds respectively (620, 363, and 90 ms/step).
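
That warm-up effect (one-time costs such as the tf.data cache filling and kernel autotuning) is worth excluding when comparing steady-state throughput. A small sketch of one way to do that, using the batch_size=4096 epoch times reported above:

```python
def steady_state_epoch_time(epoch_seconds, warmup=3):
    """Mean epoch time after discarding the first `warmup` (one-time-cost) epochs."""
    measured = epoch_seconds[warmup:]
    return sum(measured) / len(measured)

# The batch_size=4096 run above: 9, 5, 2 s for the first three epochs, ~1 s after.
times = [9, 5, 2] + [1] * 9
print(steady_state_epoch_time(times))  # -> 1.0
```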

sidagrawal commented 3 years ago

Tested on a MacBook Pro (13-inch, M1, 2020) with 8 GB RAM

Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1600 - accuracy: 0.9523/Users/sidagrawal/MachineLearning/env/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 106s 220ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1600 - accuracy: 0.9523 - val_loss: 0.0538 - val_accuracy: 0.9827
Epoch 2/12
469/469 [==============================] - 104s 219ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0447 - accuracy: 0.9863 - val_loss: 0.0388 - val_accuracy: 0.9874
Epoch 3/12
469/469 [==============================] - 103s 217ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0271 - accuracy: 0.9917 - val_loss: 0.0362 - val_accuracy: 0.9879
Epoch 4/12
469/469 [==============================] - 104s 218ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0170 - accuracy: 0.9950 - val_loss: 0.0300 - val_accuracy: 0.9897
Epoch 5/12
469/469 [==============================] - 104s 219ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0133 - accuracy: 0.9959 - val_loss: 0.0369 - val_accuracy: 0.9892
Epoch 6/12
469/469 [==============================] - 104s 219ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0089 - accuracy: 0.9971 - val_loss: 0.0393 - val_accuracy: 0.9890
Epoch 7/12
469/469 [==============================] - 105s 219ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0068 - accuracy: 0.9977 - val_loss: 0.0474 - val_accuracy: 0.9867
Epoch 8/12
469/469 [==============================] - 105s 221ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0070 - accuracy: 0.9977 - val_loss: 0.0374 - val_accuracy: 0.9896
Epoch 9/12
469/469 [==============================] - 104s 218ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9983 - val_loss: 0.0376 - val_accuracy: 0.9898
Epoch 10/12
469/469 [==============================] - 103s 216ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0049 - accuracy: 0.9983 - val_loss: 0.0493 - val_accuracy: 0.9888
Epoch 11/12
469/469 [==============================] - 104s 218ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9985 - val_loss: 0.0389 - val_accuracy: 0.9896
Epoch 12/12
469/469 [==============================] - 105s 220ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0030 - accuracy: 0.9990 - val_loss: 0.0424 - val_accuracy: 0.9904

No changes to the script: ~105 s/epoch, 220 ms/step, 99.04% final accuracy.

Not sure why my numbers aren't comparable to the other M1 numbers.

astrowonk commented 3 years ago

tested on ubuntu 20.04.1, rtx 3070, tensorflow container 20.11-tf2-py3 batch s / epoch ms / step acc. gpu-util (%) 128 2 3-4 0.9884 73-75 256 1 5-6 0.9881 82 512 1 10-11 0.9881 87 1024 1 19-20 0.9889 92 1280 1 24-30 0.9880 94 2048 1 37-40 0.9883 95 4096 9->1 620->65 0.9872 97

batch size=4096 took longer on first 3 epochs, taking 9, 5, 2 seconds per each epoch (620, 363, 90 ms per step per each epoch)

How did you get TensorFlow running on a 3070? I thought you needed CUDA 11 for a 3070, and that there were problems with CUDA 11 and the nightly builds. I guess the difference is Windows vs. Ubuntu.

One thing I hope is that, with support for Apple's ML Compute, this fork "just works" with faster/better Apple Silicon as the M line of Apple chips evolves, rather than needing an endless series of patches. The CUDA/cuDNN install dance on Windows never fails to thwart me.

astrowonk commented 3 years ago

Question please:

Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.

I believe the Neural Engine is designed to accelerate inference/prediction with trained Core ML models; as far as I can tell it isn't used for training, and there doesn't seem to be any API to access it other than Core ML.

SpaceMonkeyForever commented 3 years ago

Question please: Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.

I believe the Neural Engine is designed to accelerate trained CoreML models inference/prediction, as far as I can tell it's not used in training? There doesn't seem to be any API to use it other than CoreML.

Oh, I didn't think of that. Do you have any source on this?

danielmbradley commented 3 years ago

Question please: Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.

I believe the Neural Engine is designed to accelerate trained CoreML models inference/prediction, as far as I can tell it's not used in training? There doesn't seem to be any API to use it other than CoreML.

Oh, I didn't think of that. Do you have any source on this?

I'm not sure how true that is. I've never had any issue with prediction speed on non-ML-specific hardware; it's always the training that's been slow.

astrowonk commented 3 years ago

Question please: Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.

I believe the Neural Engine is designed to accelerate trained CoreML models inference/prediction, as far as I can tell it's not used in training? There doesn't seem to be any API to use it other than CoreML.

Oh, I didn't think of that. Do you have any source on this?

I'm not sure how true that is; I've never had any issue with prediction speed on non-ML-specific hardware. It's always been the training that's slow.

Public information on the Neural Engine isn't great. CoreML is definitely a way to run trained models on device. This repo talks about what we know about the Neural Engine.

The impressive speedup of the super-resolution scaling in Pixelmator Pro cites the Neural Engine as helping on M1 Macs.

It's notable that the writeup on this branch of Tensorflow talks about using ML Compute to enhance the training speed by using the CPU and GPU, but doesn't mention the Neural Engine itself. It would be great if we could use it to train! Perhaps that's coming some day?

BrentOeyen-CA commented 3 years ago

MacBook Pro, 16 GB RAM, 500 GB HD; same script, but without disabling eager execution.

Epoch 1/12
2020-11-27 00:02:50.544598: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-27 00:02:50.545510: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
469/469 [==============================] - 18s 35ms/step - loss: 0.3663 - accuracy: 0.8887 - val_loss: 0.0470 - val_accuracy: 0.9846
Epoch 2/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0449 - accuracy: 0.9865 - val_loss: 0.0438 - val_accuracy: 0.9844
Epoch 3/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0281 - accuracy: 0.9907 - val_loss: 0.0314 - val_accuracy: 0.9885
Epoch 4/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0177 - accuracy: 0.9949 - val_loss: 0.0361 - val_accuracy: 0.9884
Epoch 5/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0108 - accuracy: 0.9965 - val_loss: 0.0310 - val_accuracy: 0.9903
Epoch 6/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0081 - accuracy: 0.9976 - val_loss: 0.0311 - val_accuracy: 0.9905
Epoch 7/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0069 - accuracy: 0.9977 - val_loss: 0.0441 - val_accuracy: 0.9880
Epoch 8/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0051 - accuracy: 0.9982 - val_loss: 0.0352 - val_accuracy: 0.9902
Epoch 9/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0056 - accuracy: 0.9981 - val_loss: 0.0371 - val_accuracy: 0.9901
Epoch 10/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0035 - accuracy: 0.9987 - val_loss: 0.0349 - val_accuracy: 0.9905
Epoch 11/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0035 - accuracy: 0.9990 - val_loss: 0.0381 - val_accuracy: 0.9895
Epoch 12/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0044 - accuracy: 0.9987 - val_loss: 0.0401 - val_accuracy: 0.9901

lightb0x commented 3 years ago

Tested on Ubuntu 20.04.1, RTX 3070, TensorFlow container 20.11-tf2-py3.

| batch | s / epoch | ms / step | acc. | gpu-util (%) |
| --- | --- | --- | --- | --- |
| 128 | 2 | 3-4 | 0.9884 | 73-75 |
| 256 | 1 | 5-6 | 0.9881 | 82 |
| 512 | 1 | 10-11 | 0.9881 | 87 |
| 1024 | 1 | 19-20 | 0.9889 | 92 |
| 1280 | 1 | 24-30 | 0.9880 | 94 |
| 2048 | 1 | 37-40 | 0.9883 | 95 |
| 4096 | 9->1 | 620->65 | 0.9872 | 97 |

Batch size 4096 took longer on the first three epochs: 9, 5, and 2 seconds per epoch (620, 363, and 90 ms per step, respectively).
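
The step counts and per-epoch times behind these numbers follow directly from the dataset size; a quick sanity check in plain Python (no TensorFlow needed), using the MNIST train-split size from the script above:

```python
import math

# The MNIST train split has 60,000 examples, and Keras runs
# ceil(examples / batch_size) steps per epoch, so larger batches trade
# more work per step for far fewer steps per epoch.
TRAIN_EXAMPLES = 60_000

def steps_per_epoch(batch_size: int) -> int:
    return math.ceil(TRAIN_EXAMPLES / batch_size)

for bs in (128, 256, 512, 1024, 2048, 4096):
    print(f"batch {bs:>4}: {steps_per_epoch(bs)} steps/epoch")
```

For batch size 128 this gives the familiar 469 steps seen in the logs throughout this thread.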

A 3070 running TensorFlow: how did you do it? I thought you needed CUDA 11 on a 3070 and that there were problems with CUDA 11 and the nightly builds. I guess the difference is Windows vs. Ubuntu.

Just install a CUDA 11.1-compatible driver (455 for now) and use the aforementioned container; the container takes care of the troublesome dependency problems. Check this for details.

Shakshi3104 commented 3 years ago

Tested on MacBook Air (13-inch, Early 2015, 1.6GHz Intel Core i5, Intel HD Graphics 6000) with 8GB RAM

Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9540/Users/user/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 222s 461ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9540 - val_loss: 0.0448 - val_accuracy: 0.9861
Epoch 2/12
469/469 [==============================] - 231s 482ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0439 - accuracy: 0.9866 - val_loss: 0.0357 - val_accuracy: 0.9876
Epoch 3/12
469/469 [==============================] - 241s 503ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0265 - accuracy: 0.9915 - val_loss: 0.0342 - val_accuracy: 0.9890
Epoch 4/12
469/469 [==============================] - 277s 576ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0191 - accuracy: 0.9942 - val_loss: 0.0307 - val_accuracy: 0.9893
Epoch 5/12
469/469 [==============================] - 248s 512ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0117 - accuracy: 0.9964 - val_loss: 0.0329 - val_accuracy: 0.9897
Epoch 6/12
469/469 [==============================] - 230s 478ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0107 - accuracy: 0.9966 - val_loss: 0.0353 - val_accuracy: 0.9888
Epoch 7/12
469/469 [==============================] - 232s 482ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0079 - accuracy: 0.9973 - val_loss: 0.0533 - val_accuracy: 0.9864
Epoch 8/12
469/469 [==============================] - 268s 561ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0061 - accuracy: 0.9979 - val_loss: 0.0429 - val_accuracy: 0.9885
Epoch 9/12
469/469 [==============================] - 235s 485ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0053 - accuracy: 0.9982 - val_loss: 0.0363 - val_accuracy: 0.9899
Epoch 10/12
469/469 [==============================] - 253s 528ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0053 - accuracy: 0.9982 - val_loss: 0.0348 - val_accuracy: 0.9909
Epoch 11/12
469/469 [==============================] - 248s 507ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9984 - val_loss: 0.0405 - val_accuracy: 0.9905
Epoch 12/12
469/469 [==============================] - 248s 515ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0123 - accuracy: 0.9960 - val_loss: 0.0381 - val_accuracy: 0.9886

No changes to the script: 248 s/epoch, 515 ms/step, 98.86% final validation accuracy.
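
As a rough cross-check, the two numbers Keras reports here should be consistent with each other: epoch wall time is approximately steps per epoch times time per step, with validation and data loading accounting for the remainder. A quick sketch using the figures reported just above:

```python
# Epoch wall time ≈ steps_per_epoch * time_per_step (plus validation overhead).
steps = 469          # 60,000 MNIST train images / batch size 128, rounded up
ms_per_step = 515    # per-step time reported above for this MacBook Air
est_epoch_s = steps * ms_per_step / 1000
print(f"estimated: {est_epoch_s:.0f} s/epoch (reported: 248 s/epoch)")
```

The estimate (~242 s) is a few seconds under the reported 248 s/epoch, which is about what the validation pass adds.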

ismaproco commented 3 years ago

MacBook Air 2020 M1 with 8 GB- Connected to Power - No difference really to the others with M1

2020-11-27 15:16:01.395210: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-27 15:16:01.398078: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
2020-11-27 15:16:01.702008: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1600 - accuracy: 0.9520/Users/savathos/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 25s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1598 - accuracy: 0.9520 - val_loss: 0.0498 - val_accuracy: 0.9834
Epoch 2/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0424 - accuracy: 0.9868 - val_loss: 0.0392 - val_accuracy: 0.9868
Epoch 3/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0270 - accuracy: 0.9918 - val_loss: 0.0382 - val_accuracy: 0.9872
Epoch 4/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0177 - accuracy: 0.9944 - val_loss: 0.0397 - val_accuracy: 0.9879
Epoch 5/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0116 - accuracy: 0.9962 - val_loss: 0.0449 - val_accuracy: 0.9870
Epoch 6/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0101 - accuracy: 0.9968 - val_loss: 0.0383 - val_accuracy: 0.9885
Epoch 7/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0068 - accuracy: 0.9979 - val_loss: 0.0441 - val_accuracy: 0.9865
Epoch 8/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0073 - accuracy: 0.9976 - val_loss: 0.0529 - val_accuracy: 0.9869
Epoch 9/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9980 - val_loss: 0.0451 - val_accuracy: 0.9884
Epoch 10/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0042 - accuracy: 0.9987 - val_loss: 0.0542 - val_accuracy: 0.9874
Epoch 11/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9984 - val_loss: 0.0505 - val_accuracy: 0.9877
Epoch 12/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0035 - accuracy: 0.9989 - val_loss: 0.0492 - val_accuracy: 0.9871

No changes to the script: 24 s/epoch, 47 ms/step, 99.89% final training accuracy (98.71% validation accuracy).

kennyfrc commented 3 years ago

Device: MacBook Pro (13-inch, 2019), 2.4 GHz Quad-Core Intel Core i5, 8GB RAM, Radeon RX 5700 XT 8 GB

Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - 63s 128ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1607 - accuracy: 0.9530 - val_loss: 0.0528 - val_accuracy: 0.9827
Epoch 2/12
469/469 [==============================] - 62s 129ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0439 - accuracy: 0.9863 - val_loss: 0.0375 - val_accuracy: 0.9874
Epoch 3/12
469/469 [==============================] - 62s 129ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0257 - accuracy: 0.9917 - val_loss: 0.0369 - val_accuracy: 0.9881
Epoch 4/12
469/469 [==============================] - 61s 126ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0188 - accuracy: 0.9937 - val_loss: 0.0327 - val_accuracy: 0.9899
Epoch 5/12
469/469 [==============================] - 61s 127ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0116 - accuracy: 0.9964 - val_loss: 0.0441 - val_accuracy: 0.9864
Epoch 6/12
469/469 [==============================] - 61s 126ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0092 - accuracy: 0.9970 - val_loss: 0.0341 - val_accuracy: 0.9903
Epoch 7/12
469/469 [==============================] - 61s 125ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0078 - accuracy: 0.9973 - val_loss: 0.0338 - val_accuracy: 0.9897
Epoch 8/12
469/469 [==============================] - 61s 127ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0065 - accuracy: 0.9979 - val_loss: 0.0392 - val_accuracy: 0.9888
Epoch 9/12
469/469 [==============================] - 61s 125ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9980 - val_loss: 0.0404 - val_accuracy: 0.9895
Epoch 10/12
469/469 [==============================] - 61s 126ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9985 - val_loss: 0.0464 - val_accuracy: 0.9887
Epoch 11/12
469/469 [==============================] - 61s 127ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0047 - accuracy: 0.9986 - val_loss: 0.0473 - val_accuracy: 0.9890
Epoch 12/12
469/469 [==============================] - 63s 128ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0032 - accuracy: 0.9988 - val_loss: 0.0453 - val_accuracy: 0.9897

Summary:

ismaproco commented 3 years ago

Desktop Ryzen 2400g, 16GB, Windows (Conda) ~Worth the try~

2020-11-27 16:43:00.273670: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
Epoch 1/12
    469/Unknown - 62s 132ms/step - loss: 0.1622 - accuracy: 0.95162020-11-27 16:44:06.022659: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
2020-11-27 16:44:09.471968: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 66s 140ms/step - loss: 0.1622 - accuracy: 0.9516 - val_loss: 0.0600 - val_accuracy: 0.9799
Epoch 2/12
468/469 [============================>.] - ETA: 0s - loss: 0.0428 - accuracy: 0.98692020-11-27 16:45:17.461835: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 68s 145ms/step - loss: 0.0429 - accuracy: 0.9869 - val_loss: 0.0379 - val_accuracy: 0.9882
Epoch 3/12
468/469 [============================>.] - ETA: 0s - loss: 0.0276 - accuracy: 0.99152020-11-27 16:46:21.553304: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0277 - accuracy: 0.9915 - val_loss: 0.0349 - val_accuracy: 0.9882
Epoch 4/12
468/469 [============================>.] - ETA: 0s - loss: 0.0183 - accuracy: 0.99452020-11-27 16:47:25.641510: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0183 - accuracy: 0.9945 - val_loss: 0.0359 - val_accuracy: 0.9894
Epoch 5/12
468/469 [============================>.] - ETA: 0s - loss: 0.0146 - accuracy: 0.99512020-11-27 16:48:29.695354: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0146 - accuracy: 0.9951 - val_loss: 0.0367 - val_accuracy: 0.9890
Epoch 6/12
468/469 [============================>.] - ETA: 0s - loss: 0.0089 - accuracy: 0.99702020-11-27 16:49:33.919164: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0088 - accuracy: 0.9970 - val_loss: 0.0360 - val_accuracy: 0.9895
Epoch 7/12
468/469 [============================>.] - ETA: 0s - loss: 0.0084 - accuracy: 0.99752020-11-27 16:50:38.218212: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0084 - accuracy: 0.9975 - val_loss: 0.0499 - val_accuracy: 0.9873
Epoch 8/12
468/469 [============================>.] - ETA: 0s - loss: 0.0066 - accuracy: 0.99792020-11-27 16:51:42.458833: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0066 - accuracy: 0.9979 - val_loss: 0.0402 - val_accuracy: 0.9896
Epoch 9/12
468/469 [============================>.] - ETA: 0s - loss: 0.0067 - accuracy: 0.99762020-11-27 16:52:46.661109: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0067 - accuracy: 0.9976 - val_loss: 0.0412 - val_accuracy: 0.9893
Epoch 10/12
468/469 [============================>.] - ETA: 0s - loss: 0.0041 - accuracy: 0.99872020-11-27 16:53:52.020888: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 65s 139ms/step - loss: 0.0041 - accuracy: 0.9987 - val_loss: 0.0374 - val_accuracy: 0.9901
Epoch 11/12
468/469 [============================>.] - ETA: 0s - loss: 0.0034 - accuracy: 0.99892020-11-27 16:54:55.984763: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 136ms/step - loss: 0.0034 - accuracy: 0.9989 - val_loss: 0.0458 - val_accuracy: 0.9904
Epoch 12/12
468/469 [============================>.] - ETA: 0s - loss: 0.0035 - accuracy: 0.99892020-11-27 16:55:59.786269: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 136ms/step - loss: 0.0035 - accuracy: 0.9989 - val_loss: 0.0515 - val_accuracy: 0.9876

dmmajithia commented 3 years ago

Device: Mac Pro Late 2013 (3.7 GHz Quad-Core Intel Xeon E5, 2x AMD FirePro D300 2 GB, 64GB). Looks like neither of the GPUs is being used here: max GPU utilization is ~6% and CPU idle is ~60%.

Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - 170s 355ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9538 - val_loss: 0.0556 - val_accuracy: 0.9806
Epoch 2/12
469/469 [==============================] - 172s 361ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0437 - accuracy: 0.9866 - val_loss: 0.0365 - val_accuracy: 0.9881
Epoch 3/12
469/469 [==============================] - 185s 389ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0269 - accuracy: 0.9916 - val_loss: 0.0356 - val_accuracy: 0.9887
Epoch 4/12
469/469 [==============================] - 182s 383ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0177 - accuracy: 0.9946 - val_loss: 0.0375 - val_accuracy: 0.9885
Epoch 5/12
469/469 [==============================] - 171s 359ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0131 - accuracy: 0.9959 - val_loss: 0.0405 - val_accuracy: 0.9883
Epoch 6/12
469/469 [==============================] - 171s 358ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0101 - accuracy: 0.9968 - val_loss: 0.0355 - val_accuracy: 0.9899
Epoch 7/12
469/469 [==============================] - 171s 358ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0071 - accuracy: 0.9977 - val_loss: 0.0387 - val_accuracy: 0.9892
Epoch 8/12
469/469 [==============================] - 170s 355ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0061 - accuracy: 0.9981 - val_loss: 0.0394 - val_accuracy: 0.9897
Epoch 9/12
469/469 [==============================] - 172s 361ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9981 - val_loss: 0.0404 - val_accuracy: 0.9902
Epoch 10/12
469/469 [==============================] - 169s 354ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9982 - val_loss: 0.0481 - val_accuracy: 0.9882
Epoch 11/12
469/469 [==============================] - 169s 354ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9980 - val_loss: 0.0403 - val_accuracy: 0.9892
Epoch 12/12
469/469 [==============================] - 166s 348ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0036 - accuracy: 0.9990 - val_loss: 0.0532 - val_accuracy: 0.9883

In the screenshot below, GPU slot 2 is connected to the display and slot 1 is the spare.

Screen Shot 2020-11-28 at 9 25 17 PM

Surprisingly, when I ran the code from issue #39 it switched to using the idle GPU with ~80% utilization. Seems like set_mlc_device ignores my GPU recommendation when model size is small.

rizky commented 3 years ago

Mac Pro Late 2013 (3.5 GHz 6-Core Intel Xeon E5, 2x AMD FirePro D500 3 GB, 32GB).

Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1605 - accuracy: 0.9520
469/469 [==============================] - 44s 78ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1605 - accuracy: 0.9520 - val_loss: 0.0501 - val_accuracy: 0.9839
Epoch 2/12
469/469 [==============================] - 39s 77ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0460 - accuracy: 0.9859 - val_loss: 0.0373 - val_accuracy: 0.9880
Epoch 3/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0270 - accuracy: 0.9919 - val_loss: 0.0383 - val_accuracy: 0.9866
Epoch 4/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0198 - accuracy: 0.9937 - val_loss: 0.0334 - val_accuracy: 0.9896
Epoch 5/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0138 - accuracy: 0.9955 - val_loss: 0.0409 - val_accuracy: 0.9876
Epoch 6/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0107 - accuracy: 0.9965 - val_loss: 0.0381 - val_accuracy: 0.9886
Epoch 7/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0090 - accuracy: 0.9970 - val_loss: 0.0408 - val_accuracy: 0.9883
Epoch 8/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0062 - accuracy: 0.9979 - val_loss: 0.0363 - val_accuracy: 0.9896
Epoch 9/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0062 - accuracy: 0.9979 - val_loss: 0.0385 - val_accuracy: 0.9908
Epoch 10/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9986 - val_loss: 0.0523 - val_accuracy: 0.9885
Epoch 11/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9983 - val_loss: 0.0537 - val_accuracy: 0.9876
Epoch 12/12
469/469 [==============================] - 38s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0050 - accuracy: 0.9983 - val_loss: 0.0439 - val_accuracy: 0.9893

Aatiya25 commented 3 years ago

Can anyone explain how to install TensorFlow on a MacBook M1 (2020)? I am getting the error zsh: illegal hardware instruction python inside the virtual environment (tensorflow_macos_venv) when I try to import TensorFlow. I am using the terminal without Rosetta 2.

mrdbourke commented 3 years ago

Thank you @Willian-Zhang for creating this!

I used it (code unchanged from above) to benchmark a few of my Macs + a GPU-powered Google Colab instance:

| | MacBook Air (M1) | MacBook Pro 13-inch (M1) | MacBook Pro 16-inch (Intel) | Google Colab T4 GPU^ |
| --- | --- | --- | --- | --- |
| tensorflow_macos benchmark | 23-24s/epoch | 25-26s/epoch | 20-21s/epoch | 9s/epoch |

Specs:

| | MacBook Air (M1) | MacBook Pro 13-inch (M1) | MacBook Pro 16-inch (Intel) |
| --- | --- | --- | --- |
| CPU | 8-core M1 | 8-core M1 | 2.4GHz 8-core Intel Core i9 |
| GPU | 7-core M1 | 8-core M1 | AMD Radeon Pro 5500M with 8GB of GDDR6 memory |
| Neural engine | 16-core M1 | 16-core M1 | NA |
| Memory (RAM) | 16GB | 16GB | 64GB |
| Storage | 256GB | 512GB | 2TB |

Very interesting to see the M1 MacBook Air performing on-par/better than the M1 MacBook Pro.

The 16-inch I used is almost top-spec too (barely a year old)... incredible how performant Apple's new M1 chip is.
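
For a rough sense of scale, the per-epoch times above can be normalized against the Colab T4 baseline. A small sketch in plain Python, using the midpoints of the reported ranges (the midpoints are my approximation, not figures from the thread):

```python
# Epoch times (seconds), taken as midpoints of the ranges reported above.
epoch_s = {
    "MacBook Air (M1)": 23.5,
    "MacBook Pro 13-inch (M1)": 25.5,
    "MacBook Pro 16-inch (Intel)": 20.5,
    "Google Colab T4 GPU": 9.0,
}
baseline = epoch_s["Google Colab T4 GPU"]
for machine, seconds in epoch_s.items():
    print(f"{machine}: {seconds / baseline:.1f}x the T4 epoch time")
```

So all three Macs land within roughly 2-3x of a free cloud T4 on this small CNN, with the fanless Air essentially matching the Pro.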

I also did a few more tests on each machine, namely:

  1. Final Cut Pro video export
  2. CreateML machine learning model training
  3. TensorFlow macOS code (Basic CNN, Transfer Learning, the benchmark test above)

See the results from the above on my blog. I also made a video running through each of them on YouTube.

2black0 commented 3 years ago

i5-8400T, 16 GB 2400 MHz

I just disabled these two lines, since this machine doesn't have a supported GPU:

from tensorflow.python.compiler.mlcompute import mlcompute

mlcompute.set_mlc_device(device_name='gpu')
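
Instead of hand-commenting those two lines per machine, the same script can be made portable with a guarded import. This is only a sketch (the `using_mlc` flag is my own name, not part of the fork's API); it assumes the import fails cleanly on plain TensorFlow builds, where the mlcompute module does not exist:

```python
# Try the ML Compute fork's device selection; fall back to the default
# device on plain TensorFlow builds (or when TensorFlow is absent).
try:
    from tensorflow.python.compiler.mlcompute import mlcompute
    mlcompute.set_mlc_device(device_name='gpu')
    using_mlc = True
except ImportError:
    using_mlc = False  # plain build: TensorFlow picks the default device
print("ML Compute GPU selected:", using_mlc)
```

With this guard the rest of the benchmark script runs unchanged on both the macOS fork and stock TensorFlow.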

Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1616 - accuracy: 0.9532/Users/thinkmac/opt/miniconda3/envs/tf-test/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 40s 81ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1616 - accuracy: 0.9532 - val_loss: 0.0551 - val_accuracy: 0.9816
Epoch 2/12
469/469 [==============================] - 40s 82ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0440 - accuracy: 0.9864 - val_loss: 0.0459 - val_accuracy: 0.9848
Epoch 3/12
469/469 [==============================] - 37s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0280 - accuracy: 0.9909 - val_loss: 0.0359 - val_accuracy: 0.9890
Epoch 4/12
469/469 [==============================] - 37s 75ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0198 - accuracy: 0.9937 - val_loss: 0.0332 - val_accuracy: 0.9894
Epoch 5/12
469/469 [==============================] - 36s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0132 - accuracy: 0.9958 - val_loss: 0.0427 - val_accuracy: 0.9872
Epoch 6/12
469/469 [==============================] - 36s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0102 - accuracy: 0.9969 - val_loss: 0.0420 - val_accuracy: 0.9877
Epoch 7/12
469/469 [==============================] - 37s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0077 - accuracy: 0.9974 - val_loss: 0.0525 - val_accuracy: 0.9843
Epoch 8/12
469/469 [==============================] - 38s 75ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0071 - accuracy: 0.9975 - val_loss: 0.0381 - val_accuracy: 0.9896
Epoch 9/12
469/469 [==============================] - 36s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9985 - val_loss: 0.0438 - val_accuracy: 0.9879
Epoch 10/12
469/469 [==============================] - 36s 73ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0070 - accuracy: 0.9975 - val_loss: 0.0470 - val_accuracy: 0.9880
Epoch 11/12
469/469 [==============================] - 36s 73ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0039 - accuracy: 0.9986 - val_loss: 0.0423 - val_accuracy: 0.9896
Epoch 12/12
469/469 [==============================] - 36s 73ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0041 - accuracy: 0.9987 - val_loss: 0.0349 - val_accuracy: 0.9914

Result:

igaspard commented 3 years ago

MacBook Pro (16-inch, 2019) CPU: 2.3 GHz 8-Core Intel Core i9 GPU: AMD Radeon Pro 5500M 4 GB

2020-12-28 17:50:35.421277: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-28 17:50:35.544447: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-12-28 17:50:36.201512: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1651 - accuracy: 0.9515/Users/gaspardshen/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 20s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1651 - accuracy: 0.9515 - val_loss: 0.0520 - val_accuracy: 0.9835
Epoch 2/12
469/469 [==============================] - 19s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0436 - accuracy: 0.9864 - val_loss: 0.0337 - val_accuracy: 0.9889
Epoch 3/12
469/469 [==============================] - 19s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0275 - accuracy: 0.9918 - val_loss: 0.0360 - val_accuracy: 0.9877
Epoch 4/12
469/469 [==============================] - 19s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0190 - accuracy: 0.9940 - val_loss: 0.0364 - val_accuracy: 0.9885
Epoch 5/12
469/469 [==============================] - 19s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0132 - accuracy: 0.9957 - val_loss: 0.0422 - val_accuracy: 0.9864
Epoch 6/12
469/469 [==============================] - 20s 38ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0101 - accuracy: 0.9965 - val_loss: 0.0375 - val_accuracy: 0.9892
Epoch 7/12
469/469 [==============================] - 21s 39ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0081 - accuracy: 0.9973 - val_loss: 0.0405 - val_accuracy: 0.9895
Epoch 8/12
469/469 [==============================] - 21s 39ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0077 - accuracy: 0.9976 - val_loss: 0.0397 - val_accuracy: 0.9889
Epoch 9/12
469/469 [==============================] - 21s 39ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0049 - accuracy: 0.9984 - val_loss: 0.0492 - val_accuracy: 0.9872
Epoch 10/12
469/469 [==============================] - 20s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0069 - accuracy: 0.9975 - val_loss: 0.0365 - val_accuracy: 0.9894
Epoch 11/12
469/469 [==============================] - 20s 38ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9985 - val_loss: 0.0374 - val_accuracy: 0.9907
Epoch 12/12
469/469 [==============================] - 20s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0026 - accuracy: 0.9992 - val_loss: 0.0390 - val_accuracy: 0.9909
python cnn_benchmark.py  268.30s user 212.62s system 194% cpu 4:07.11 total

Results:

ma010 commented 3 years ago

Tested on a 2014 15-inch MacBook Pro (2.2 GHz Quad-Core Intel Core i7, Intel Iris Pro Graphics). Like @rnogy, I observed that the Mac-optimized version seems slower than the non-optimized version. For the macOS-optimized TensorFlow I set mlcompute.set_mlc_device(device_name='any'). I had to comment out disable_eager_execution(), otherwise I would get a segmentation fault. Results

DevReev commented 3 years ago

2020-12-30 14:50:04.896932: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-30 14:50:05.037206: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-12-30 14:50:06.878061: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - 23s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1610 - accuracy: 0.9521 - val_loss: 0.0496 - val_accuracy: 0.9846
Epoch 2/12
469/469 [==============================] - 23s 45ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0453 - accuracy: 0.9860 - val_loss: 0.0501 - val_accuracy: 0.9833
Epoch 3/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0284 - accuracy: 0.9910 - val_loss: 0.0380 - val_accuracy: 0.9868
Epoch 4/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0198 - accuracy: 0.9942 - val_loss: 0.0343 - val_accuracy: 0.9888
Epoch 5/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0135 - accuracy: 0.9957 - val_loss: 0.0318 - val_accuracy: 0.9904
Epoch 6/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0104 - accuracy: 0.9967 - val_loss: 0.0337 - val_accuracy: 0.9896
Epoch 7/12
469/469 [==============================] - 22s 42ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0080 - accuracy: 0.9974 - val_loss: 0.0363 - val_accuracy: 0.9895
Epoch 8/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0074 - accuracy: 0.9973 - val_loss: 0.0470 - val_accuracy: 0.9878
Epoch 9/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0065 - accuracy: 0.9976 - val_loss: 0.0436 - val_accuracy: 0.9887
Epoch 10/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0049 - accuracy: 0.9982 - val_loss: 0.0492 - val_accuracy: 0.9881
Epoch 11/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9983 - val_loss: 0.0429 - val_accuracy: 0.9896
Epoch 12/12
469/469 [==============================] - 22s 43ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0037 - accuracy: 0.9989 - val_loss: 0.0454 - val_accuracy: 0.9893

There was quite a bit of fan noise during the run.

singhsidhukuldeep commented 3 years ago

GPU: Tesla T4 (16 GB VRAM), CPU: Intel(R) Xeon(R) CPU @ 2.20GHz, RAM: 16 GB, Precision: Float 32

Epoch 12/12
469/469 [==============================] - 8s 8ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0038 - accuracy: 0.9987 - val_loss: 0.0305 - val_accuracy: 0.9919
CPU times: user 2min, sys: 54.6 s, total: 2min 55s
Wall time: 2min 2s


GPU: Tesla T4 (16 GB VRAM), CPU: Intel(R) Xeon(R) CPU @ 2.20GHz, RAM: 16 GB, Precision: Float 16

Epoch 12/12
469/469 [==============================] - 9s 8ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9982 - val_loss: 0.0422 - val_accuracy: 0.9894
CPU times: user 2min 5s, sys: 55.8 s, total: 3min 1s
Wall time: 2min 5s
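The Float 16 run above was presumably enabled via Keras mixed precision; the comment doesn't show the code, so here is a minimal sketch assuming the stable `tf.keras.mixed_precision` API from TF 2.4+ (not necessarily what was actually run):

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Compute-heavy ops run in float16; variables stay in float32 for stability.
mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.Flatten(),
    # Keep the final softmax in float32 so the loss is computed accurately.
    tf.keras.layers.Dense(10, activation='softmax', dtype='float32'),
])
print(mixed_precision.global_policy().name)
```

Note the float32 `dtype` on the output layer: with a float16 softmax, the cross-entropy loss can underflow.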

RahulBhalley commented 3 years ago

Given all the amazing results above, you might not care much about this machine from 2016. 😅 Anyway, here is the information I got.

System: MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports) Operating System: macOS Big Sur version 11.1 Processor: 2.9 GHz Dual-Core Intel Core i5 Memory: 8 GB 2133 MHz LPDDR3 Graphics: Intel Iris Graphics 550 1536 MB

2021-01-16 19:13:22.511385: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-16 19:13:23.496719: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 2.2993 - accuracy: 0.1251/Users/rahulbhalley/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 146s 288ms/step - batch: 234.0000 - size: 1.0000 - loss: 2.2993 - accuracy: 0.1251 - val_loss: 2.3012 - val_accuracy: 0.1135
Epoch 2/12
469/469 [==============================] - 140s 291ms/step - batch: 234.0000 - size: 1.0000 - loss: 1.8151 - accuracy: 0.3670 - val_loss: 0.6209 - val_accuracy: 0.8441
Epoch 3/12
469/469 [==============================] - 140s 289ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.4052 - accuracy: 0.8984 - val_loss: 0.2491 - val_accuracy: 0.9445
Epoch 4/12
469/469 [==============================] - 158s 330ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1970 - accuracy: 0.9510 - val_loss: 0.1449 - val_accuracy: 0.9649
Epoch 5/12
469/469 [==============================] - 145s 301ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1394 - accuracy: 0.9653 - val_loss: 0.1099 - val_accuracy: 0.9695
Epoch 6/12
469/469 [==============================] - 152s 312ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1117 - accuracy: 0.9715 - val_loss: 0.0927 - val_accuracy: 0.9739
Epoch 7/12
469/469 [==============================] - 146s 300ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0933 - accuracy: 0.9766 - val_loss: 0.0828 - val_accuracy: 0.9787
Epoch 8/12
469/469 [==============================] - 180s 374ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0810 - accuracy: 0.9796 - val_loss: 0.0765 - val_accuracy: 0.9793
Epoch 9/12
469/469 [==============================] - 165s 342ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0717 - accuracy: 0.9817 - val_loss: 0.0718 - val_accuracy: 0.9811
Epoch 10/12
469/469 [==============================] - 140s 287ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0630 - accuracy: 0.9845 - val_loss: 0.0586 - val_accuracy: 0.9818
Epoch 11/12
469/469 [==============================] - 229s 480ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0570 - accuracy: 0.9859 - val_loss: 0.0727 - val_accuracy: 0.9817
Epoch 12/12
469/469 [==============================] - 146s 302ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0506 - accuracy: 0.9874 - val_loss: 0.0559 - val_accuracy: 0.9838

dmitry-kabanov commented 3 years ago

iMac Pro 2017, 3 GHz 10-Core Intel Xeon W, 32 GB 2666 MHz DDR4, Radeon Pro Vega 64 16 GB

On GPU:

2021-01-23 15:21:50.079691: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-23 15:21:50.183928: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-23 15:21:52.322549: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1696 - accuracy: 0.9476/Users/dima/dev/learn/2021-01-23-apple-tensorflow/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 17s 27ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1696 - accuracy: 0.9476 - val_loss: 0.0472 - val_accuracy: 0.9850
Epoch 2/12
469/469 [==============================] - 14s 27ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0447 - accuracy: 0.9866 - val_loss: 0.0391 - val_accuracy: 0.9874
...
Epoch 11/12
469/469 [==============================] - 15s 28ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0042 - accuracy: 0.9985 - val_loss: 0.0474 - val_accuracy: 0.9891
Epoch 12/12
469/469 [==============================] - 15s 28ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0051 - accuracy: 0.9983 - val_loss: 0.0446 - val_accuracy: 0.9892

On CPU:

2021-01-23 15:25:55.524865: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-23 15:25:55.617573: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-23 15:25:56.065950: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1579 - accuracy: 0.9530/Users/dima/dev/learn/2021-01-23-apple-tensorflow/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 45s 93ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1579 - accuracy: 0.9530 - val_loss: 0.0545 - val_accuracy: 0.9820
Epoch 2/12
469/469 [==============================] - 45s 93ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0457 - accuracy: 0.9858 - val_loss: 0.0446 - val_accuracy: 0.9856
...
Epoch 11/12
469/469 [==============================] - 47s 96ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9978 - val_loss: 0.0409 - val_accuracy: 0.9894
Epoch 12/12
469/469 [==============================] - 47s 96ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0030 - accuracy: 0.9990 - val_loss: 0.0409 - val_accuracy: 0.9890

On pip-provided TensorFlow 2.4 (after removing the two mlcompute lines from the script), it is twice as fast:

2021-01-23 15:42:17.869355: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-23 15:42:17.869529: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-23 15:42:17.960406: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-23 15:42:18.386414: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1506 - accuracy: 0.9546/Users/dima/dev/learn/2021-01-23-apple-tensorflow/venv-tf-pip/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1506 - accuracy: 0.9546 - val_loss: 0.0459 - val_accuracy: 0.9854
Epoch 2/12
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0423 - accuracy: 0.9869 - val_loss: 0.0390 - val_accuracy: 0.9870
...
Epoch 11/12
469/469 [==============================] - 26s 51ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0043 - accuracy: 0.9985 - val_loss: 0.0505 - val_accuracy: 0.9896
Epoch 12/12
469/469 [==============================] - 25s 50ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0037 - accuracy: 0.9989 - val_loss: 0.0450 - val_accuracy: 0.9900
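Instead of deleting the two mlcompute lines by hand when switching to pip TensorFlow, the same script can run on both builds with a guarded import. This is a sketch, not part of the original benchmark; the `device_name` values are those accepted by the Apple fork (`'cpu'`, `'gpu'`, `'any'`):

```python
# Select an ML Compute device only when the mlcompute module exists
# (i.e. on Apple's tensorflow_macos fork); stock TF falls through silently.
try:
    from tensorflow.python.compiler.mlcompute import mlcompute
    mlcompute.set_mlc_device(device_name='gpu')
    using_mlcompute = True
except ImportError:
    using_mlcompute = False  # pip TensorFlow: default device placement

print('ML Compute enabled:', using_mlcompute)
```

The rest of the benchmark script is unchanged either way.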
nikolaeff commented 3 years ago

This code runs hot! I think I just toasted the GPU on my 16-inch MBP by running this benchmark. Make sure your warranty hasn't expired before experimenting.

Leon-lianglyu commented 3 years ago

MacBook Air (M1, 2020), 7-core GPU

Train on 469 steps, validate on 79 steps
Epoch 1/12
467/469 [============================>.] - ETA: 0s - batch: 233.0000 - size: 1.0000 - loss: 0.1596 - accuracy: 0.9516/Users/leon/miniforge3/envs/tf-env/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 13s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1594 - accuracy: 0.9517 - val_loss: 0.0578 - val_accuracy: 0.9819
Epoch 2/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0430 - accuracy: 0.9871 - val_loss: 0.0362 - val_accuracy: 0.9879
Epoch 3/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0269 - accuracy: 0.9913 - val_loss: 0.0375 - val_accuracy: 0.9870
Epoch 4/12
469/469 [==============================] - 12s 23ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0181 - accuracy: 0.9941 - val_loss: 0.0393 - val_accuracy: 0.9878
Epoch 5/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0127 - accuracy: 0.9956 - val_loss: 0.0347 - val_accuracy: 0.9890
Epoch 6/12
469/469 [==============================] - 12s 23ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0098 - accuracy: 0.9967 - val_loss: 0.0356 - val_accuracy: 0.9890
Epoch 7/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0087 - accuracy: 0.9970 - val_loss: 0.0341 - val_accuracy: 0.9896
Epoch 8/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0049 - accuracy: 0.9984 - val_loss: 0.0402 - val_accuracy: 0.9893
Epoch 9/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0061 - accuracy: 0.9978 - val_loss: 0.0480 - val_accuracy: 0.9884
Epoch 10/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0058 - accuracy: 0.9980 - val_loss: 0.0435 - val_accuracy: 0.9877
Epoch 11/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0043 - accuracy: 0.9986 - val_loss: 0.0410 - val_accuracy: 0.9913
Epoch 12/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0030 - accuracy: 0.9989 - val_loss: 0.0492 - val_accuracy: 0.9889

Process finished with exit code 0

harshamodini commented 3 years ago

This was one of the factors that helped me choose between two laptops priced the same:
1.) MSI GF65, i7 10th gen, with a 6 GB RTX 2060
2.) Apple MacBook Air M1, base model

I ran the benchmark on both devices at the store and was surprised how capable the Apple M1 is: even though it couldn't beat the MSI, it gave a respectable result against the similarly priced competition.

In the end I bought the MSI anyway, as it gave me more options.

So here are my results. Specs: i7 10th gen, GPU: RTX 2060 (6 GB), and the GPU was only about 40% utilized.

Epoch 1/12
469/469 [==============================] - 7s 9ms/step - loss: 0.3589 - accuracy: 0.8936 - val_loss: 0.0471 - val_accuracy: 0.9855
Epoch 2/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0429 - accuracy: 0.9871 - val_loss: 0.0355 - val_accuracy: 0.9879
Epoch 3/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0258 - accuracy: 0.9918 - val_loss: 0.0318 - val_accuracy: 0.9894
Epoch 4/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0163 - accuracy: 0.9943 - val_loss: 0.0275 - val_accuracy: 0.9913
Epoch 5/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0117 - accuracy: 0.9962 - val_loss: 0.0349 - val_accuracy: 0.9894
Epoch 6/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0096 - accuracy: 0.9966 - val_loss: 0.0389 - val_accuracy: 0.9883
Epoch 7/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0078 - accuracy: 0.9973 - val_loss: 0.0510 - val_accuracy: 0.9869
Epoch 8/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0081 - accuracy: 0.9971 - val_loss: 0.0389 - val_accuracy: 0.9903
Epoch 9/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0033 - accuracy: 0.9989 - val_loss: 0.0456 - val_accuracy: 0.9895
Epoch 10/12
469/469 [==============================] - 4s 9ms/step - loss: 0.0053 - accuracy: 0.9983 - val_loss: 0.0410 - val_accuracy: 0.9903
Epoch 11/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0035 - accuracy: 0.9988 - val_loss: 0.0558 - val_accuracy: 0.9875
Epoch 12/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0018 - accuracy: 0.9995 - val_loss: 0.0459 - val_accuracy: 0.9898

Each epoch: ~4 s, each step: ~8 ms; accuracy: 0.9995, val_accuracy: 0.9898.

thecaffeinedev commented 3 years ago

Tested on a MacBook Pro (13-inch, M1, 2020) with 8 GB RAM

Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1554 - accuracy: 0.9533
469/469 [==============================] - 14s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9534 - val_loss: 0.0524 - val_accuracy: 0.9836
Epoch 2/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0447 - accuracy: 0.9865 - val_loss: 0.0402 - val_accuracy: 0.9863
Epoch 3/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0263 - accuracy: 0.9919 - val_loss: 0.0316 - val_accuracy: 0.9901
Epoch 4/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0176 - accuracy: 0.9941 - val_loss: 0.0319 - val_accuracy: 0.9885
Epoch 5/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0115 - accuracy: 0.9961 - val_loss: 0.0370 - val_accuracy: 0.9890
Epoch 6/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0103 - accuracy: 0.9965 - val_loss: 0.0376 - val_accuracy: 0.9893
Epoch 7/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0079 - accuracy: 0.9973 - val_loss: 0.0345 - val_accuracy: 0.9892
Epoch 8/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9982 - val_loss: 0.0340 - val_accuracy: 0.9900
Epoch 9/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0063 - accuracy: 0.9976 - val_loss: 0.0442 - val_accuracy: 0.9888
Epoch 10/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0040 - accuracy: 0.9987 - val_loss: 0.0374 - val_accuracy: 0.9895
Epoch 11/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0044 - accuracy: 0.9984 - val_loss: 0.0370 - val_accuracy: 0.9906
Epoch 12/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0034 - accuracy: 0.9988 - val_loss: 0.0478 - val_accuracy: 0.9883
CPU times: user 2min 6s, sys: 30.9 s, total: 2min 37s
Wall time: 3min 2s
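The "CPU times" / "Wall time" figures quoted in this thread come from IPython's `%%time` cell magic; outside a notebook, the same wall-clock number can be taken with `time.perf_counter`. A small sketch (the `model.fit` call in the comment is a placeholder for the training loop above, not runnable on its own):

```python
import time


def timed(fn):
    """Run fn() and return (result, wall-clock seconds), like %%time's Wall time."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# With the benchmark above this would be:
#   _, seconds = timed(lambda: model.fit(ds_train, epochs=12, validation_data=ds_test))
_, seconds = timed(lambda: sum(range(1_000_000)))  # stand-in workload
print(f"Wall time: {seconds:.3f} s")
```

This makes timings comparable across the CPU, GPU, and pip-TensorFlow runs posted here without depending on a notebook environment.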