Open obriensystems opened 11 months ago
On Dual RTX-4090 - full parallelization
import tensorflow as tf
#import keras
#from keras.utils import multi_gpu_model
#import keras.backend as k
#https://github.com/microsoft/tensorflow-directml/issues/352
# https://www.tensorflow.org/guide/distributed_training
#
# https://www.tensorflow.org/tutorials/distribute/keras
# https://keras.io/guides/distributed_training/
#strategy = tf.distribute.MirroredStrategy()
#print('Number of devices: {}'.format(strategy.num_replicas_in_sync))
#NUM_GPUS = 2
#strategy = tf.contrib.distribute.MirroredStrategy()#num_gpus=NUM_GPUS)
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
# https://learn.microsoft.com/en-us/windows/ai/directml/gpu-faq
cifar = tf.keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()
with strategy.scope():
    # https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet50/ResNet50
    # https://keras.io/api/models/model/
    parallel_model = tf.keras.applications.ResNet50(
    #model = tf.keras.applications.ResNet50(
        include_top=True,
        weights=None,
        input_shape=(32, 32, 3),
        classes=100,)
    # https://saturncloud.io/blog/how-to-do-multigpu-training-with-keras/
    #parallel_model = multi_gpu_model(model, gpus=2)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
    # https://keras.io/api/models/model_training_apis/
    parallel_model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
parallel_model.fit(x_train, y_train, epochs=10, batch_size=256)#5120)#7168)#7168)
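With MirroredStrategy, the batch_size passed to fit is the global batch, which is split across the replicas. A minimal sketch of that arithmetic for this run, assuming the 50,000-image CIFAR-100 training set and two GPUs; the helper names are hypothetical and not part of TensorFlow, and rejecting uneven splits is a choice of this sketch only:

```python
import math


def per_replica_batch(global_batch: int, num_replicas: int) -> int:
    """Samples each replica receives per step when the global batch splits evenly."""
    if global_batch % num_replicas != 0:
        # Sketch-only restriction: keep the example to even splits.
        raise ValueError("global_batch must be divisible by num_replicas")
    return global_batch // num_replicas


def steps_per_epoch(num_samples: int, global_batch: int) -> int:
    """Steps needed to cover the dataset once; the last batch may be partial."""
    return math.ceil(num_samples / global_batch)


# Values from the script above: batch_size=256 on 2 GPUs, 50,000 training images.
print(per_replica_batch(256, 2))     # 128
print(steps_per_epoch(50_000, 256))  # 196
```

The 196 steps match the 196/196 progress counter in the log below. In the real script the replica count could be read from strategy.num_replicas_in_sync instead of being hard-coded.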
Your kernel may have been built without NUMA support.
2023-10-04 02:13:36.322066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21286 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9
2023-10-04 02:13:36.322333: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:02:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-04 02:13:36.322355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 21286 MB memory: -> device: 1, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:02:00.0, compute capability: 8.9
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz
169001437/169001437 [==============================] - 4s 0us/step
Epoch 1/10
2023-10-04 02:14:01.564898: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8600
2023-10-04 02:14:01.961456: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8600
2023-10-04 02:14:04.719269: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fa5c69fab80 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-10-04 02:14:04.719297: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 4090, Compute Capability 8.9
2023-10-04 02:14:04.719300: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): NVIDIA GeForce RTX 4090, Compute Capability 8.9
2023-10-04 02:14:04.722981: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-10-04 02:14:04.778031: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
196/196 [==============================] - 38s 57ms/step - loss: 4.3436 - accuracy: 0.0889
Epoch 2/10
196/196 [==============================] - 10s 52ms/step - loss: 3.6832 - accuracy: 0.1765
Epoch 3/10
196/196 [==============================] - 10s 51ms/step - loss: 3.8586 - accuracy: 0.1447
Epoch 4/10
196/196 [==============================] - 10s 52ms/step - loss: 3.4894 - accuracy: 0.1989
Epoch 5/10
196/196 [==============================] - 10s 52ms/step - loss: 3.2204 - accuracy: 0.2489
Epoch 6/10
196/196 [==============================] - 10s 51ms/step - loss: 3.0356 - accuracy: 0.2802
Epoch 7/10
196/196 [==============================] - 10s 52ms/step - loss: 2.9648 - accuracy: 0.2936
Epoch 8/10
196/196 [==============================] - 10s 52ms/step - loss: 2.6994 - accuracy: 0.3399
Epoch 9/10
196/196 [==============================] - 10s 52ms/step - loss: 2.4836 - accuracy: 0.3792
Epoch 10/10
196/196 [==============================] - 10s 52ms/step - loss: 2.3146 - accuracy: 0.4120
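The log above can be turned into a rough throughput figure. This is a back-of-the-envelope sketch, assuming the steady-state 52 ms/step and 196 steps per epoch reported from epoch 2 onward; the function names are hypothetical. The slower first epoch (38 s) overlaps with the cuDNN load and XLA compilation messages in the log, so it is left out:

```python
def epoch_seconds(steps: int, step_ms: float) -> float:
    """Wall-clock time for one epoch given the per-step time in milliseconds."""
    return steps * step_ms / 1000.0


def images_per_second(global_batch: int, step_ms: float) -> float:
    """Training throughput across all replicas combined."""
    return global_batch / (step_ms / 1000.0)


# Steady-state values read from the log: 196 steps, 52 ms/step, batch_size=256.
print(round(epoch_seconds(196, 52), 1))   # 10.2
print(round(images_per_second(256, 52)))  # 4923
```

Rerunning the same calculation with a single-GPU log would show how much of the second card is actually being used at this batch size.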
From #13: see https://saturncloud.io/blog/how-to-do-multigpu-training-with-keras/ for the discussion around multi_gpu_model.