Open obriensystems opened 7 months ago
Tested on 4 L4s, or on 3 RTX cards (4500/4500/4000)
https://github.com/tensorflow/tensorflow/issues/41724#issuecomment-665996179
strategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.ReductionToOneDevice())
parallel_model.fit(x_train, y_train, epochs=25, batch_size=2048)
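The two lines above do not show how `parallel_model` is created. A minimal sketch of one way to wire it up is below, assuming the model is built and compiled inside `strategy.scope()`. The layer stack, the `build_and_train` helper name, and the random stand-in data are hypothetical and are not taken from the issue.

```python
import numpy as np
import tensorflow as tf


def build_and_train(x_train, y_train, epochs=25, batch_size=2048):
    """Train a small classifier under MirroredStrategy (illustrative helper)."""
    # Reduce gradients on one device instead of the default all-reduce.
    strategy = tf.distribute.MirroredStrategy(
        cross_device_ops=tf.distribute.ReductionToOneDevice())
    print("Replicas in sync:", strategy.num_replicas_in_sync)

    # Variables have to be created inside the scope so they are mirrored.
    with strategy.scope():
        # Hypothetical model: the issue does not show the real architecture.
        parallel_model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=x_train.shape[1:]),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        parallel_model.compile(
            optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])

    # batch_size here is the global batch, shared across all replicas.
    history = parallel_model.fit(
        x_train, y_train, epochs=epochs, batch_size=batch_size, verbose=2)
    return strategy, history


if __name__ == "__main__":
    # Random stand-in data (assumption): the issue does not name the dataset.
    x = np.random.rand(4096, 32, 32, 3).astype("float32")
    y = np.random.randint(0, 10, size=(4096,))
    build_and_train(x, y, epochs=1)
```

On a machine without GPUs the strategy should fall back to a single CPU replica, so the sketch can be smoke-tested before moving to the multi-GPU host.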
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L4                      Off | 00000000:00:03.0 Off |                    0 |
| N/A   80C    P0              62W /  72W |  21002MiB / 23034MiB |     58%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L4                      Off | 00000000:00:04.0 Off |                    0 |
| N/A   78C    P0              67W /  72W |  20994MiB / 23034MiB |     46%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA L4                      Off | 00000000:00:05.0 Off |                    0 |
| N/A   76C    P0              67W /  72W |  20998MiB / 23034MiB |     55%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA L4                      Off | 00000000:00:06.0 Off |                    0 |
| N/A   75C    P0              51W /  72W |  21002MiB / 23034MiB |     55%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     40306      C   python                                    20990MiB |
|    1   N/A  N/A     40306      C   python                                    20982MiB |
|    2   N/A  N/A     40306      C   python                                    20986MiB |
|    3   N/A  N/A     40306      C   python                                    20990MiB |
+---------------------------------------------------------------------------------------+

Epoch 24/25
25/25 [==============================] - 3s 105ms/step - loss: 0.2089 - accuracy: 0.9445
Epoch 25/25
25/25 [==============================] - 3s 105ms/step - loss: 0.1559 - accuracy: 0.9592
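The numbers in the log can be sanity-checked with a little arithmetic. The sketch below uses only values visible above (global batch 2048, 4 GPUs, 25 steps per epoch, 105 ms/step). It assumes `batch_size` is the global batch split evenly across replicas and that `fit` was not given `steps_per_epoch`. The helper names are illustrative.

```python
def per_replica_batch(global_batch, num_replicas):
    """Samples each replica processes per step, assuming an even split."""
    return global_batch / num_replicas


def samples_per_second(global_batch, step_seconds):
    """Approximate training throughput from the per-step time in the log."""
    return global_batch / step_seconds


def sample_count_bounds(steps_per_epoch, global_batch):
    """Range of dataset sizes that produce this many steps per epoch.

    Assumes the last partial batch is kept, so n samples take
    ceil(n / global_batch) steps.
    """
    lowest = (steps_per_epoch - 1) * global_batch + 1
    highest = steps_per_epoch * global_batch
    return lowest, highest


if __name__ == "__main__":
    print(per_replica_batch(2048, 4))               # 512.0
    print(round(samples_per_second(2048, 0.105)))   # 19505
    print(sample_count_bounds(25, 2048))            # (49153, 51200)
```

With 3 GPUs the same global batch does not divide evenly (2048 / 3 is about 682.7), so a batch size that is a multiple of the replica count may be worth trying there.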
Switching the strategy to cross_device_ops=tf.distribute.ReductionToOneDevice() works for more than 2 GPUs