faustomorales / keras-ocr

A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.
https://keras-ocr.readthedocs.io/
MIT License

Distributed training on multiple GPUs #213

Open jacob-macdonald opened 2 years ago

jacob-macdonald commented 2 years ago

Has anyone had any luck speeding up training using tf.distribute.Strategy for multiple GPUs? My training follows the general pipeline given in the documentation here: https://keras-ocr.readthedocs.io/en/latest/examples/end_to_end_training.html

To this code, I have added:

```python
strategy = tf.distribute.MirroredStrategy()
```

And I have placed the Detector construction (along with the loading of the previous weights I want to resume from) under strategy.scope():

```python
with strategy.scope():
    detector = keras_ocr.detection.Detector(weights='clovaai_general')
    detector.model.load_weights('detector_2022-06-15T20:36:08.735431.h5')
```

I am currently not seeing any reduction in training time with this approach. Any suggestions?
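For reference, here is a minimal sketch of how everything fits together in my script. The batch size, step counts, and image_generator are placeholders standing in for the corresponding pieces from the end-to-end example, not my exact values:

```python
import tensorflow as tf
import keras_ocr

# Build the strategy first; it detects the available GPUs.
strategy = tf.distribute.MirroredStrategy()
print('Replicas in sync:', strategy.num_replicas_in_sync)

# Variable creation (and the compile that Detector performs internally)
# has to happen inside the scope so the weights are mirrored across GPUs.
with strategy.scope():
    detector = keras_ocr.detection.Detector(weights='clovaai_general')
    detector.model.load_weights('detector_2022-06-15T20:36:08.735431.h5')

# With MirroredStrategy, each batch is split across replicas, so the batch
# size passed to the generator is the *global* one. If it stays at the
# single-GPU value, each GPU just processes a smaller slice and wall-clock
# time per step barely changes. The per-replica size of 1 is a placeholder.
global_batch_size = 1 * strategy.num_replicas_in_sync

# image_generator is assumed to be the detection image generator built
# earlier in the end-to-end training example.
batch_generator = detector.get_batch_generator(
    image_generator=image_generator,
    batch_size=global_batch_size,
)
detector.model.fit(
    batch_generator,
    steps_per_epoch=100,  # placeholder values
    epochs=10,
)
```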