Custom object detection: model trains extremely slowly

Hello.

I'm trying to replicate the Hololens example to train a model to detect custom objects. To do that, I'm using Google Colab, with GPU activated (free plan).

I'm using the dataset provided by the documentation:

Documentation: https://github.com/OlafenwaMoses/ImageAI/blob/master/imageai/Detection/Custom/CUSTOMDETECTIONTRAINING.md
Dataset: https://github.com/OlafenwaMoses/ImageAI/releases/tag/essential-v4

I'm using these libraries:

 tensorflow-gpu==1.13.1
 keras==2.4.3 
 numpy==1.19.3
 pillow==7.0.0
 scipy==1.4.1
 h5py==2.10.0
 matplotlib==3.3.2
 opencv-python 
 keras-resnet==0.2.0
 imageai
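
To rule out a mismatch between the versions I pinned and what the runtime actually loads, here is a minimal sketch that prints the installed version of each package. It uses only the standard library (`importlib.metadata`, Python 3.8+). The extra `tensorflow` entry is my own addition and an assumption: a hosted runtime may ship a preinstalled `tensorflow` next to `tensorflow-gpu`.

```python
from importlib import metadata  # standard library, Python 3.8+

# Package names copied from the list above; "tensorflow" is added as an
# assumption, in case a preinstalled build shadows tensorflow-gpu.
PACKAGES = [
    "tensorflow-gpu", "tensorflow", "keras", "numpy", "pillow", "scipy",
    "h5py", "matplotlib", "opencv-python", "keras-resnet", "imageai",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

If the printed versions differ from the pinned ones, the environment being tested is not the one described here.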

This is my code:

from imageai.Detection.Custom import DetectionModelTrainer

# Configure a YOLOv3 trainer on the hololens dataset.
trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="Input/hololens")
trainer.setTrainConfig(
    object_names_array=["hololens"],
    batch_size=4,
    num_experiments=200,
    train_from_pretrained_model="Model/pretrained-yolov3.h5",
)
trainer.trainModel()

This is the output:

Generating anchor boxes for training images and annotation...
Average IOU for 9 anchors: 0.78
Anchor Boxes generated.
Detection configuration saved in  Input/hololens/json/detection_config.json
Evaluating over 59 samples taken from Input/hololens/validation
Training over 240 samples  given at Input/hololens/train
Training on:    ['hololens']
Training with Batch Size:  4
Number of Training Samples:  240
Number of Validation Samples:  59
Number of Experiments:  200
Training with transfer learning from pretrained Model
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
WARNING:tensorflow:`epsilon` argument is deprecated and will be removed, use `min_delta` instead.
WARNING:tensorflow:Model failed to serialize as JSON. Ignoring... Layer YoloLayer has arguments in `__init__` and therefore must override `get_config`.
Epoch 1/200
  1/480 [..............................] - ETA: 1:09:31 - loss: 119.8761 - yolo_layer_3_loss: 16.0161 - yolo_layer_4_loss: 30.1155 - yolo_layer_5_loss: 62.1695

This is the expected output, taken from the documentation:

Using TensorFlow backend.
Generating anchor boxes for training images and annotation...
Average IOU for 9 anchors: 0.78
Anchor Boxes generated.
Detection configuration saved in  hololens/json/detection_config.json
Training on:    ['hololens']
Training with Batch Size:  4
Number of Experiments:  200

Epoch 1/200
480/480 [==============================] - 395s 823ms/step - loss: 36.9000 - yolo_layer_1_loss: 3.2970 - yolo_layer_2_loss: 9.4923 - yolo_layer_3_loss: 24.1107 - val_loss: 15.6321 - val_yolo_layer_1_loss: 2.0275 - val_yolo_layer_2_loss: 6.4191 - val_yolo_layer_3_loss: 7.1856

Notice the enormous difference in the time required for each epoch: an estimated 1 h 10 min in my run, compared with 395 s (about 6.5 minutes) in the documentation.
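
To put the gap in per-step terms, here is a rough calculation using only the numbers printed in the two logs. It assumes the progress bar derives its ETA as (seconds per step so far) × (steps remaining). The first step usually includes one-time setup cost, so this probably overstates my steady-state step time.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


STEPS_PER_EPOCH = 480  # from "1/480" and "480/480" in the logs

# My run: ETA of 1:09:31 reported after step 1, so 479 steps remain.
my_eta_s = hms_to_seconds("1:09:31")
my_step_s = my_eta_s / (STEPS_PER_EPOCH - 1)

# Documentation run: 395 s for all 480 steps.
doc_epoch_s = 395
doc_step_s = doc_epoch_s / STEPS_PER_EPOCH

slowdown = my_step_s / doc_step_s

print(f"my run:        {my_step_s:.2f} s/step")
print(f"documentation: {doc_step_s:.3f} s/step")
print(f"slowdown:      {slowdown:.1f}x")
```

Under that assumption, my run takes several seconds per step, against less than one second per step in the documentation: roughly a tenfold slowdown.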

Besides Google Colab, I've also run the same code on a local machine with a Tesla V100 GPU. The result is somewhat better, but an epoch still takes up to 30 minutes.
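
One check that may help narrow this down is whether TensorFlow sees the GPU at all, since a silent fallback to the CPU would also produce slow steps. The sketch below is my own diagnostic, not something from the ImageAI documentation. It uses `device_lib.list_local_devices()`, which as far as I know is available in both TensorFlow 1.x and 2.x; the helper name `tensorflow_gpu_report` is hypothetical.

```python
def tensorflow_gpu_report():
    """Return TensorFlow's version and the GPU devices it can see.

    Returns None if TensorFlow cannot be imported.
    """
    try:
        import tensorflow as tf
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    # Note: listing devices initialises them, so this may reserve GPU memory.
    gpus = [
        device.name
        for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    ]
    return {"version": tf.__version__, "gpus": gpus}


report = tensorflow_gpu_report()
if report is None:
    print("TensorFlow is not importable in this environment")
elif not report["gpus"]:
    print(f"TensorFlow {report['version']}: no GPU visible, training would run on the CPU")
else:
    print(f"TensorFlow {report['version']}: GPUs visible: {report['gpus']}")
```

An empty GPU list would point to a mismatch between the installed TensorFlow build and the CUDA libraries rather than to ImageAI itself, although that is only a guess on my part.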

Any ideas on what could be causing this?

OlafenwaMoses / ImageAI, issue #774