OlafenwaMoses / ImageAI

A python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities
https://www.genxr.co/#products
MIT License
8.49k stars · 2.18k forks

Custom Object Detector Training Using Little GPU After Recent ImageAI Update #621

Open TheBeastCoding opened 3 years ago

TheBeastCoding commented 3 years ago

[Attached screenshots: img1, img2, img3]

Img1 shows my current model training time per epoch (roughly 6 hours per epoch for 750 images). Img2 shows the current GPU usage on Google Colab. Img3 shows an older model's training logs on the same image dataset (about 30 minutes per epoch for 750 images); GPU usage was much higher (about 15 GB) during that training cycle.

I did notice an error this morning when I tried to run the same code from several days ago (I previously used TensorFlow 1.13). Perhaps a recent patch has impacted GPU usage now that I am using the newly recommended setup, TF 2.4.

I am installing the following packages, as recommended by the documentation, for my current model training:

!pip install tensorflow==2.4.0 tensorflow-gpu==2.4.0 keras==2.4.3 numpy==1.19.3 pillow==7.0.0 scipy==1.4.1 h5py==2.10.0 matplotlib==3.3.2 opencv-python keras-resnet==0.2.0
!pip install imageai --upgrade
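For reference, the training code follows the standard ImageAI custom detection pattern, roughly like the sketch below (the data directory, class names, and pretrained weights file are placeholders, not my actual project values):

```python
from imageai.Detection.Custom import DetectionModelTrainer

# Standard ImageAI custom detection training setup (placeholder paths/names)
trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="my_dataset")      # placeholder dataset folder
trainer.setTrainConfig(
    object_names_array=["my_object"],                      # placeholder class list
    batch_size=4,
    num_experiments=100,
    train_from_pretrained_model="pretrained-yolov3.h5"     # placeholder weights file
)
trainer.trainModel()
```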

OlafenwaMoses commented 3 years ago

This is unusual. After installing TF 2.4.0, restart the runtime and run the training again.
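After the restart, it is also worth confirming that TensorFlow actually sees the Colab GPU before starting training; a minimal check, assuming TF 2.4 is installed:

```python
import tensorflow as tf

# Quick sanity check that the runtime exposes a GPU to TensorFlow
print("TF version:", tf.__version__)
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))
```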

TheBeastCoding commented 3 years ago

> This is unusual. After installing TF 2.4.0, restart the runtime and run the training again.

I restarted the runtime and noticed no change in performance. I reduced my dataset to around 100 images and it is running at 30 min/epoch, which is much longer than my previous experience (5 min/epoch for small datasets). Also, I am using Google Colab Pro, so GPU access should not be a problem. Have you run any object detection training on Google Colab recently, after this update, and noticed longer-than-usual training times or low GPU usage?

TheBeastCoding commented 3 years ago

> This is unusual. After installing TF 2.4.0, restart the runtime and run the training again.

UPDATE: Reverting to TF-GPU==1.13.1, Keras==2.2.4 and imageai==2.1.0 FIXED the issue. I am now back to full GPU usage. Something in the updated TF setup from your latest patch is limiting GPU usage.

[Screenshot: GPU usage after reverting]
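For anyone hitting the same problem, the downgrade amounts to pinning the older versions in a Colab cell along these lines and then restarting the runtime (the uninstall step is just my way of avoiding mixed installs, not something from the docs):

```
!pip uninstall -y tensorflow tensorflow-gpu keras imageai
!pip install tensorflow-gpu==1.13.1 keras==2.2.4 imageai==2.1.0
```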

StevenMapes commented 3 years ago

I wonder if this is what's causing my trouble on a local build, where the GPU is not being used even with trainer.setGpuUsage(1) being called, whereas if I call that on Colab the GPU is used and the time to learn drops as you'd expect.
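One thing that might help narrow it down on the local build: enabling TensorFlow's device-placement logging before the trainer is constructed shows whether ops are actually being placed on the GPU. This is a plain TF-level check, not part of ImageAI:

```python
import tensorflow as tf

# Log which device each op is placed on; GPU-backed ops will show ".../GPU:0"
tf.debugging.set_log_device_placement(True)
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))
```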