Hello,

I am migrating from keras-retinanet 0.4.1 (yeah, I know) to 0.5.1. The 0.5.1 release just can't start training for me; it fails with the same problem as described in #1171. The solution there is clumsy (and, as far as I understand, a bit outdated by now anyway), so I tried the latest master branch code instead: there are no errors and I can start training successfully. Strangely, though, training on the same machine with the same data has become slower compared to 0.4.1. I ran the training command with both versions and here are the results.
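The command was the standard keras-retinanet training script, something along these lines (the dataset type and paths below are placeholders, not my actual ones); the identical invocation was used for both versions:

```
# Illustrative only -- the csv dataset type and the paths are placeholders,
# not the actual configuration used for the timings below.
python keras_retinanet/bin/train.py csv /path/to/annotations.csv /path/to/classes.csv
```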
GPU-Z screenshot during 0.4.1 training with tensorflow 1.10.0 and CUDA 9.0:
GPU-Z screenshot during master branch training with tensorflow 2.3.0 and CUDA 10.1:
OS: Windows 10
Python: 3.6.8
Graphics card: Nvidia GeForce GTX 1080 Ti
You can clearly see that the 0.4.1 training utilizes the GPU more effectively (look at the GPU load and board power draw graphs). One training epoch takes 1131 seconds (511 ms/step) with the master branch code and 991 seconds (448 ms/step) with 0.4.1, which is more than a 14% increase in training time. Is this normal behavior, or am I missing something? I imagined that with all the optimization fixes going on in the project, training would be faster, or at least take the same time. If this is a consequence of using non-release code, could you please address the problem from #1171 then?
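In case it helps rule out an environment problem, this is the kind of quick check I can run to confirm that the TF 2.3 install is CUDA-enabled and actually sees the 1080 Ti (standard TensorFlow APIs, nothing keras-retinanet specific):

```
# Sanity check of the TF 2.3 / CUDA 10.1 environment (standard TF APIs).
python -c "import tensorflow as tf; print(tf.__version__)"
python -c "import tensorflow as tf; print(tf.test.is_built_with_cuda())"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```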
This issue has been automatically marked as stale due to the lack of recent activity. It will be closed if no further activity occurs. Thank you for your contributions.