fizyr / keras-retinanet

Keras implementation of RetinaNet object detection.
Apache License 2.0

Reorder resize_image and preprocess_image calls for quicker computations #1427

Closed by foghegehog 4 years ago

foghegehog commented 4 years ago

Hello!

We were trying to optimize a pipeline that uses keras_retinanet in our project and found that the preliminary steps before image processing can take quite a while. We work with 4000x3000 images, and invoking preprocess_image turns out to be a relatively expensive operation. However, it is immediately followed by resize_image, which makes the image noticeably smaller. Given that preprocess_image in both 'caffe' and 'tf' modes uses only constants, with no image-dependent statistics, wouldn't it be better to swap the two calls?
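A minimal sketch of why swapping should be nearly harmless, assuming 'caffe' mode subtracts fixed per-channel means and 'tf' mode rescales to [-1, 1]. The mean values below are the commonly used ImageNet BGR means and are an assumption here, not taken from this thread. `preprocess` and `downscale` are hypothetical stand-ins, not the library's preprocess_image and resize_image; block averaging replaces the library's interpolation so that the example needs only NumPy.

```python
import numpy as np

# Assumed ImageNet BGR channel means for 'caffe' mode (illustrative constants).
CAFFE_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def preprocess(x, mode="caffe"):
    """Hypothetical stand-in for constant-only preprocessing."""
    x = x.astype(np.float32)
    if mode == "tf":
        return x / 127.5 - 1.0
    if mode == "caffe":
        return x - CAFFE_MEAN
    raise ValueError(f"unknown mode: {mode}")


def downscale(x, factor):
    """Hypothetical stand-in for resizing: average over factor x factor blocks."""
    h, w, c = x.shape
    h, w = h - h % factor, w - w % factor  # crop so the blocks tile evenly
    blocks = x[:h, :w].astype(np.float32)
    blocks = blocks.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)

max_diff = {}
for mode in ("caffe", "tf"):
    old_order = downscale(preprocess(image, mode), 4)  # preprocess -> resize
    new_order = preprocess(downscale(image, 4), mode)  # resize -> preprocess
    max_diff[mode] = float(np.abs(old_order - new_order).max())

print(max_diff)  # only floating-point rounding separates the two orders
```

Any resize whose output pixels are weighted averages of input pixels, with weights summing to one, commutes with a per-channel scale-and-shift in exact arithmetic, so the remaining difference is rounding error.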

I've compared the numerical results of both call orders. To be honest, they are slightly different, but the difference is at most 1e-5, while the speedup is about 7x: https://github.com/lacmus-foundation/lacmus-research/blob/master/resize_preprocess_order.ipynb
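A back-of-the-envelope sketch of where the saving comes from, assuming the resize targets a shortest side of 800 pixels capped at 1333 on the longest side. Those defaults, and the helper name `resize_scale`, are assumptions for illustration and are not stated in this thread.

```python
def resize_scale(rows, cols, min_side=800, max_side=1333):
    """Hypothetical helper: scale so the short side reaches min_side,
    unless that would push the long side past max_side."""
    scale = min_side / min(rows, cols)
    if max(rows, cols) * scale > max_side:
        scale = max_side / max(rows, cols)
    return scale


rows, cols = 3000, 4000
scale = resize_scale(rows, cols)

pixels_before = rows * cols
pixels_after = round(rows * scale) * round(cols * scale)
pixel_ratio = 1.0 / scale ** 2  # how many times fewer pixels preprocessing touches

print(f"scale={scale:.4f}, pixels {pixels_before} -> {pixels_after}, "
      f"ratio ~{pixel_ratio:.1f}x")
```

Under these assumptions, preprocessing after the resize touches roughly 14 times fewer pixels. This only bounds the saving in the preprocessing step; the resize itself still runs on the full image, so the end-to-end gain is smaller.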

de-vri-es commented 4 years ago

This sounds logical to me. A minor numerical difference is to be expected. It might still be interesting to compare detection quality.

~I'm a bit confused by the old code though. It seems like it resizes the image and then immediately throws it away. How did that ever work in the first place?~ I was reading a single commit instead of the whole PR.

It's nice that you added an option rather than just changing the order, but in this case I'm not sure if it's actually worth it to keep the old order around.

@hgaiser: What do you think? Was there a specific reason we do preprocessing before scaling?

hgaiser commented 4 years ago

Nope, no particular reason for that. I agree that it is nice that you added an option, but I don't think it will matter much if we only use the resize -> preprocess order of things. Could you make that the only option?

foghegehog commented 4 years ago

Ok, I've done that. Initially I added the flag with preprocessing that uses image-specific statistics in mind, but even in that case it may be better to preprocess the resized image that is actually fed to the network rather than its full-size version. By the way, this RetinaNet code sample also uses the resize -> preprocess order: https://keras.io/examples/vision/retinanet/#generating-detections
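To illustrate the image-specific case, here is a minimal sketch using per-image standardization, which is a hypothetical preprocessing step and not one of the library's modes. `standardize` and `downscale` are illustrative names, and block averaging again stands in for the real resize.

```python
import numpy as np


def standardize(x):
    """Hypothetical image-dependent preprocessing: zero mean, unit std per image."""
    x = x.astype(np.float32)
    return (x - x.mean()) / x.std()


def downscale(x, factor):
    """Hypothetical stand-in for resizing: average over factor x factor blocks."""
    h, w, c = x.shape
    h, w = h - h % factor, w - w % factor  # crop so the blocks tile evenly
    blocks = x[:h, :w].astype(np.float32)
    blocks = blocks.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)

stats_from_full = downscale(standardize(image), 4)     # preprocess -> resize
stats_from_resized = standardize(downscale(image, 4))  # resize -> preprocess

order_gap = float(np.abs(stats_from_full - stats_from_resized).max())
print(f"std after each order: {stats_from_full.std():.3f} vs "
      f"{stats_from_resized.std():.3f}, max abs difference: {order_gap:.3f}")
```

Averaging shrinks the pixel variance, so statistics taken from the full-size image no longer describe the resized one. Only the resize -> preprocess order hands the network an input that is actually standardized. Random noise exaggerates the gap; natural images are smoother and would differ less.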

de-vri-es commented 4 years ago

Looks good. Thanks for the PR and doing benchmarks!