elliottwu / DeepHDR

This is the implementation for Deep High Dynamic Range Imaging with Large Foreground Motions (ECCV'18)
MIT License

COOL AND EASY, but how to SPEED UP? #9

Closed TigerStone93 closed 5 years ago

TigerStone93 commented 5 years ago

First, thank you for your great contribution to the HDR.

Your code is tremendously cool and very clear, so it is easy to use even for someone like me.

Compared with other deep-learning-based HDR code, I think the distinctive feature of yours is that you use 3 inputs with different exposures, so the output image can easily reconstitute the almost-white and almost-black areas of the input images.

However, I wonder how to speed up the process.

With three 800x600 JPG inputs, it takes 0.39 s per output on an i7-7700 @ 3.60 GHz with a GTX 1080 8 GB. (Only 5% of my GPU is used.)

I want at least 5 FPS.

As a poor newbie, I would appreciate any advice.

elliottwu commented 5 years ago

Hi, the short answer probably is "buy a better GPU!" Just kidding :P

In general, you can improve the speed in two ways.

(1) Use a smaller architecture and retrain it. Regarding the current architecture, you could try to: a. reduce the channel sizes in the (de-)convolution layers; b. replace the separate encoders for the 3 inputs with a single shared one; c. reduce the number of residual blocks. These three changes are listed roughly in decreasing order of their likely impact on running time.
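To see why option (a) tends to help most, it is worth counting parameters. A rough sketch (the layer sizes below are hypothetical, not the exact DeepHDR configuration): halving both the input and output channel counts of a convolution layer cuts its parameter count, and hence its multiply-accumulate cost, by roughly 4x.

```python
def conv_params(k, c_in, c_out):
    """Weights + biases for one k x k convolution layer."""
    return k * k * c_in * c_out + c_out

# Hypothetical channel sizes for illustration only.
full = conv_params(3, 64, 128)   # 73,856 parameters
half = conv_params(3, 32, 64)    # 18,496 parameters
print(full / half)               # roughly 4x fewer parameters
```

Since compute per layer scales the same way, channel width is usually the single biggest speed knob.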

(2) Use off-the-shelf network trimming tools. There are many network trimming/pruning techniques that can be applied directly to off-the-shelf trained models, potentially without finetuning. TF has an official tool dedicated to model optimization: https://www.tensorflow.org/model_optimization. This tf.contrib module looks quite reliable too: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning. There are many other tutorials on this topic which you can easily find.
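The core idea behind most of these tools is magnitude pruning: zero out the weights with the smallest absolute value. The TF tools above do this gradually during training with sparsity schedules; the NumPy snippet below is only a conceptual one-shot sketch of the same idea, not how the libraries are invoked.

```python
import numpy as np

def prune_low_magnitude(w, sparsity):
    """Zero out the smallest-|w| fraction of entries (one-shot magnitude pruning)."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    out = w.copy()
    out[np.abs(out) <= thresh] = 0.0
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
pruned = prune_low_magnitude(w, 0.5)  # half the weights set to zero
```

Sparse weights only translate into wall-clock speed-ups when the runtime can exploit the sparsity, which is another reason to prefer the official tooling over a hand-rolled version.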

Good luck. A 2x speed-up seems easily achievable.

TigerStone93 commented 5 years ago

@elliottwu

Sorry to bother you, but I have two more questions. If I reduce the encoder inputs from 3 images to 1, then how can I expand the dynamic range? Presumably the range becomes narrower than when I use 3 inputs. Is that right?

Thank you for your kind response and the tutorial links. I am a little worried, but I will try to modify your well-structured network.

elliottwu commented 5 years ago

I think there is some misunderstanding. What I meant was to use one single encoder for all 3 inputs, by stacking the inputs along the channel dimension. This way, you save the computation of the separate encoders used in the current model.
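Concretely, the stacking could look like the NumPy sketch below. The spatial size and channel count are illustrative; the channel count per input would depend on what each input actually contains in the model.

```python
import numpy as np

# Three differently exposed inputs, each H x W x C (illustrative sizes).
h, w, c = 4, 4, 6
low, med, high = (np.zeros((h, w, c)) for _ in range(3))

# One tensor along the channel axis -> a single shared encoder sees all three.
stacked = np.concatenate([low, med, high], axis=-1)
print(stacked.shape)  # (4, 4, 18)
```

All three exposures still reach the network, so the dynamic range is not reduced; only the duplicated encoder computation is saved.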

TigerStone93 commented 5 years ago

@elliottwu

Totally understood. Thank you so much!