gabrieleilertsen / hdrcnn

HDR image reconstruction from a single exposure using deep CNNs
https://computergraphics.on.liu.se/hdrcnn/
BSD 3-Clause "New" or "Revised" License

Training HDR image dataset #24

Closed zkk0911 closed 5 years ago

zkk0911 commented 5 years ago

Hi gabrieleilertsen, I was excited to discover your impressive reconstruction work. I want to train the same model using your training code and the datasets mentioned in the paper, but I haven't figured out how to create the training data. The paper describes a total of 1121 HDR images and 67 HDR video sequences, which finally yield ~3700 HDR images. How can I get this dataset of ~3700 HDR images? Are there scripts to use, or somewhere I can download it directly? Thanks.

gabrieleilertsen commented 5 years ago

Hi. Please see this previous issue. Details on how the training dataset is constructed are in the paper. For example, every 10th frame of the HDR video sequences was used for the final training dataset.
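
As a rough illustration (not the actual scripts, which are not included in this repo), the frame sampling could look something like the sketch below, assuming each video sequence is stored as a directory of per-frame OpenEXR files:

```python
import glob
import os

# Sketch: take every 10th frame from each HDR video sequence.
# The directory layout and .exr naming are assumptions, not the actual pipeline.
def sample_video_frames(sequence_dirs, step=10):
    selected = []
    for seq_dir in sequence_dirs:
        frames = sorted(glob.glob(os.path.join(seq_dir, "*.exr")))
        selected.extend(frames[::step])
    return selected
```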

zkk0911 commented 5 years ago

OK, thanks for your reply, I will try.
The images you select from the Places database are compressed 256x256 images. You keep only the images that have fewer than 50 pixels at the maximum value (255) and drop the rest, which leaves ~600K images, and this smaller dataset is what PATH_TO_LDR_DATABASE points to in the pre-training stage.

That is my understanding. Please let me know if I have something wrong.
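
In code, I imagine the selection step looks something like the sketch below (the 50-pixel threshold and the exact check are my reading of the paper, not your actual script):

```python
import numpy as np
from PIL import Image

# Sketch of my understanding: keep a Places image only if fewer than
# 50 of its pixels are saturated (value 255 in any channel).
def is_usable(path, max_saturated=50):
    img = np.asarray(Image.open(path).convert("RGB"))
    num_saturated = int(np.any(img == 255, axis=-1).sum())
    return num_saturated < max_saturated
```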

zkk0911 commented 5 years ago

The model input height and width are both 224, but I haven't found a resize operation in hdrcnn_train.py or in the virtualcamera.cpp file. So it seems the selected LDR images need to be resized manually (256x256 -> 224x224) before the ~600K images can finally be used for training. Please let me know if I have misunderstood, thanks.
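
For example, something like this sketch is what I have in mind (just an illustration; I don't know whether your pipeline resizes or crops):

```python
from PIL import Image

# Sketch: bring a 256x256 Places image down to the 224x224 network input.
# A center or random crop would be an alternative to resizing.
def to_input_size(path, size=224):
    return Image.open(path).convert("RGB").resize((size, size), Image.BILINEAR)
```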

zkk0911 commented 5 years ago

Hi gabrieleilertsen, why does the HDR decoder operate in the log domain when reconstructing an HDR image?

gabrieleilertsen commented 5 years ago

Yes, the Places images are pre-processed offline, according to the descriptions in the paper. Non-saturated images are selected, and the resolution is changed to 224x224.

The decoder operates on log values, which are most often better suited for image processing than linear values. LDR images are compressed by a camera curve or gamma correction, but this doesn't extend very well to pixels > 1, so the log domain is a better option. In the end, the loss function is also computed on log values, which corresponds better to the perceived error. Otherwise, there would be too much emphasis on the large pixel values.
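
Schematically, a simplified log-domain loss looks like the sketch below (this leaves out details of the actual loss in the paper, such as the handling of the saturated regions):

```python
import tensorflow as tf

# Simplified sketch: L2 loss on log values instead of linear values,
# so large pixel values do not dominate the error. eps avoids log(0).
def log_l2_loss(y_pred_linear, y_true_linear, eps=1e-5):
    log_pred = tf.log(y_pred_linear + eps)
    log_true = tf.log(y_true_linear + eps)
    return tf.reduce_mean(tf.square(log_pred - log_true))
```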

zkk0911 commented 5 years ago

I see what you mean, thanks. Another thing confuses me, about the inverse camera curve: during training, the skip-connections use a simple gamma function, f(x) = x^2, but the conversion of the input LDR images to HDR seems to use a more complex formula. As follows:

tf.pow(tf.scalar_mul(1.0 / 255, skip_layer.outputs), 2.0)  # skip-connections: scale to [0, 1], then gamma curve f(x) = x^2

np.power(np.divide(0.6 * xx, np.maximum(1.6 - xx, 1e-10)), 1.0 / 0.9)  # linearization of the input LDR image

And one step further: how did you define the parameters 0.6 / 1.6 / 0.9? I haven't found a rule for deriving them, or maybe they are empirical values.

gabrieleilertsen commented 5 years ago

For the skip-connections, the gamma curve with gamma=2, i.e. f(x)=x^2, is used for simplicity and to avoid numerical problems. For linearization, a sigmoidal camera curve is used, with parameters fitted to a dataset of camera curves. Please consider reading the paper for all the details, and more specifically Appendix A.3, Fig. 19, and Eq. 10, which describes the camera curve.
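
For reference, working backwards from the linearization snippet above, it is consistent with a sigmoidal curve f(x) = (1+σ)x^n / (x^n + σ) with σ = 0.6 and n = 0.9 (so 1.6 = 1 + σ). A small sketch of this reading, which should be double-checked against Eq. 10 in the paper:

```python
import numpy as np

SIGMA, N = 0.6, 0.9  # fitted sigmoid parameters; 1.6 in the code is 1 + SIGMA

def camera_curve(x):
    # Sigmoidal camera curve: linear values in [0, 1] -> display values.
    return (1.0 + SIGMA) * np.power(x, N) / (np.power(x, N) + SIGMA)

def inverse_camera_curve(y):
    # Same expression as the linearization snippet quoted above.
    return np.power(SIGMA * y / np.maximum(1.0 + SIGMA - y, 1e-10), 1.0 / N)

x = np.linspace(0.0, 1.0, 11)
print(np.allclose(inverse_camera_curve(camera_curve(x)), x))  # True
```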

zkk0911 commented 5 years ago

Thanks @gabrieleilertsen, I will continue working through the details.