ShreyasSkandanS / DFuseNet

ITSC 2019 | This is the accompanying code repository for our paper "DFuseNet: Deep Fusion of RGB and Sparse Depth Information for Image Guided Dense Depth Completion" | PyTorch, Python 3
https://arxiv.org/abs/1902.00761
GNU General Public License v3.0

About training code #2

Closed WANGYINGYU closed 5 years ago

WANGYINGYU commented 5 years ago

Hello, I am very interested in the work in your paper; this is great work. Are you considering releasing your training code? I would like to understand some of the details. Thank you.

ShreyasSkandanS commented 5 years ago

Hi @wangyingyu, thank you for your interest in our work. We are considering releasing the training code once the paper is accepted, but as of now we are unsure of a timeline. If you are interested in setting up your own code to train this network, I am happy to answer any questions you may have. If you would prefer to wait, I'd suggest following this repository; we should have some updates over the next month or two.

yulinliutw commented 5 years ago

Hello, I am also interested in setting up the code to train this network. Regarding the image size used in the training procedure, can you share your settings? If the images from the original dataset are used directly, the batch size has to be very small, which makes the model hard to train, and memory runs out. Many thanks.

ShreyasSkandanS commented 5 years ago

Hi @yulinliutw, I'm happy to answer your question, though this might be better as a separate thread.

For training on KITTI, we selected the maximum resolution common to the different runs, which is (300, 1200).
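For reference, here is a minimal sketch of cropping aligned KITTI inputs to a common (300, 1200) window before batching. This is not the released training code; the `bottom_crop` helper and the bottom-anchored, horizontally centered crop are assumptions (KITTI LiDAR returns are concentrated in the lower part of the frame, so a bottom crop keeps most valid points).

```python
import numpy as np

def bottom_crop(rgb, sparse_depth, gt_depth, crop_h=300, crop_w=1200):
    """Crop the three aligned inputs to the same (crop_h, crop_w) window."""
    h, w = rgb.shape[:2]
    assert h >= crop_h and w >= crop_w, "frame smaller than crop size"
    x0 = (w - crop_w) // 2      # center the crop horizontally
    y0 = h - crop_h             # anchor the crop at the bottom of the frame
    sl = np.s_[y0:y0 + crop_h, x0:x0 + crop_w]
    return rgb[sl], sparse_depth[sl], gt_depth[sl]
```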

yulinliutw commented 5 years ago

Thanks for your response; you are so kind, and the answers always come quickly. What about the NYU dataset: did you also scale all images to the maximum common resolution? Besides, I found that the depth images in the NYU dataset are quite different from KITTI's. When I run the current pre-processing code on them directly, the results are not very good. The paper mentions that the NYU depth images are first sampled to make them sparse so that the pre-processing code can work (as I understand it, the code is designed to fill LiDAR-based depth maps). I am confused about why we should make the depth sparse and then fill it, instead of filling the original depth data directly. Is there a reason for the sampling step on the NYU depth images?
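(For context: NYU Depth v2 provides dense Kinect depth, while the network is trained for completion, i.e., it expects a sparse depth input and is supervised by the dense map. The common protocol in the depth-completion literature is to randomly keep a few hundred valid pixels to simulate a sparse sensor. A minimal sketch along those lines follows; the `sparsify_depth` name and the sample count of 500 are assumptions, not necessarily the paper's exact setting.)

```python
import numpy as np

def sparsify_depth(dense_depth, num_samples=500, rng=None):
    """Keep only `num_samples` randomly chosen valid pixels of a dense map."""
    rng = rng or np.random.default_rng()
    valid = np.flatnonzero(dense_depth > 0)          # indices of valid pixels
    keep = rng.choice(valid, size=min(num_samples, valid.size), replace=False)
    sparse = np.zeros_like(dense_depth)
    sparse.flat[keep] = dense_depth.flat[keep]       # copy sampled depths over
    return sparse
```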

WANGYINGYU commented 5 years ago

@ShreyasSkandanS Thanks for your reply. I'm interested in the following three details:

  1. The convolution operations in the network, such as the kernel sizes and strides of the different convolution layers.
  2. I noticed that you use the method of Ku et al. to fill the sparse depth. Does this give a significant improvement over filling with zeros?
  3. I noticed that an SPP block is used at the end of both the depth and RGB branches. Is there a significant performance degradation without it?

Thank you very much.

ShreyasSkandanS commented 5 years ago

Hi @wangyingyu

  1. You can find all the network-specific details in the following folder: https://github.com/ShreyasSkandanS/DFuseNet/tree/master/models
  2. There is indeed an improvement over filling with zeros. There are some technical challenges when filling with zeros and then using regular convolution operations (a simplified illustration follows after this list). I recommend reading the following work to get a clearer understanding of the challenges:
  3. Yes, we noticed that the SPP blocks do improve performance!
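To illustrate point 2: below is a minimal sketch contrasting zero-filling with a single-dilation approximation of the classical pipeline of Ku et al. This is neither the authors' code nor the full method; the function names, the `max_d` cap of 100 m, and the 5×5 kernel are assumptions.

```python
import cv2
import numpy as np

def fill_with_zeros(sparse_depth):
    # Baseline: leave missing pixels at 0. Regular convolutions then mix
    # many "fake" zero depths into every output response.
    return sparse_depth

def fill_classical(sparse_depth, kernel_size=5, max_d=100.0):
    # Ku et al. style: invert depths so that dilation favors *closer* points,
    # dilate to spread valid measurements into empty pixels, then invert back.
    valid = sparse_depth > 0
    inv = np.where(valid, max_d - sparse_depth, 0.0).astype(np.float32)
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    inv = cv2.dilate(inv, kernel)
    return np.where(inv > 0, max_d - inv, 0.0)
```

The full pipeline also includes hole filling and blurring steps, but the dilation is the part that avoids feeding large regions of meaningless zeros into the first convolutions.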