autonomousvision / differentiable_volumetric_rendering

This repository contains the code for the CVPR 2020 paper "Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision"
http://www.cvlibs.net/publications/Niemeyer2020CVPR.pdf
MIT License

Depth in your paper and code #38

Closed imhzxuan closed 3 years ago

imhzxuan commented 3 years ago

Hi, thanks for this great work and your other inspiring work in reconstruction!

I was checking your code and found that there is a use_ray_length_as_depth argument for the intersect_camera_rays_with_unit_cube function. If this is set to true, the depths are set to the ray length, which does not seem correct to me when there is a non-zero angle between the camera z-axis and the ray.

Also, in the paper the ray is denoted as r(d) = r_0 + d*w, where d is the surface depth as defined in the paper. If w here is a unit direction vector, then d is again the ray length rather than the depth, right? It would be really helpful if you could elaborate a bit on this.

I guess I'm misunderstanding something, but I really don't know where I went wrong (sorry if this is stupid), because when I check the actual value of d, I get very close values no matter whether I set use_ray_length_as_depth to true or not. Also, considering that you normalize the coordinates to a unit cube, is it normal to see ray lengths (or depths) ranging from 500~900 on DTU?

m-niemeyer commented 3 years ago

Hi @imhzxuan , thanks a lot for your interest in our project and your kind words!

You asked quite a few questions, and I'll try to address the main points in the following:

1. Usually, if you are given a depth map, the values indicate the z-value in camera space, hence not the ray length, as you correctly mentioned.
2. In our code, we first want to find the depth values d so that p_surface = ray_0 + d * ray_vector.
3. To apply an L1 loss on the GT depth values, we can't re-use this d; we have to transform p_surface to camera space and then use the z-value of the point (see the sketch below).
4. The other option you have is to transform the GT pixel + depth value into the world coordinate system and then apply an L2 loss there.
5. Is it normal to get depth values around 500-900 for DTU? Yes! We didn't want to touch the camera matrices; however, as you correctly mention, we transform the world to the unit cube. This doesn't change the depth values, as everything is transformed, hence the values are still the same.
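As an illustration of point 3, here is a minimal sketch of going from the predicted d to a camera-space depth before applying the L1 loss. This is not the repository's actual implementation; the tensor names, the shapes, and the assumption that `world_mat` is a batched 4x4 world-to-camera matrix are mine:

```python
import torch

def depth_l1_loss(ray_0, ray_vector, d, depth_gt, world_mat):
    """Hypothetical sketch: compare predicted surface points against a GT depth map.

    ray_0:       (B, N, 3) ray origins in world coordinates
    ray_vector:  (B, N, 3) ray directions in world coordinates (not necessarily unit length)
    d:           (B, N)    predicted scaling factors such that p = ray_0 + d * ray_vector
    depth_gt:    (B, N)    ground-truth depth values (z in camera space)
    world_mat:   (B, 4, 4) world-to-camera transformation
    """
    # Surface points in world coordinates
    p_world = ray_0 + d.unsqueeze(-1) * ray_vector                   # (B, N, 3)

    # Homogeneous coordinates, then transform to camera space
    ones = torch.ones_like(p_world[..., :1])
    p_hom = torch.cat([p_world, ones], dim=-1)                       # (B, N, 4)
    p_cam = (world_mat @ p_hom.transpose(1, 2)).transpose(1, 2)      # (B, N, 4)

    # The camera-space z value is what a depth map stores
    depth_pred = p_cam[..., 2]
    return torch.nn.functional.l1_loss(depth_pred, depth_gt)
```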

I hope this helps a little. Good luck with your research!

imhzxuan commented 3 years ago

Hi @m-niemeyer , thank you so much for your detailed reply, it helps a lot! I also encountered another issue when I tried to reproduce your result on single-view reconstruction with only multi-view RGB supervision. Since I don't have large-memory cards like the V100 you used, I rely on 2 TITAN X cards to run your original setting. However, there seems to be a bug in the multi-GPU training:

At the beginning of training, DataParallel randomly crashes (not on the 1st iteration) during backpropagation with a runtime error from the C scatter backend: "Chunk size must be positive". Have you encountered this issue before with multiple GPUs?

My current workaround is to train the model on a single card at the beginning and switch to multi-GPU training once it runs out of memory. This works, but it doesn't feel like the right way to go. The error suggests that some empty or invalid leaves cannot be scattered during the backward pass. However, I checked the loss right before the crash and it is neither NaN nor any other strange value, and the input tensors have the correct batch dimensions. So it looks quite weird to me; I suspect some of the operations cannot be naturally scattered across multiple cards for backprop. Do you have any thoughts on this, e.g. which part or method could cause this issue?

Thanks!

m-niemeyer commented 3 years ago

Hi @imhzxuan , we have actually only trained in a single-GPU setting and never used multiple GPUs. For reducing the memory requirements, please have a look at this issue if you want to fit the model on a smaller GPU.

imhzxuan commented 3 years ago

Hi @m-niemeyer , thanks for your reply! I have solved this issue with DistributedDataParallel: each GPU handles its own process, so the scatter problem disappears. I had actually looked at your previous answer before, but I assumed that changing those settings would hurt the performance. How do you think those four settings affect the trade-off between final performance and memory consumption? (Also, it would be super helpful if you could let us know which settings you use and how long you usually let the model run while experimenting, as opposed to getting the final best result. 2-5 weeks looks scary to me, haha.)
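For reference, here is a minimal sketch of the kind of DistributedDataParallel setup meant here, with one process per GPU (launched e.g. via torchrun). The model and dataset are toy placeholders, not the repository's training code:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # One process per GPU; a launcher such as torchrun sets LOCAL_RANK
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Toy model and data; substitute the actual DVR model and dataset here
    model = torch.nn.Linear(3, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    dataset = TensorDataset(torch.randn(1024, 3), torch.randn(1024, 1))

    sampler = DistributedSampler(dataset)            # each process sees its own shard
    loader = DataLoader(dataset, batch_size=16, sampler=sampler)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(2):
        sampler.set_epoch(epoch)                     # reshuffle differently every epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(model(x), y)
            loss.backward()                          # gradients are all-reduced across processes
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```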

Thank you so much for your patience and time!

m-niemeyer commented 3 years ago

I would suggest playing around in particular with the batch size and the decoder's hidden size. I would try to keep them at or above 16 (batch size) and 128 (decoder's hidden size). In particular the latter results in a significant speed-up / memory reduction, and I think the performance will not drop too much, even for the final results. While experimenting, you can even set the hidden size to 64 or 32 to get some results fast. I agree that you should only start such a long training with such big models once everything is in place.
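To see why the decoder's hidden size has such a large effect, here is a toy parameter count for a plain MLP of varying width. This is a generic stand-in, not the repository's actual decoder; the layer count, widths, and input/output dimensions are made up:

```python
import torch.nn as nn

def toy_decoder(hidden_size, n_layers=5, dim_in=3, dim_out=4):
    """Toy stand-in for an MLP decoder: parameter count and per-point
    compute grow roughly quadratically with hidden_size."""
    layers = [nn.Linear(dim_in, hidden_size), nn.ReLU()]
    for _ in range(n_layers - 2):
        layers += [nn.Linear(hidden_size, hidden_size), nn.ReLU()]
    layers.append(nn.Linear(hidden_size, dim_out))
    return nn.Sequential(*layers)

for h in (512, 128, 64):
    n_params = sum(p.numel() for p in toy_decoder(h).parameters())
    print(f"hidden_size={h:4d}: {n_params / 1e6:.2f}M parameters")
```

Because the hidden-to-hidden weight matrices scale quadratically with the width, even a modest reduction translates into a large memory and compute saving, which is consistent with the speed-up described above.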

Good luck with your research!

imhzxuan commented 3 years ago

Hi @m-niemeyer , I have been looking into your code these days. From what I read, one way to avoid memory issues during evaluation is to lower the max_points argument of the perform_ray_marching function for evaluation only; is that correct?

Besides, my understanding is that this parameter (when set for training) only influences the forward speed, not the final performance or the training curve, right? If so, one confusing thing is that, although I expected decreasing max_points to slow down training, I haven't actually observed that: it only reduced memory usage, while the time per iteration stayed almost the same. I'm not sure whether this is because the chunking overhead is hidden by the massive amount of computation. Did you observe a similar phenomenon?

Thank you!

m-niemeyer commented 3 years ago

Hi @imhzxuan , you are completely right: it does not change the loss at all - it only changes how many points are evaluated in parallel in the ray marching step. I personally haven't played with it too much, but I agree that the training speed should not drop too much as long as you still choose an acceptable value.
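For context, here is a generic sketch of the chunked-evaluation pattern being discussed. The function and variable names are mine, not the repository's perform_ray_marching implementation; it only illustrates why the chunk size bounds memory but leaves the result (and hence the loss) unchanged:

```python
import torch

def evaluate_in_chunks(network_fn, points, max_points=100_000):
    """Evaluate a per-point network on `points` in chunks of at most `max_points`.

    The chunk size only bounds how many points are processed in parallel,
    so the concatenated result is identical for any choice of max_points.
    """
    outputs = []
    for chunk in torch.split(points, max_points, dim=0):
        outputs.append(network_fn(chunk))
    return torch.cat(outputs, dim=0)

# Example: the output is the same regardless of the chunk size
net = torch.nn.Linear(3, 1)
pts = torch.randn(1000, 3)
out_small = evaluate_in_chunks(net, pts, max_points=64)
out_large = evaluate_in_chunks(net, pts, max_points=1000)
assert torch.allclose(out_small, out_large)
```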