NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more
https://nvlabs.github.io/instant-ngp

Training with sparse depth #1238

Closed Zador-Pataki closed 1 year ago

Zador-Pataki commented 1 year ago

Hi, I am trying to find a way to train an instant-ngp model using sparse depth. Reading this discussion https://github.com/NVlabs/instant-ngp/discussions/647, I found that "For instant-ngp, you should provide depth images". The same comment also mentioned "maybe you can take a look at DSNeRF which uses sparse point cloud from COLMAP". However, in DSNeRF the depth data is not provided as depth images, but rather as an array of sparse depth values (with corresponding pixel coordinates).

Is there currently a way to train instant-ngp with sparse depth? In the depth image, do I need to set the elements with no depth to a certain value?

rockywind commented 1 year ago

Hi, how do I generate the depth map if I have a synchronized point cloud?

Zador-Pataki commented 1 year ago

The simplest way would be to transform the point coordinates into the camera frame; the depth is then just the z coordinate of each point.
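For concreteness, here is a minimal numpy sketch of that idea. It assumes the points are given in the world frame and that you have a world-to-camera pose for each image; the function name is just illustrative.

```python
import numpy as np

def point_depths(points_world, R_wc, t_wc):
    """Depth of 3D points in a camera's frame.

    points_world: (N, 3) points in the world frame.
    R_wc, t_wc: world-to-camera rotation (3, 3) and translation (3,),
                i.e. p_cam = R_wc @ p_world + t_wc.
    Returns the z coordinate of each point in the camera frame.
    """
    points_cam = points_world @ R_wc.T + t_wc
    return points_cam[:, 2]
```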

rockywind commented 1 year ago

Hi, thanks, I'll give it a try!

mgupta70 commented 1 year ago

following

jc211 commented 1 year ago

Depth pixels with value 0 are not used in the supervision. So you could just create a depth map with your sparse values where unknown areas are set to 0.
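A minimal numpy/OpenCV sketch of that suggestion, assuming you already have pixel coordinates and metric depths for the sparse samples; the scale factor (millimetres here) is an assumption that you would have to undo later via the depth scale in transforms.json.

```python
import numpy as np
import cv2

def write_sparse_depth_png(path, us, vs, depths_m, width, height, scale=1000.0):
    """Scatter sparse (u, v, depth) samples into a 16-bit PNG.

    Pixels without a sample stay 0 and are therefore ignored by the
    depth supervision.
    """
    depth = np.zeros((height, width), dtype=np.uint16)
    values = np.clip(np.round(np.asarray(depths_m) * scale), 0, 65535).astype(np.uint16)
    depth[np.asarray(vs, dtype=int), np.asarray(us, dtype=int)] = values
    cv2.imwrite(path, depth)  # OpenCV writes uint16 PNGs losslessly
```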

Zador-Pataki commented 1 year ago

Hi, thanks for your response. It looks like this is working!

silver-obelisk commented 1 year ago

Hi, were you able to add depth supervision just like DS-NeRF? Could you please share your code 🙏

Zador-Pataki commented 1 year ago

Hi, honestly there is not much code to share. You need to create UINT16 depth images as discussed in a different thread; the only difference is that you set the pixel values where no depth is available to 0. Then you need to update the transforms.json file as if you were using standard dense depth. Hope this helps. The one thing you might want to pay attention to is the depth weight factor discussed in some other threads (which should be easy to find). I believe that by default this factor is set to 0 even if you enable learning with depth, although this might have changed. You may also need to increase the factor significantly above 1 to account for the sparsity (I ended up using 100+); however, I never checked the low-level code. If you verify this, I would be interested to know, although I have moved on from this project. Good luck!
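For reference, a minimal sketch of the transforms.json bookkeeping described above. The field names ("depth_path" per frame, a global "integer_depth_scale") and the folder layout are assumptions based on other instant-ngp threads, so verify them against the loader in your version.

```python
import json
from pathlib import Path

# Point every frame at its (possibly sparse) UINT16 depth image and record how
# to convert the stored integers back to scene units.
transforms = json.loads(Path("transforms.json").read_text())
transforms["integer_depth_scale"] = 0.001  # assumption: PNGs store millimetres, scene is in metres
for frame in transforms["frames"]:
    stem = Path(frame["file_path"]).stem
    frame["depth_path"] = f"depth/{stem}.png"  # hypothetical folder layout
Path("transforms.json").write_text(json.dumps(transforms, indent=2))
```

The weight factor mentioned above is reportedly exposed in the Python bindings as `testbed.nerf.training.depth_supervision_lambda`; treat that attribute name as an assumption to double-check, along with the claim that it defaults to 0.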

silver-obelisk commented 1 year ago

Thank you so much! But I still have some questions. Is depth supervision from COLMAP points really useful? I mean, did it perform well in your tests — did it improve PSNR or reduce floaters?

I found that DS-NeRF uses a KL loss because they think the COLMAP points have reprojection error, but ngp only uses an MSE loss, so I worry it won't work well in ngp.

Besides, in DS-NeRF they set the weight to 0.1, and when I tried to increase it the results got worse. The author told me that the more images there are in the dataset, the smaller the weight should be, because once we have enough images depth supervision is no longer necessary. I see you set 100+ in ngp, so I wonder how many images you used?

silver-obelisk commented 1 year ago

I read the other thread and I think I know how to do it. But when I tried to test it, I got stuck on creating the UINT16 depth images. I have cameras.txt, images.txt, and points3D.txt, but I don't know how to turn them into depth images. If you could share your code that would be very helpful to me 🙏 Thanks so much.

Zador-Pataki commented 1 year ago

Yes, in my experiments it was useful (essential), but it depends on the setup and scenario. I was working with dynamic lighting in unbounded scenes, in an autonomous driving scenario with a forward-facing stereo camera. Yes, COLMAP can be inaccurate; however, I used significantly more than COLMAP alone, and my reconstructed map was highly accurate compared to what would be achieved with the end-to-end COLMAP pipeline.

Perhaps DS-NeRF accounts for the ratio of keypoints with and without depth (it could be something as simple as the final loss being the sum of the mean RGB loss and the mean depth loss), and perhaps instant-ngp doesn't. Might be worth looking into.
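To make the sparsity point concrete, here is a toy sketch — an assumption about how the averaging could work, not instant-ngp's or DS-NeRF's actual code — showing why a depth term averaged over all rays needs a much larger weight when most rays carry no depth.

```python
import numpy as np

def combined_loss(rgb_err, depth_err, depth_valid, depth_weight):
    """Toy illustration: if the depth term is averaged over *all* rays, a batch
    where only a small fraction of rays has depth (depth_valid is 0 elsewhere)
    contributes roughly valid_fraction * mean_valid_depth_error, so depth_weight
    has to grow by about 1 / valid_fraction to match dense supervision."""
    rgb_loss = np.mean(rgb_err ** 2)
    depth_loss = np.mean((depth_err * depth_valid) ** 2)  # zeros where no depth
    return rgb_loss + depth_weight * depth_loss
```

Under this reading, a valid-pixel fraction around 1% would call for a weight around 100, which matches the 100+ factor mentioned above.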

I used over 1000 images in each setup. As I said, it will depend on the scenario; I don't think one setup fits all, so it is worth experimenting and reading the code. I see the author's point: if the DS-NeRF authors did not have reliable depth and their environment was well lit, you would want to rely more on the RGB frames. In my case, the depth values were more reliable than the RGB frames in my outdoor scenario.

Sorry, I don't have the capacity at the moment to collect the bits of code needed to make this work, but I will give some pointers. Follow this link to interpret the COLMAP txt files: https://colmap.github.io/format.html. They contain enough information to get frame-wise depths, which is not too hard in my opinion: points3D.txt contains the 3D points and images.txt contains the camera poses (invert them to get poses in the world frame), then simply transform the 3D points into the camera frame; the z value is your depth. (Alternatively, you can make use of COLMAP's txt file parser: https://github.com/colmap/colmap/blob/dev/scripts/python/read_write_model.py.) Once you have depth, the other threads should easily get you to the UINT16 format. A sketch of this is below.
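A hedged sketch of those pointers, assuming COLMAP's scripts/python/read_write_model.py is importable and using COLMAP's world-to-camera pose convention:

```python
from read_write_model import read_model, qvec2rotmat  # from COLMAP's scripts/python

def sparse_depth_samples(model_dir):
    """For each registered image in a COLMAP text model, yield the image name,
    its (width, height), and a list of (u, v, z) samples, where z is the depth
    of each observed 3D point in the camera frame. COLMAP stores world-to-camera
    poses, so p_cam = R(qvec) @ p_world + tvec and the depth is p_cam[2]."""
    cameras, images, points3D = read_model(model_dir, ext=".txt")
    for image in images.values():
        cam = cameras[image.camera_id]
        R, t = qvec2rotmat(image.qvec), image.tvec
        samples = []
        for xy, pid in zip(image.xys, image.point3D_ids):
            if pid == -1:
                continue  # keypoint with no triangulated 3D point
            z = (R @ points3D[pid].xyz + t)[2]
            if z > 0:
                # floor to pixel indices (COLMAP pixel coordinates start at the top-left corner)
                samples.append((int(xy[0]), int(xy[1]), float(z)))
        yield image.name, (cam.width, cam.height), samples
```

From here, the (u, v, z) samples can be scattered into zero-filled UINT16 images exactly as in the earlier sketch, and transforms.json updated to point at the resulting files.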

Hope this helps. If you need further assistance, you should instead create a new issue here or in the COLMAP repo. If you have questions about how I constructed my sparse maps for more accurate depth I can still share that, but with the rest, and specifically the low-level implementation, I can't help more than this at the moment.

Good luck!

silver-obelisk commented 1 year ago

Thanks! I have already obtained sparse-point depth maps from COLMAP in Nerfstudio and now I want to use them in NGP. I wonder which loss you chose and how you set the integer depth scale?