Closed NotAnyMike closed 3 years ago
Hmm, not sure. First guess: are you resizing the images before feeding them to the depth network? To which resolution?
For the gif I shared, the point cloud comes from the ones you provided here. I found the same image in KITTI (and resized it, but only the original RGB image) and unprojected each pixel as (x, y, 1) * depth (where x, y are pixel coordinates), so I did not feed any image to the depth network.
You need to multiply by K^-1 on the left.
When you multiply by K you are scaling the whole 3D scene, so I am not sure pre-multiplying by K^-1 will make a significant difference. I am a bit busy right now but I will try it out. Can someone point me to where K is defined exactly for KITTI (should I use P_rect_00)?
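For reference, a minimal back-projection sketch (this is illustrative code, not from the repo; `K` is assumed to be the 3x3 pinhole intrinsics matrix, e.g. the left block of one of KITTI's `P_rect_xx` matrices). The key point of the discussion above is that rays must be formed as K^-1 [u, v, 1]^T before scaling by depth:

```python
import numpy as np

def backproject(depth, K):
    """Back-project a depth map to a point cloud: X = depth * K^-1 @ [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)  # (3, H*W) homogeneous pixels
    rays = np.linalg.inv(K) @ pixels                                   # normalized camera rays
    points = rays * depth.reshape(1, -1)                               # scale each ray by its depth
    return points.T                                                    # (H*W, 3) point cloud

# Example: with identity-depth, the pixel at the principal point maps to (0, 0, 1)
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0,   0.0,  1.0]])
cloud = backproject(np.ones((48, 64), dtype=np.float32), K)
```

Using (x, y, 1) * depth directly (i.e. skipping K^-1) does not just rescale the scene uniformly: focal length and principal point warp the geometry, which can produce exactly the kind of distorted, noisy-looking cloud shown in the video.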
Seems like the pointclouds themselves are off, I will update these files to make sure. In the meantime, can you try evaluating our pretrained models and using the predicted depth maps to produce the pointclouds? I'm sure those are working fine.
Similar results using the pretrained models (using ./checkpoints/PackNet01_MR_semisup_CStoK.ckpt, see below). I checked the code I use to generate the point cloud with another model and it works as expected. I haven't had time to try the K^-1 fix yet, but I don't think it will change things.
Solved: the dimensions of the depth map and the color image were mixed up, totally my fault. Sharing some results. BTW, good work guys, this model is pretty interesting! Wondering how to improve it.
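For anyone hitting the same issue: the depth map and the RGB image must share the same resolution before back-projecting, and if you resize one of them the intrinsics must be rescaled to match. A minimal sketch of that alignment step, assuming a NumPy float depth map and a PIL image (the helper name is illustrative, not from the repo):

```python
import numpy as np
from PIL import Image

def align_depth_to_rgb(depth, rgb, K):
    """Resize depth to the RGB resolution and rescale the intrinsics accordingly."""
    h, w = rgb.height, rgb.width
    dh, dw = depth.shape
    # Nearest-neighbour avoids blending depth values across object boundaries
    depth_img = Image.fromarray(depth).resize((w, h), Image.NEAREST)
    K = K.astype(float).copy()
    K[0] *= w / dw  # fx and cx scale with width
    K[1] *= h / dh  # fy and cy scale with height
    return np.asarray(depth_img), K
```

Feeding a depth map with swapped or mismatched dimensions into the unprojection silently produces garbage geometry, so it is worth asserting `depth.shape == (rgb.height, rgb.width)` right before building the cloud.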
This is great, I'm glad you managed to get it working!
Has anyone tried to learn the camera intrinsics instead of using the ones provided? I assume that could give better results, since the intrinsics might be slightly off at times...
We have a paper on learning arbitrary camera models, and it has been accepted for an oral presentation at 3DV: https://arxiv.org/abs/2008.06630 We will be adding code for that paper here soon, so stay tuned!
That's awesome! I am really interested in how these depth estimation approaches handle dynamic objects. The work I've seen until now tries to segment out the moving objects so that the scene is static. I wonder whether that is the best approach, since the network would never learn what the depth of a dynamic object is, and what happens when there are no static examples of a dynamic object in the dataset...
Are you guys thinking along those lines as well? Or is using semantic segmentation as guidance to mask out these dynamic objects "good enough"?
It is definitely not good enough, we are actively looking for ways to model dynamic objects, and not only mask them out. Optical/scene flow seems like a good direction, but that's arguably a harder problem than depth estimation, so it's hard to integrate. Do you have any ideas?
@NotAnyMike, which software do you use for pointcloud visualization?
Hi, can you please share a script/code snippet that generates a point cloud visualization like the one shown above?
@e2r-htz if I am not wrong it is Blender.
@e2r-htz that was Blender. @iariav, are you still interested in the visualisation code?
We have open-sourced our visualization tool, it's here: https://github.com/tri-ml/camviz
Hello there!
Thank you for the PackNet paper, it has been very interesting to read! I have one issue: when I try to visualize the point cloud of the Precomputed Depth Maps you have in the readme, more exactly
PackNet, Self-Supervised Scale-Aware, 192x640, CS → K | eigen_test_files
I get a noisy point cloud. The same happens if I generate a depth map for an image in the media/test folder using any pretrained weights. You can see a video of the point cloud below. Is this supposed to happen? Am I missing something?
Thanks again for the paper. Kind regards, Mike