dontLoveBugs / DORN_pytorch

PyTorch implementation of Deep Ordinal Regression Network for Monocular Depth Estimation

KITTI ground truth data #10

Closed tjqansthd closed 4 years ago

tjqansthd commented 5 years ago

Hi, thanks for your pytorch implementation!

I wonder which ground truth data you chose when training on KITTI: the sparse ground truth or a dense one? And if you used dense ground truth, how did you interpolate the sparse data into a dense map? The only method I know is the colorization MATLAB function from the NYU toolbox, but the interpolated ground truth is not satisfying.

jiaxinxie97 commented 5 years ago

It seems that he uses the sparse ground truth; the code can be found in tools/gen_kitti_dataset.py. DORN's authors use the colorization MATLAB function from the NYU toolbox to interpolate it into a dense map, but I haven't found the correct parameters to reproduce their results. By the way, I think interpolation may be suitable for NYU, whose depth is captured by a Kinect and is already dense, but not for KITTI.
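(As an aside, if you just want a quick dense map without the colorization toolbox, plain scattered-data interpolation is one stand-in. The sketch below uses SciPy's nearest-neighbor `griddata`; it is not the method the DORN authors used, only an illustration of densifying a sparse KITTI map.)

```python
import numpy as np
from scipy.interpolate import griddata

def densify_nearest(sparse_depth):
    """Fill a sparse depth map by nearest-neighbor interpolation.

    sparse_depth: (H, W) float array; 0 marks pixels with no measurement.
    Returns a dense (H, W) map. This is only a crude stand-in for the
    NYU-toolbox colorization used by the DORN authors.
    """
    valid = sparse_depth > 0
    ys, xs = np.nonzero(valid)
    dense = griddata(
        points=np.stack([ys, xs], axis=1),   # pixel coordinates of LiDAR hits
        values=sparse_depth[valid],          # measured depths at those pixels
        xi=tuple(np.mgrid[0:sparse_depth.shape[0], 0:sparse_depth.shape[1]]),
        method="nearest",
    )
    return dense
```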

jiaxinxie97 commented 5 years ago

One more detail to point out: if you train with sparse depth, you should modify the ordinal loss computation so that it is calculated only at valid points.
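For readers who land here, a minimal sketch of that masking idea. This is not the repository's actual loss code; the sigmoid-per-threshold formulation, the SID constants, and the `gt_depth > 0` validity convention are assumptions following the DORN paper and the KITTI depth PNG format:

```python
import math
import torch
import torch.nn.functional as F

def masked_ordinal_loss(prob, gt_depth, K=68, alpha=1.0, beta=80.0):
    """Ordinal regression loss computed only at valid ground-truth pixels.

    prob:     (N, K, H, W) sigmoid outputs, prob[:, k] ~ P(depth label > k)
    gt_depth: (N, H, W) depth in meters; 0 marks pixels with no LiDAR return
    """
    valid = gt_depth > 0  # sparse KITTI maps store "no measurement" as 0

    # Space-increasing discretization (SID) of depth into K ordinal labels
    label = torch.floor(
        K * torch.log(gt_depth.clamp(min=alpha) / alpha) / math.log(beta / alpha)
    ).clamp(0, K - 1)

    # Binary target for each of the K thresholds: 1 where the label exceeds k
    k = torch.arange(K, device=prob.device).view(1, K, 1, 1)
    target = (label.unsqueeze(1) > k).float()

    # Per-pixel ordinal loss = sum of K binary cross-entropies
    per_pixel = F.binary_cross_entropy(prob, target, reduction="none").sum(dim=1)

    # Averaging over the masked pixels keeps invalid points out of the loss,
    # and therefore out of the gradient entirely
    return per_pixel[valid].mean()
```

Because the loss indexes only the valid pixels, the invalid ones contribute nothing to the gradient.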

tjqansthd commented 5 years ago

@jiaxinxie97 Thank you! You've answered everything I was curious about.

In my training, I converted all of the sparse ground truth into dense maps with the NYU toolbox so that the loss could cover every image pixel, and I weighted the valid and invalid pixels differently.

But I have two more questions:

  1. If I use sparse depth and only calculate the loss at valid points, are invalid points excluded from the gradient computation entirely?

  2. Is the resulting output sparse or dense? In other words, when papers train on sparse ground-truth depth, are the depth estimation results they present the raw output of the network, or the output after post-processing such as interpolation? This question keeps bothering me.

jiaxinxie97 commented 5 years ago

  1. Yes, I compute the loss only at valid points, but I don't know whether that is right. None of the supervised methods I know of have released their training code, so I don't know what kind of ground truth they used. All I know is that most previous unsupervised work evaluates on the sparse depth maps of the Eigen split, even though it produces dense output.
  2. If you use sparse depth as ground truth, you will still get a dense output. Most reported results are the raw network output; some methods such as monodepth also report post-processed results, but those are unsupervised methods.
tjqansthd commented 5 years ago

@jiaxinxie97 Thank you! I think the network can still learn to produce dense output even when trained on sparse ground truth, because the positions of the valid pixels vary from image to image; over the whole dataset, almost every pixel location is supervised at some point, so we can get dense depth. I will try training on sparse ground truth. Thanks again.

jiaxinxie97 commented 5 years ago

For reference: https://github.com/mrharicot/monodepth/issues/166. If you are doing single-image depth estimation, you can move to the new benchmark. Its ground truth is denser than maps projected from a single frame's point cloud.

tjqansthd commented 5 years ago

@jiaxinxie97 I used sparse ground truth like this: [attached image: annotated depth map for frame 0000000012]

It is the annotated ground truth stored as a uint16 PNG, not the velodyne_raw data. Is the ground truth in the new benchmark you mentioned different from this?
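(For reference, those annotated maps follow the KITTI depth-benchmark PNG convention: uint16 pixels, depth in meters = value / 256, and 0 means no measurement. A minimal loading sketch, with a hypothetical file path:)

```python
import numpy as np
from PIL import Image

def load_kitti_depth_png(path):
    # KITTI stores annotated depth as a 16-bit PNG; PIL opens it in mode "I"
    png = np.array(Image.open(path), dtype=np.float32)
    depth = png / 256.0   # depth in meters
    valid = png > 0       # a pixel value of 0 encodes "no ground truth"
    return depth, valid

# hypothetical path to an annotated ground-truth frame
depth, valid = load_kitti_depth_png("groundtruth/image_02/0000000012.png")
```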

jiaxinxie97 commented 5 years ago

Yes, it is the ground truth from the new KITTI benchmark.

tjqansthd commented 5 years ago

@jiaxinxie97 Oh, then I used the right thing. Is there any difference between the ground truth from the new KITTI benchmark and the ground truth produced by the generate_depth_map function in mrharicot/monodepth?

jahaniam commented 5 years ago

Hi. There is a huge difference.

In my paper (https://arxiv.org/pdf/1905.07542.pdf) I evaluated the raw LiDAR points against the ground truth from the new KITTI benchmark on the Eigen split. [attached figure: LiDAR error visualization] You can see that the LiDAR points have large errors. I have also shown that if you use the ground truth from the new KITTI benchmark for training, you get a huge performance boost (rows one and three). [attached table: benchmark results]

The issue of inaccurate LiDAR points has also been discussed in this paper: http://vision.deis.unibo.it/~smatt/Papers/3DRW2018/Monogan.pdf
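For context, the generate_depth_map ground truth discussed above is essentially one Velodyne scan projected into the image plane. A conceptual sketch of that projection (the calibration matrices are assumed to be preloaded as homogeneous transforms named after the KITTI calibration files; this is not monodepth's exact code):

```python
import numpy as np

def project_velodyne(velo, P_rect, R_rect, Tr_velo_to_cam, h, w):
    """Project a single Velodyne scan into an image-sized depth map.

    velo:            (N, 4) points as (x, y, z, reflectance)
    P_rect:          (3, 4) rectified camera projection matrix
    R_rect, Tr_velo_to_cam: (4, 4) rectification / extrinsic transforms
    Returns an (h, w) depth map; untouched pixels stay 0 (invalid).
    """
    pts = velo[velo[:, 0] > 0].copy()   # keep points in front of the car
    pts[:, 3] = 1.0                     # reuse the 4th column as homogeneous 1
    proj = (P_rect @ R_rect @ Tr_velo_to_cam @ pts.T).T  # (N, 3) image coords

    z = proj[:, 2]
    u = np.round(proj[:, 0] / z).astype(int)
    v = np.round(proj[:, 1] / z).astype(int)
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    depth = np.zeros((h, w), dtype=np.float32)
    depth[v[ok], u[ok]] = z[ok]         # later points overwrite earlier ones
    return depth
```

A single scan covers only a small fraction of the pixels and inherits LiDAR noise and occlusion artifacts, which is why the new benchmark's maps, accumulated over multiple frames and cross-checked against stereo, are both denser and more accurate.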

callme739 commented 4 years ago

@jiaxinxie97

> One more detail to point out: if you train with sparse depth, you should modify the ordinal loss computation so that it is calculated only at valid points.

How should I change it?

DongXingshuai commented 4 years ago

> @jiaxinxie97 Thank you! You've answered everything I was curious about.
>
> In my training, I converted all of the sparse ground truth into dense maps with the NYU toolbox so that the loss could cover every image pixel, and I weighted the valid and invalid pixels differently.
>
> But I have two more questions:
>
> 1. If I use sparse depth and only calculate the loss at valid points, are invalid points excluded from the gradient computation entirely?
> 2. Is the resulting output sparse or dense? In other words, when papers train on sparse ground-truth depth, are the depth estimation results they present the raw output of the network, or the output after post-processing such as interpolation? This question keeps bothering me.

Hello, can you please share the converted dense depth map? Thank you.