TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License

About the warp_ref_image #42

Closed maskedmeerkat closed 4 years ago

maskedmeerkat commented 4 years ago

Hi Vitor,

in https://github.com/TRI-ML/packnet-sfm/blob/f824ffceba46ae1c621e1bf22a35634d8b39207c/packnet_sfm/losses/multiview_photometric_loss.py#L156-L157 you only provide one scale to the camera's scaling function. Wouldn't this mean that, if my image isn't scaled equally in the x and y directions, the camera intrinsic matrix is scaled incorrectly? Or is this addressed somewhere else?

Thanks for your time and patience for explaining your code to all of us ^^

VitorGuizilini-TRI commented 4 years ago

The original scaling from full resolution to input resolution is done in:

https://github.com/TRI-ML/packnet-sfm/blob/f824ffceba46ae1c621e1bf22a35634d8b39207c/packnet_sfm/datasets/augmentations.py#L54

which uses two scales. The line you mentioned scales from input resolution to the intermediate inverse depth resolutions, and these are smaller by factors of 2 in both dimensions, so using a single scale is correct. At least that's my understanding, does that make sense?
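To make the distinction concrete, here is a minimal sketch of intrinsics rescaling with separate x/y factors. The helper name, the pixel-center `+0.5` convention, and the example resolutions are assumptions for illustration, not necessarily the repo's exact code:

```python
import numpy as np

def scale_intrinsics(K, x_scale, y_scale):
    """Rescale a 3x3 pinhole intrinsics matrix by separate x/y factors.

    Illustrative sketch only; the pixel-center (+0.5) convention is an
    assumption and may differ from packnet-sfm's actual helper.
    """
    K = K.copy()
    K[0, 0] *= x_scale                              # fx
    K[1, 1] *= y_scale                              # fy
    K[0, 2] = (K[0, 2] + 0.5) * x_scale - 0.5       # cx
    K[1, 2] = (K[1, 2] + 0.5) * y_scale - 0.5       # cy
    return K

# Hypothetical full-resolution intrinsics (fx, fy, cx, cy):
K = np.array([[1000.,    0., 640.],
              [   0., 1000., 360.],
              [   0.,    0.,   1.]])

# Full resolution -> network input may stretch x and y differently,
# so two scales are needed here:
K_input = scale_intrinsics(K, 640 / 1280, 384 / 720)

# Input resolution -> depth-pyramid levels halves BOTH dimensions,
# so a single scale per level is sufficient:
for level in range(4):
    s = 1.0 / 2 ** level
    K_level = scale_intrinsics(K_input, s, s)  # same factor in x and y
```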

In any case, I really appreciate that you are going this deep into our codebase. Please let me know if you find any other suspicious things, or if there are any improvements we could try to make our numbers even better!

maskedmeerkat commented 4 years ago

Ah, thanks, that makes sense. I am just really trying to get the self-supervision on nuScenes to work, but I am more and more out of ideas.

VitorGuizilini-TRI commented 4 years ago

Have you tried starting from a pretrained model from another dataset? That might give a better starting point for the depth features, so they don't diverge.

maskedmeerkat commented 4 years ago

Yes, and my semi-supervised learning already works okay-ish, as can be seen in the image. Even ground-plane removal works okay. bevImg_scene0001_sample0017 But when I use that semi-supervised, fine-tuned model and start self-supervised training, it diverges immediately (4 GPUs, batch size 8, lr 1e-5). evalImg_ep00088 The input images are all correct. Thus, I am currently checking the camera intrinsic matrices as the last point of failure I could identify.

maskedmeerkat commented 4 years ago

Another thing I will try is to visualize the loss masks, and maybe even the warping results themselves, to see whether that is working as expected.
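A quick way to sanity-check a warp is to compare the warped reference against the target directly. The sketch below computes a per-pixel L1 error map plus a crude validity mask (out-of-view pixels come back as zeros when `grid_sample` pads with zeros); all names and conventions here are hypothetical, not packnet-sfm's API:

```python
import numpy as np

def warp_debug_maps(target, warped):
    """Given HxWx3 float images in [0, 1], return a per-pixel L1 photometric
    error map and a crude validity mask.

    Illustrative only: assumes out-of-view warped pixels are exactly zero
    (e.g. grid_sample with zero padding), which is an assumption about the
    warping setup, not guaranteed by the repo.
    """
    err = np.abs(target - warped).mean(axis=-1)           # HxW L1 error
    valid = (warped.sum(axis=-1) > 0).astype(np.float32)  # 1 = inside view
    return err, valid

# A perfectly aligned warp should give zero error on all valid pixels:
target = np.full((4, 4, 3), 0.5)
err, valid = warp_debug_maps(target, target.copy())
```

Plotting `err` and `valid` as heatmaps for a few batches usually makes a broken pose or intrinsics matrix obvious: the error map shows large structured residue instead of thin edges.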

VitorGuizilini-TRI commented 4 years ago

Did you manage to get it working?

youngskkim commented 4 years ago

Hi @maskedmeerkat, I'm trying to get depth maps on the nuScenes dataset too.

My approach is to use a fully-supervised method (DORN) after generating a dense depth map using depth completion. I wonder if you evaluated the depth prediction performance quantitatively using metrics (RMSE, δ thresholds, ...).

Here are the results I got on front-view images, using sparse GT:

| Method  | Abs Rel | Sq Rel | RMSE  | RMSE_log | δ<1.25 | δ<1.25² | δ<1.25³ |
|---------|---------|--------|-------|----------|--------|---------|---------|
| PackNet | 0.187   | 1.852  | 7.636 | 0.289    | 0.742  |         |         |
| DORN    | 0.132   | 1.598  | 6.944 | 0.233    | 0.839  | 0.938   | 0.972   |
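For reference, the metrics above are the standard monocular depth evaluation metrics (Abs Rel, Sq Rel, RMSE, RMSE_log, and the δ accuracy thresholds), which can be computed as follows. This is a generic sketch of the common definitions, not the exact evaluation code of either repo:

```python
import numpy as np

def depth_metrics(gt, pred):
    """Standard monocular depth metrics over pixels with valid ground truth.

    Generic sketch of the commonly used definitions (Eigen-style);
    masking, capping, and scaling conventions vary between codebases.
    """
    mask = gt > 0                       # sparse GT: only evaluate valid pixels
    gt, pred = gt[mask], pred[mask]

    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()         # δ < 1.25
    a2 = (thresh < 1.25 ** 2).mean()    # δ < 1.25²
    a3 = (thresh < 1.25 ** 3).mean()    # δ < 1.25³

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```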

It would be very helpful if you could share the evaluation results of the semi-supervised method, so I can try the other approach if yours is better.

Thank you!

maskedmeerkat commented 4 years ago

Sadly, the above result is the best I can achieve. What I tried was

warpedImg_1595229618

The best Abs Rel I could achieve was 0.149; you can see the results in my previous comment.

Do you know whether so many pixels should be auto-masked out? Is that to be expected at the beginning of training, improving over time?

maskedmeerkat commented 4 years ago

I also tried using the 384x640-resolution pretrained models, since you said something about too-low quality or noise possibly affecting the training. So I am unsure whether the stretching of the images due to reshaping has some influence on the accuracy...

VitorGuizilini-TRI commented 4 years ago

Soon I am planning to spend some time refactoring the photometric loss, to try some new ideas, and will be able to introspect that some more. The auto-masking removes pixels whose unwarped photometric loss is smaller than their warped photometric loss, so there might be something wrong with the pose network, i.e. it is not learning properly. Can you try turning auto-masking off during training?
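The auto-masking criterion described above can be sketched in a few lines. For brevity this uses plain L1 as the photometric loss (packnet-sfm combines SSIM and L1), so treat it as an illustration of the rule, not the repo's implementation:

```python
import numpy as np

def automask(target, warped_ref, unwarped_ref):
    """Monodepth2-style auto-masking.

    A pixel is KEPT (mask True) only if warping the reference image explains
    it better than the raw, unwarped reference -- otherwise it is likely a
    static-camera or moving-object pixel and is dropped from the loss.
    Uses L1 only for brevity; the real loss combines SSIM + L1.
    """
    loss_warped = np.abs(target - warped_ref).mean(axis=-1)      # HxW
    loss_unwarped = np.abs(target - unwarped_ref).mean(axis=-1)  # HxW
    return loss_warped < loss_unwarped  # True = pixel contributes to loss

# Toy example: the warp matches the target exactly, the raw reference
# does not, so every pixel survives the mask.
target = np.zeros((2, 2, 3))
mask = automask(target, np.zeros((2, 2, 3)), np.ones((2, 2, 3)))
```

If most of the image is masked out, as in your visualizations, it typically means the warped reference is *worse* than no warp at all, which points at a diverged pose (or intrinsics) rather than at the mask itself.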

maskedmeerkat commented 4 years ago

I also tried that before, with no real benefit. And I also believe that the error is more on my side than in your implementation.

Hmm, could you give me a hint on how to evaluate that my camera intrinsics are provided in a way that fits your framework?

maskedmeerkat commented 4 years ago

Hmm, anyway, I won't be able to work on improving depth estimation at the current time in my project. Maybe, in case I find some more time towards the end (highly unlikely XD), I can try some more things.