ClementPinard / SfmLearner-Pytorch

Pytorch version of SfmLearner from Tinghui Zhou et al.
MIT License

inference depth with same results #125

Closed · liumingcun closed this issue 3 years ago

liumingcun commented 3 years ago

Hello author, I have a problem. No matter what picture I use, the final depth image and disp image are exactly the same. I don't know the reason. Could you please help me solve it? Thank you very much.

[attached: input frames 0000000160 and 0000000000, with the _depth and _disp outputs for 0000000000 and 0000000001]

ClementPinard commented 3 years ago

Hi, the pretrained network was trained specifically on KITTI images that were resized. As such, it will only work with images that are 416x128 and that share the same intrinsics as KITTI.

What you can do, though, is retrain the network with videos from your own camera; the retrained network will then work for your images.

Bear in mind that the network is very specialized, so if during training it only sees road pictures (as is the case in KITTI), the quality will be poor on different scenes.
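
For reference, a minimal sketch of how one might prepare an arbitrary frame at that resolution before inference (the file name is a placeholder; imageio and skimage are the libraries the inference script itself relies on):

# Sketch: bring an arbitrary image to the 416x128 resolution the
# pretrained DispNet expects. Note that skimage's resize returns floats
# in [0, 1] for uint8 input, which matters for the later normalization.
from imageio import imread
from skimage.transform import resize

img = imread('my_frame.png')     # uint8 array, shape (H, W, 3), values in [0, 255]
img = resize(img, (128, 416))    # float array, shape (128, 416, 3), values in [0, 1]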

liumingcun commented 3 years ago

Hello author, thank you very much for your answer. However, I did use KITTI data when I tested, but the results are exactly the same as the ones I showed.

[attached: two screenshots]

ClementPinard commented 3 years ago

  • Is the network the pretrained one you can download in the readme, or did you train it yourself? Some hyperparameters (especially a too strong smooth loss) will get you this degenerate network that always outputs the same thing.
  • Are the images resized to 416x128?
  • Are you feeding DispNet normalized pixel values, i.e. colors are not in [0, 255] but in [-1, 1]?

liumingcun commented 3 years ago

Thank you author. I figured it out. The reason is that I was feeding DispNet already-normalized pixel values.

Rashfu commented 3 years ago

Thank you author. I figured it out. The reason is that I was feeding DispNet already-normalized pixel values.

Hi, @liumingcun: I ran into the same problem. Should we feed DispNet pixel values in [0, 255]? I haven't read the code carefully, so I wonder whether the code in run_inference.py needs to be modified.

Rashfu commented 3 years ago
  • Is the network the pretrained one you can download in the readme, or did you train it yourself? Some hyperparameters (especially a too strong smooth loss) will get you this degenerate network that always outputs the same thing.
  • Are the images resized to 416x128?
  • Are you feeding DispNet normalized pixel values, i.e. colors are not in [0, 255] but in [-1, 1]?

Hi, @ClementPinard: I downloaded the pretrained model from the link in the README, and when I test on KITTI images I just use run_inference.py (without any modification) to get the final depth image and disp image, which are exactly the same, as shown below.

[attached: frame 0000000000 with 0000000000_depth and 0000000010_disp]

As you mentioned before:

Are you feeding DispNet normalized pixel values, i.e. colors are not in [0, 255] but in [-1, 1]

But I didn't change any code, just loaded some images. How can I fix it? Thanks a lot~
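
For reference, a quick way to tell which convention is in play is to print the tensor's range right before the DispNet call (a one-line sketch; tensor_img is the variable name run_inference.py uses):

# Sanity-check sketch: DispNet expects values in [-1, 1]; a max near 255
# or near 1.0 reveals a missed or doubled normalization step upstream.
print(tensor_img.min().item(), tensor_img.max().item())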

ClementPinard commented 3 years ago

The picture you proposed lacks a bit of structure. Can you try a more urban scene? I am surprised by the results, as it looks very blurry, but at the same time you are inside a vegetation tunnel for the lower part of the screen :joy:

What do you mean by "I didn't change any code, just load some images"? What arguments did you use with the inference script?

Rashfu commented 3 years ago

The picture you proposed lacks a bit of structure. Can you try a more urban scene? I am surprised by the results, as it looks very blurry, but at the same time you are inside a vegetation tunnel for the lower part of the screen 😂

What do you mean by "I didn't change any code, just load some images"? What arguments did you use with the inference script?

I changed to a more urban scene but the results still look the same.

[attached: frame 0000000098 with 0000000098_depth and 0000000098_disp]

I saved some urban scene images in ./imgs and the pretrained model in ./weights. Then I ran run_inference.py separately, as below:

python3 run_inference.py --pretrained ./weights/dispnet_model_best.pth.tar --dataset-dir ./imgs --output-dir ./imgs_out/disp --output-disp
python3 run_inference.py --pretrained ./weights/dispnet_model_best.pth.tar --dataset-dir ./imgs --output-dir ./imgs_out/depth --output-depth

It just feels strange to get invariant depth and disparity. 😔 Have I made any mistakes? 😟

ClementPinard commented 3 years ago

Hey, sorry about all this, there is a bug in the inference script. Indeed, I changed the image loading function to use imageio.imread and didn't test it thoroughly enough. As a consequence, I didn't see that the output of imageio.imread was in [0, 1] and not [0, 255].

I changed line 66 from

        tensor_img = ((tensor_img/255 - 0.5)/0.5).to(device)

to

        tensor_img = ((tensor_img - 0.5)/0.5).to(device)

And it works much better. A fix is coming, but you can already apply the change I mentioned.
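
For context, a condensed sketch of the preprocessing path with that fix applied (the file name is a placeholder; the rest follows the usual steps of preparing an HxWx3 numpy image for a PyTorch network):

# Condensed sketch of the corrected preprocessing (file name is a placeholder).
# skimage's resize has already scaled the uint8 image to floats in [0, 1],
# so mapping to [-1, 1] is just (x - 0.5) / 0.5, with no extra /255 step.
import torch
from imageio import imread
from skimage.transform import resize

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

img = imread('example.png')                         # uint8 (H, W, 3), [0, 255]
img = resize(img, (128, 416))                       # float (128, 416, 3), [0, 1]
tensor_img = torch.from_numpy(img.transpose(2, 0, 1)).float().unsqueeze(0)
tensor_img = ((tensor_img - 0.5) / 0.5).to(device)  # [-1, 1], as DispNet expects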

Clément

Rashfu commented 3 years ago

Hey, sorry about all this, there is a bug in the inference script. Indeed, I changed the image loading function to use imageio.imread and didn't test it thoroughly enough. As a consequence, I didn't see that the output of imageio.imread was in [0, 1] and not [0, 255].

I changed line 66 from

        tensor_img = ((tensor_img/255 - 0.5)/0.5).to(device)

to

        tensor_img = ((tensor_img - 0.5)/0.5).to(device)

And it works much better. A fix is coming, but you can already apply the change I mentioned.

Clément

That works fine! Thanks a lot!

liumingcun commented 3 years ago

Thank you author. I figured it out. The reason is that I was feeding DispNet already-normalized pixel values.

Hi, @liumingcun: I ran into the same problem. Should we feed DispNet pixel values in [0, 255]? I haven't read the code carefully, so I wonder whether the code in run_inference.py needs to be modified.

Hello, my WeChat is 879997125. I think we can discuss related questions there.

Rashfu commented 3 years ago

Thank you author. I figured it out. The reason is that I was feeding DispNet already-normalized pixel values.

Hi, @liumingcun: I ran into the same problem. Should we feed DispNet pixel values in [0, 255]? I haven't read the code carefully, so I wonder whether the code in run_inference.py needs to be modified.

Hello, my WeChat is 879997125. I think we can discuss related questions there.

Yeah, that's a good idea~ Check your WeChat :smile:

Rashfu commented 3 years ago

Hey, sorry about all this, there is a bug in the inference script. Indeed, I changed the image loading function to use imageio.imread and didn't test it thoroughly enough. As a consequence, I didn't see that the output of imageio.imread was in [0, 1] and not [0, 255]. I changed line 66 from

        tensor_img = ((tensor_img/255 - 0.5)/0.5).to(device)

to

        tensor_img = ((tensor_img - 0.5)/0.5).to(device)

And it works much better. A fix is coming, but you can already apply the change I mentioned. Clément

That works fine! Thanks a lot!

Hi, @ClementPinard: The output of imageio.imread was actually still in [0, 255]. The real bug is in skimage.transform.resize: if the dtype of the input image is uint8, the resize function automatically rescales the values to [0, 1], while for float32 input it just does the resize and preserves the range.
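
A minimal sketch illustrating that behavior (preserve_range is the skimage parameter that opts out of the implicit rescaling):

# Demonstrates skimage.transform.resize's dtype-dependent scaling:
# uint8 input is converted via img_as_float (255 -> 1.0), while float
# input keeps its original range.
import numpy as np
from skimage.transform import resize

img_uint8 = np.full((128, 416, 3), 255, dtype=np.uint8)

print(resize(img_uint8, (128, 416)).max())                       # 1.0   (rescaled)
print(resize(img_uint8.astype(np.float32), (128, 416)).max())    # 255.0 (range kept)
print(resize(img_uint8, (128, 416), preserve_range=True).max())  # 255.0 (opt-out)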