apple / ml-depth-pro

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.

Should the image input to the neural network be dedistorted? #33

Open vtasStu opened 1 month ago

vtasStu commented 1 month ago

Thank you for your great work. I have a few questions.

  1. Should the image be undistorted before it is fed to the network?
  2. The line `inverse_depth = canonical_inverse_depth * (W / f_px)` confuses me a bit: `canonical_inverse_depth` corresponds to the resized image, while `W / f_px` corresponds to the original image.
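For reference, my understanding of that scaling step is something like the sketch below (all numbers are dummy placeholders). The ratio `W / f_px` only depends on the horizontal field of view, which an aspect-preserving resize does not change, so combining the resized-image prediction with the original-image `W` and `f_px` can still be consistent:

```python
import numpy as np

# Dummy original-image intrinsics (hypothetical values)
W = 1920          # original image width in pixels
f_px = 1500.0     # horizontal focal length in pixels

# Dummy network output on the resized image
canonical_inverse_depth = np.full((4, 4), 0.25)

# Scale canonical inverse depth by the field-of-view factor W / f_px,
# then invert to get metric depth in meters
inverse_depth = canonical_inverse_depth * (W / f_px)
depth_m = 1.0 / np.clip(inverse_depth, 1e-6, None)
```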
xiaodongww commented 1 month ago

Hi, I also have the same question. By the way, intrinsics usually have four parameters (fx, fy, cx, cy); is f_px alone enough to produce metric depth? Do you have any idea? @vtasStu

JVPC0D3R commented 1 month ago

@xiaodongww I used both the fx parameter and a mean between fx and fy. None of them gave me a proper depth map for my custom images.

# horizontal focal length
f_px = torch.tensor(fx)
image = transform(frame)

# mean focal length
f_px = torch.tensor((fx + fy) / 2)
image = transform(frame)

After processing the image and getting the metric depth map I ended up with the map below. I tested the same code on several samples, and only the furthest elements of the scene were represented in the map.

[depth map screenshot]

JVPC0D3R commented 1 month ago

The depth map is correct; the outputs just go really far away in this example (up to 10 km). To visualize the output in a shorter range you can clip it with `depth = depth.clip(0, 200)`.
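For example, with dummy values, clipping before normalizing for display keeps the near-field structure visible instead of being flattened by far outliers:

```python
import numpy as np

# Dummy metric depth map in meters with one far outlier
depth = np.array([[1.0, 50.0],
                  [500.0, 10000.0]])

# Cap the range at 200 m so nearby structure survives normalization
depth_vis = depth.clip(0, 200)

# Normalize to [0, 1] for display as an image
depth_norm = (depth_vis - depth_vis.min()) / (depth_vis.max() - depth_vis.min())
```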

Regarding the original question, I recommend undistorting your image if possible.