Thank you for your question! In our implementation, we actually use disparity instead of depth.
I know that the relationship between disparity (d), depth (Z), focal length (f), and baseline distance (B) is given by the formula Z = fB / d, where:
- f: focal length of the camera
- B: baseline distance between the two cameras
- d: disparity
Therefore, is it correct to convert the disparity map to a depth map by running the following code? Thank you in advance!
import cv2
import numpy as np

d = cv2.imread(PATH_TO_DISPARITY_IMAGE, 0).astype(np.float32)  # flag 0 collapses the 3-channel disparity map to a single grayscale channel
depth = f * B / np.maximum(d, 1e-6)  # avoid division by zero where disparity is 0
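One caveat: flag 0 (cv2.IMREAD_GRAYSCALE) decodes to 8-bit, which quantizes disparity to 256 levels; if the file stores 16-bit or floating-point disparity, cv2.IMREAD_UNCHANGED preserves the full precision.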
A reference for converting disparity to depth can be found here.
Thank you for your great work! Looking at the inference code, the output latent is ultimately decoded into disparity space, but the paper says: "Specifically, given a video depth xd, we first apply a normalization as in Ke et al. (2024) to ensure that depth values fall primarily within the VAE's input range of [−1, 1]". So which is actually used, depth or disparity?
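For context, the normalization in Ke et al. (2024) is an affine map driven by low/high quantiles of each sample. Below is a minimal sketch of that step; the 2%/98% quantiles and the per-video statistics are assumptions, and normalize_to_vae_range is a hypothetical helper, not the repo's actual function:

import numpy as np

def normalize_to_vae_range(d, q_lo=0.02, q_hi=0.98):
    # Affine-normalize a disparity (or depth) volume so values fall
    # primarily within the VAE's input range of [-1, 1], in the spirit
    # of the quantile normalization described in Ke et al. (2024).
    lo, hi = np.quantile(d, [q_lo, q_hi])
    d01 = (d - lo) / max(hi - lo, 1e-6)  # mostly in [0, 1]; outliers extend past the quantiles
    return d01 * 2.0 - 1.0               # shift to [-1, 1]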