AndyCao1125 opened 1 week ago
Hi, we pre-generate the depth data before training to speed up the overall process. As mentioned in the README, we use the following function to save the output from DepthAnything:
```python
import numpy as np
import torch.nn.functional as F
from PIL import Image

def save_raw_16bit(depth, fpath, height, width):
    # Resize the predicted depth map to the target resolution.
    depth = F.interpolate(depth[None, None], (height, width), mode='bilinear', align_corners=False)[0, 0]
    # Min-max normalize into the RGB value range [0, 255].
    depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
    depth = depth.cpu().numpy().astype(np.uint8)
    # Replicate the single depth channel into three channels so the
    # saved image has the same layout as an RGB image.
    colorized_depth = np.stack([depth, depth, depth], axis=-1)
    depth_image = Image.fromarray(colorized_depth)
    depth_image.save(fpath)
```
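Since the maps are pre-generated, loading them back at training time is an ordinary image read. A minimal sketch of the round trip (the filename and toy values here are illustrative, not from the repository):

```python
import numpy as np
from PIL import Image

# Illustrative 8-bit "depth" values, standing in for a saved DepthAnything output.
depth = np.array([[0, 128], [64, 255]], dtype=np.uint8)
Image.fromarray(np.stack([depth] * 3, axis=-1)).save("depth_example.png")

# At training time, load the pre-generated map back;
# PNG is lossless, so values are already in [0, 255] and unchanged.
loaded = np.asarray(Image.open("depth_example.png"))
```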
Thanks for your prompt reply!
Dear authors,
Thank you! It seems that this work adopts depth images as input to a frozen image encoder that was pretrained on RGB data, and I would like to ask for some clarification on the preprocessing steps.
In particular, how should I normalize depth images? Since the encoder is frozen and designed for RGB inputs, is it important for the depth image values to be within a similar range as RGB images? Or is there a specific normalization strategy for depth data in this case?
Thanks!
Hi,
Yes, you’re right. We need to normalize the depth to match the RGB range, as shown in the save_raw_16bit function from the previous response:
```python
depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
```
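In isolation, that min-max scaling can be sketched in NumPy alone (the function name and toy input below are illustrative, not part of the repository):

```python
import numpy as np

def normalize_depth_to_rgb_range(depth):
    """Min-max scale a depth map to [0, 255] and replicate it to 3 channels."""
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min()) * 255.0
    # Stack the single channel three times to mimic an RGB image.
    return np.stack([d, d, d], axis=-1).astype(np.uint8)

# Example: a toy 2x2 depth map; output spans the full [0, 255] range.
out = normalize_depth_to_rgb_range(np.array([[0.5, 1.0], [1.5, 2.0]]))
```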
Thanks for your great work!
I have a question: are the depth images generated before training, or generated by Depth-Anything on the fly during training?
Thank you!