YvanYin / Metric3D

The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
https://jugghm.github.io/Metric3Dv2/
BSD 2-Clause "Simplified" License
1.27k stars 92 forks

Ground Truth Depth Image not loading correctly in fine-tuning #135

Open Gear-dev-sudo opened 1 month ago

Gear-dev-sudo commented 1 month ago

Hi all, Thanks for the amazing work!

However, when I tried to fine-tune your model, I ran into a problem with the ground truth depth. I had converted it to 16-bit PNG images, with the depth scale set to my own value. The {rgb, pred_depth, gt_depth} images I see in TensorBoard during fine-tuning indicate that the GT depth is missing information in the bottom part of the image, which my original data actually has.

Additionally, why are all of the images shown in TensorBoard cropped to the bottom part of the image? Is this normal?

I largely followed this post for fine-tuning: #105

This is the command I used:

python3 mono/tools/train.py \
    /home/Metric3D/training/mono/configs/RAFTDecoder/vit.raft5.giant2.kitti.py \
    --use-tensorboard \
    --experiment_name test1 \
    --load-from /home/Metric3D/models/metric_depth_vit_giant2_800k.pth \
    --seed 42 \
    --launcher None

Gear-dev-sudo commented 1 month ago

https://drive.google.com/file/d/1lXu7CgFXXQt33QorW2PtR8haXr8ixRVD/view?usp=sharing The config I've used.

JUGGHM commented 3 weeks ago

The GT seems to be incomplete. Have you double-checked this? Also, have you made sure that the depth_scale parameter matches the value you used when encoding the depth? For example, to obtain the depth in meters for KITTI, we divide the 16-bit value by a [depth_scale of 256](https://github.com/YvanYin/Metric3D/blob/277c3a1311da2d69816600f8e09c7b292146843b/training/mono/configs/_base_/datasets/kitti.py#L9). If both look correct, you can use the test mode to run inference on this single image: (1) First, prepare a simple json file for this case like this. (2) Then run inference through the script with your prepared json file and the corresponding model. The visualization results will tell you whether your GT data is corrupted.
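
Before running the test mode, a quick standalone check of the PNG itself may help separate a corrupted file from a loading or cropping issue. The following is only a minimal sketch, not part of the Metric3D code base: the helper names `load_depth_png`, `decode_depth`, and `valid_fraction_per_band` are hypothetical, it assumes that a raw value of 0 marks a missing depth pixel, and it uses the KITTI `depth_scale` of 256 mentioned above as the default. It decodes the 16-bit values to meters and reports how much valid depth each horizontal band of the image contains, so an empty bottom band shows up immediately.

```python
import numpy as np


def load_depth_png(path):
    """Read a depth PNG without bit-depth conversion (hypothetical helper)."""
    import cv2  # imported lazily so the rest of the sketch only needs NumPy

    # IMREAD_UNCHANGED keeps the 16-bit values; the default flag would
    # convert the image to 8-bit, 3-channel data and destroy the depth.
    raw = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    if raw is None:
        raise FileNotFoundError(path)
    return raw


def decode_depth(raw, depth_scale=256.0):
    """Convert raw 16-bit depth values to meters by dividing by depth_scale."""
    raw = np.asarray(raw)
    if raw.dtype != np.uint16 or raw.ndim != 2:
        raise ValueError(
            f"expected a 2D uint16 array, got dtype={raw.dtype}, ndim={raw.ndim}"
        )
    return raw.astype(np.float32) / depth_scale


def valid_fraction_per_band(depth_m, num_bands=4):
    """Fraction of pixels with depth > 0 in each horizontal band, top to bottom.

    Assumes that 0 encodes "no ground truth" (an assumption of this sketch).
    """
    bands = np.array_split(depth_m, num_bands, axis=0)
    return [float((band > 0).mean()) for band in bands]


if __name__ == "__main__":
    # Synthetic example: the top half holds 10 m (2560 / 256), the bottom half is empty.
    raw = np.zeros((8, 4), dtype=np.uint16)
    raw[:4] = 2560
    depth_m = decode_depth(raw, depth_scale=256.0)
    print("max depth (m):", float(depth_m.max()))
    print("valid fraction per band:", valid_fraction_per_band(depth_m))
```

For a real file, something like `valid_fraction_per_band(decode_depth(load_depth_png(path), depth_scale=your_scale))` would apply, with `path` and `your_scale` filled in from your own data. If the bottom bands are well populated here but still look empty in TensorBoard, the file is probably fine and the difference is more likely introduced in the data pipeline or by a mismatched depth_scale.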