YvanYin / Metric3D

The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
https://jugghm.github.io/Metric3Dv2/
BSD 2-Clause "Simplified" License

Dense depth map output #166

Open Deng-King opened 1 month ago

Deng-King commented 1 month ago

Hi there, thank you for your contributions.

When I ran the demo code following the readme.md tutorial (on the three images in ./data/kitti_demo/ of this repository), I only got a relatively sparse depth prediction. How do I get dense depth estimates like the ones shown on the project page?

What I just got: [screenshot: 20240926-152444 0000000005_merge]

What I'm expecting:

[screenshot from the project page]
Deng-King commented 1 month ago

I got a good result by running test_vit.sh. I finally figured out that the second row of the visualization output is the predicted depth map and the third row is the ground truth. The GT was presumably captured by LiDAR, which is why it looks relatively sparse.
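To make the sparsity concrete: LiDAR ground truth on KITTI typically covers only a small fraction of the image, which you can verify by counting valid (non-zero) depth pixels. A minimal sketch with synthetic data (not repo code; the ~5% coverage here is illustrative):

```python
import numpy as np

def valid_fraction(gt_depth: np.ndarray) -> float:
    """Fraction of pixels that carry a valid (positive) depth measurement."""
    return float((gt_depth > 0).mean())

# Synthetic "LiDAR-like" GT: zeros everywhere except ~5% scattered returns
rng = np.random.default_rng(42)
gt = np.zeros((352, 1216))                      # KITTI-sized depth crop
hit = rng.random(gt.shape) < 0.05               # ~5% of pixels get a return
gt[hit] = rng.uniform(1.0, 80.0, hit.sum())     # plausible metric depths

print(f"valid pixels: {valid_fraction(gt):.1%}")
```

A dense network prediction, by contrast, has a value at every pixel, so its valid fraction is 100%.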

[screenshot: 20240926-163915 0000000005_merge2]

But now I cannot seem to get a correct depth map when using known camera intrinsics (e.g. running test_kitti.sh and test_nyu.sh). I added the test code below line 263 in do_test.py:

    ...

    pred_depths, outputs = get_prediction(
        model=model,
        input=torch.stack(rgb_inputs),  # Stack inputs for batch processing
        cam_model=None,
        pad_info=pads,
        scale_info=None,
        gt_depth=None,
        normalize_scale=None,
    )
    print(' -- pred --')  # line 263
    print(pred_depths.shape)
    print(pred_depths.max())
    print(pred_depths.mean())
    print(pred_depths.max())  # note: repeats max(); .min() was presumably intended

    for j, gt_depth in enumerate(gt_depths):
        normal_out = None

    ...

and the cmd output is:

[09/26 16:25:10 root]: Distributed training: False
[09/26 16:25:15 root]: Loading weight '/media/deng/Data/Metric3D/weight/metric_depth_vit_large_800k.pth'
[09/26 16:25:15 root]: Loading weight '/media/deng/Data/Metric3D/weight/metric_depth_vit_large_800k.pth'
[09/26 16:25:16 root]: Successfully loaded weight: '/media/deng/Data/Metric3D/weight/metric_depth_vit_large_800k.pth'
[09/26 16:25:16 root]: Successfully loaded weight: '/media/deng/Data/Metric3D/weight/metric_depth_vit_large_800k.pth'
  0%|                                                                                                                                                                                        | 0/3 [00:00<?, ?it/s]data/nyu_demo/rgb/rgb_00000.jpg
 -- pred --
torch.Size([1, 1, 480, 1216])
tensor(24.2716, device='cuda:0')
tensor(24.2192, device='cuda:0')
tensor(24.2716, device='cuda:0')
/media/deng/Data/Metric3D/mono/utils/do_test.py:322: RankWarning: Polyfit may be poorly conditioned
  pred_global, _ = align_scale_shift(pred_depth, gt_depth)
 33%|██████████████████████████████████████████████████████████▋                                                                                                                     | 1/3 [00:01<00:02,  1.31s/it]data/nyu_demo/rgb/rgb_00050.jpg
 -- pred --
torch.Size([1, 1, 480, 1216])
tensor(24.2716, device='cuda:0')
tensor(24.2192, device='cuda:0')
tensor(24.2716, device='cuda:0')
/media/deng/Data/Metric3D/mono/utils/do_test.py:322: RankWarning: Polyfit may be poorly conditioned
  pred_global, _ = align_scale_shift(pred_depth, gt_depth)
 67%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                                          | 2/3 [00:01<00:00,  1.62it/s]data/nyu_demo/rgb/rgb_00100.jpg
 -- pred --
torch.Size([1, 1, 480, 1216])
tensor(24.2716, device='cuda:0')
tensor(24.2196, device='cuda:0')
tensor(24.2716, device='cuda:0')
/media/deng/Data/Metric3D/mono/utils/do_test.py:322: RankWarning: Polyfit may be poorly conditioned
  pred_global, _ = align_scale_shift(pred_depth, gt_depth)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.90it/s]
missing gt_depth, only save visualizations...

which means the model outputs an essentially constant value everywhere (max ≈ mean ≈ 24.27), i.e. a wrong prediction, but IDK why 😭
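A quick sanity check for this failure mode (a generic check, not repo code): if the spread between the prediction's max and min is tiny relative to its mean, the depth map is effectively constant, like the 24.27-everywhere output logged above:

```python
import numpy as np

def is_nearly_constant(depth: np.ndarray, rel_tol: float = 1e-2) -> bool:
    """True if the depth map's value range is tiny relative to its mean."""
    spread = float(depth.max() - depth.min())
    return spread < rel_tol * float(abs(depth.mean()))

# Simulate the logged failure: ~24.27 everywhere, with a little noise
flat = np.full((480, 1216), 24.27)
flat += np.random.default_rng(0).normal(0.0, 0.01, flat.shape)
print(is_nearly_constant(flat))        # near-constant map, like the logs

# A healthy depth map spans a wide range and fails the check
healthy = np.linspace(1.0, 80.0, 480 * 1216).reshape(480, 1216)
print(is_nearly_constant(healthy))
```

When the check fires, the usual suspects are a wrong input normalization, a mismatched padding/crop (`pad_info`), or missing intrinsics scaling rather than the weights themselves.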

and the visualization is still the same:

[screenshot]
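For what it's worth, Metric3D predicts depth in a canonical camera space and then rescales by the real focal length to obtain metric depth, which is why the intrinsics matter. A minimal sketch of that de-canonicalization (the canonical focal length of 1000 and the KITTI fx value here are assumptions, not taken from this thread):

```python
import numpy as np

CANONICAL_FOCAL = 1000.0  # assumed canonical camera focal length

def canonical_to_metric(pred_depth: np.ndarray, fx: float) -> np.ndarray:
    """Rescale canonical-space depth to metric depth for a camera with focal fx."""
    return pred_depth * (fx / CANONICAL_FOCAL)

# Example: a KITTI-like focal length (~707.05 px)
canonical = np.full((2, 3), 10.0)     # pretend the net predicted 10 everywhere
metric = canonical_to_metric(canonical, fx=707.0493)
print(metric[0, 0])                   # 10 * 707.0493 / 1000 = 7.070493
```

If this scaling step is skipped (e.g. because `scale_info` is passed as `None`), the returned depths stay in canonical space and won't match the GT range, so it's worth checking whether the transform is applied downstream of `get_prediction`.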