YvanYin / Metric3D

The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
https://jugghm.github.io/Metric3Dv2/
BSD 2-Clause "Simplified" License
1.38k stars, 103 forks

[Q] Replicate Dense-SLAM Mapping #49

Closed: KyanainsGate closed 7 months ago

KyanainsGate commented 9 months ago

Hello, excellent work!! May I ask you about the Dense-SLAM experiment in the paper?

To my understanding, the weights for the "Droid" part are an off-the-shelf model trained on TartanAir (released here: https://github.com/princeton-vl/DROID-SLAM), and the other hyperparameters follow the demo configuration of that repository. Is this correct?

In addition, I'm curious about the input image: what size is the image fed into the Droid module? Or, following one of the cited papers, is it resized to a somewhat smaller size, like [192, 640]?

Thank you very much for your support!

JUGGHM commented 9 months ago

Thanks for your question! I think @ckLibra should know this. Could you please help explain this a little?

ckLibra commented 9 months ago

Yeah, DROID's model pretrained on TartanAir is used. The image is rescaled to a smaller size, but not as small as [192, 640]. It depends on your GPU memory.

KyanainsGate commented 9 months ago

I appreciate all your kind and quick support!

If so, @ckLibra, could you share the configuration (especially the resolution) used to get the numbers in Table 5 of the paper (e.g. Droid+Ours shows t_rel=1.63, etc.)?
Using [192, 640] gave much worse results in my experiments, so I believe that size is too small or suffers from a domain shift, but I'm not sure what it should be.

Thank you!

ckLibra commented 9 months ago

Sure, the specific configuration will be given by @JUGGHM since I am not able to access the code currently. BTW, maybe you can also check whether the intrinsics are aligned to the resolution, and visualize the optical flow estimated in DROID to help locate the bug. For a quick check, I think a resolution of (288, 960) could be fine.
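As a minimal sketch of the intrinsics check suggested here: if the image is only resized (no cropping), the focal lengths and principal point scale with the width and height ratios. The function name `scale_intrinsics` and the numeric values below are hypothetical and purely illustrative; they are not taken from the Metric3D or DROID-SLAM code.

```python
def scale_intrinsics(fx, fy, cx, cy, orig_hw, new_hw):
    """Rescale pinhole intrinsics after a pure resize (assumes no cropping)."""
    h0, w0 = orig_hw
    h1, w1 = new_hw
    sx = w1 / w0  # horizontal scale factor
    sy = h1 / h0  # vertical scale factor
    return fx * sx, fy * sy, cx * sx, cy * sy


if __name__ == "__main__":
    # Illustrative values only: a 360x1200 image resized to 288x960.
    print(scale_intrinsics(700.0, 700.0, 600.0, 180.0, (360, 1200), (288, 960)))
```

If the pipeline also crops the image, the principal point has to be shifted by the crop offset as well, which this sketch does not handle.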

KyanainsGate commented 9 months ago

Thank you for your time and great feedback, @ckLibra!

I experimented with resolution [288, 960] just now. The resolution makes no large difference in the metrics, so there may be a 'better' configuration for getting this to work on KITTI.

Anyway, I'd be glad to wait for @JUGGHM's response on this. Thank you very much for all your great support.

P.S. Here are my experimental results. I'm 90% sure there is no problem with the intrinsics alignment or the t_rel calculation, which is why I started to think along the lines above. [image]

| Result | Resolution | t_rel | r_rel | fx, fy, cx, cy |
|---|---|---|---|---|
| Paper (Droid. w/o Metric3D) | ? | 21.7 | 0.23 | ? |
| Replicate A: Just RGB | [288, 960] | 79.0 | 32.3 | [553.7, 550.4, 471.3, 142.5] |
| Replicate B: Just RGB | [192, 640] | 77.1 | 31.4 | [369.1, 366.9, 314.2, 95.0] |

JUGGHM commented 9 months ago

I checked the resize code. It is:

h1 = sqrt(384 * 512 * h0 / w0)
w1 = sqrt(384 * 512 * w0 / h0)

where h0 and w0 are the original height and width.

So the size should be something close to (240, 824)
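The resize rule above keeps the aspect ratio while targeting roughly 384 * 512 pixels in total. Below is a minimal sketch of it; the function name `droid_style_size` is hypothetical, and truncating to integers and rounding down to a multiple of 8 are assumptions that the thread does not confirm.

```python
import math


def droid_style_size(h0, w0, area=384 * 512, multiple=8):
    """Return (h1, w1) with about `area` pixels and the aspect ratio of (h0, w0)."""
    h1 = int(math.sqrt(area * h0 / w0))
    w1 = int(math.sqrt(area * w0 / h0))
    # Assumption: round down so both sides are divisible by `multiple`.
    return h1 - h1 % multiple, w1 - w1 % multiple


if __name__ == "__main__":
    # 370 x 1226 is used here only as an example input size.
    print(droid_style_size(370, 1226))
```

The exact output depends on the original image size and on the rounding step, so small deviations from the size quoted above are to be expected.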

ckLibra commented 9 months ago

I don't remember whether I registered the estimated trajectory to the gt before calculating the metric, but from your figure, it seems that this may be the reason for the poor r_rel.
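For reference, one common way to register an estimated trajectory to the ground truth is a similarity (Umeyama) alignment of the camera positions. The sketch below illustrates that general technique with NumPy; it is not the evaluation code used for the paper, and whether the paper applied this or any other alignment is not confirmed in this thread. The function name `umeyama_align` is hypothetical.

```python
import numpy as np


def umeyama_align(est, gt, with_scale=True):
    """Find s, R, t minimizing sum ||gt_i - (s * R @ est_i + t)||^2.

    est, gt: (N, 3) arrays of corresponding camera positions.
    """
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    cov = g.T @ e / len(est)  # 3x3 cross-covariance (gt vs. est)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # avoid returning a reflection
    R = U @ S @ Vt
    var_e = (e ** 2).sum() / len(est)
    s = float(np.trace(np.diag(D) @ S) / var_e) if with_scale else 1.0
    t = mu_g - s * R @ mu_e
    return s, R, t


def apply_alignment(est, s, R, t):
    """Map estimated positions into the ground-truth frame."""
    return s * np.asarray(est, dtype=float) @ R.T + t
```

After alignment, the translational and rotational errors can be computed on the registered trajectory; comparing metrics with and without this step is a quick way to see whether registration explains the gap.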

KyanainsGate commented 9 months ago

Hello, sorry for the late response!

h1 = sqrt(384 * 512 * h0 / w0), w1 = sqrt(384 * 512 * w0 / h0)

Thank you @JUGGHM! That seems similar to the eth3d demo in the official DROID implementation.

What about the metrics, e.g. the scale alignment and how each relative error is calculated? I tried two evaluation scripts from the following repositories; both produce very similar results, and neither gets close to (21.7, 0.23).

Thank you!

JUGGHM commented 9 months ago

Any ideas on that, @ckLibra? How can I provide the information he might need from our codebase?

JUGGHM commented 9 months ago

I am currently unable to re-implement the droid-slam experiment myself. It was originally done by @ckLibra. After finishing Metric3D v2 I will examine this closely, since many users have raised issues about it.