YvanYin / Metric3D

The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
https://jugghm.github.io/Metric3Dv2/
BSD 2-Clause "Simplified" License

Training with custom wide-angle camera images #119

Open kishore-greddy opened 3 months ago

kishore-greddy commented 3 months ago

Hi, thanks for the amazing work. I have a couple of questions regarding fine-tuning on a custom dataset. I have already checked the open and closed issues and did not find the answers there, hence this new issue.

  1. Have you tried including any wide-angle images during training? (I could not find this information in your paper.)
  2. If I want to fine-tune on fisheye images, do you have any recommendations? My custom dataset has images, Euclidean distance (as depth), and intrinsic camera parameters, which are defined differently from the perspective projection model that you use. The intrinsic parameters are available as described here. (A sketch of the distance-to-depth conversion I have in mind is at the end of this comment.)
  3. Since the camera I want to train on uses a different camera model, with a different FoV and distortion, would your training pipeline still hold up?

Thanks in advance :).
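
For reference, this is roughly how I would convert my per-pixel Euclidean distance maps to z-depth, assuming OpenCV's Kannala-Brandt fisheye model for the intrinsics (the function and variable names below are just placeholders on my side):

```python
import cv2
import numpy as np

def euclidean_to_z_depth(dist_map, K, D):
    """Convert per-pixel Euclidean (ray) distance to z-depth for a fisheye camera.

    dist_map: (H, W) array of Euclidean distances along each pixel's viewing ray.
    K:        (3, 3) fisheye intrinsic matrix.
    D:        (4,)   Kannala-Brandt distortion coefficients k1..k4.
    """
    h, w = dist_map.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    pixels = np.stack([u.ravel(), v.ravel()], axis=-1).reshape(1, -1, 2)

    # Each pixel maps to a normalized ray direction (x_n, y_n, 1).
    normalized = cv2.fisheye.undistortPoints(pixels, K, D).reshape(-1, 2)
    ray_len = np.sqrt(normalized[:, 0] ** 2 + normalized[:, 1] ** 2 + 1.0)

    # z-depth = Euclidean distance divided by the ray length per unit z.
    return (dist_map.ravel() / ray_len).reshape(h, w)
```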

JUGGHM commented 3 months ago

Thank you for your interest. If you want to fine-tune on a fisheye dataset (with fixed intrinsics), you can train directly while neglecting the canonical transformation modules, as they were originally designed for pinhole cameras. In this case, the scale_factor should always be set to 1.
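
As a minimal sketch of what this looks like on the data side (the field names below are illustrative, not the exact keys used by our dataloaders):

```python
def prepare_fisheye_sample(image, depth, intrinsics):
    """Build a training sample for a fixed-intrinsic fisheye dataset,
    skipping the canonical camera transformation entirely."""
    return {
        "input": image,            # (3, H, W) tensor, already normalized
        "target_depth": depth,     # (1, H, W) metric depth labels, used as-is
        "intrinsics": intrinsics,  # kept only for back-projection / logging
        "scale_factor": 1.0,       # no canonical-space rescaling of the labels
    }
```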

Here are some personal suggestions: (1) If you prefer to keep the normal branch, the depth-normal consistency loss should be attached, and the back-projection part should be modified when calculating the pseudo-surface normals. (2) Otherwise, you can drop the normal branch and modify the network accordingly; the pre-trained weights related to surface normals should be re-initialized.
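
As a rough illustration of suggestion (1), here is a sketch of computing pseudo-surface normals with a fisheye-aware back-projection; the per-pixel ray directions are assumed to be precomputed from your fisheye model (e.g. with cv2.fisheye.undistortPoints), and this is not our exact implementation:

```python
import torch
import torch.nn.functional as F

def pseudo_normals_from_depth(depth, rays):
    """Pseudo-surface normals from predicted depth via fisheye back-projection.

    depth: (B, 1, H, W) predicted depth.
    rays:  (3, H, W) per-pixel ray directions precomputed from the fisheye model,
           so back-projection is depth * ray instead of the pinhole
           ((u - cx) * z / fx, (v - cy) * z / fy, z) formula.
    """
    points = depth * rays.unsqueeze(0)             # (B, 3, H, W) camera-space points
    dx = points[..., :, 1:] - points[..., :, :-1]  # horizontal finite difference
    dy = points[..., 1:, :] - points[..., :-1, :]  # vertical finite difference
    dx = dx[..., 1:, :]                            # crop both to a common (H-1, W-1) grid
    dy = dy[..., :, 1:]
    normals = torch.cross(dx, dy, dim=1)           # surface normal ~ dP/dx x dP/dy
    return F.normalize(normals, dim=1)             # unit normals, (B, 3, H-1, W-1)
```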

In both cases, we recommend that (1) in the early steps, the majority of the backbone and DPT decoder be frozen and only the depth regression parts be tuned, and (2) after sufficient steps, i.e. once the network produces reasonable results, the backbone and DPT decoder be unfrozen for training.
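
A minimal sketch of such a two-stage schedule (the module name prefixes are illustrative and depend on the actual model definition):

```python
def set_trainable(model, train_backbone_and_decoder):
    """Stage 1: freeze backbone + DPT decoder, tune only the depth regression parts.
    Stage 2: call again with train_backbone_and_decoder=True to unfreeze them."""
    for name, param in model.named_parameters():
        if name.startswith(("backbone.", "decoder.")):  # illustrative prefixes
            param.requires_grad = train_backbone_and_decoder
        else:                                           # depth regression head, etc.
            param.requires_grad = True

# Early steps: set_trainable(model, train_backbone_and_decoder=False)
# Once results look reasonable: set_trainable(model, train_backbone_and_decoder=True)
```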

kishore-greddy commented 3 months ago

@JUGGHM Thank you for your valuable insights. I have one additional question about your comments. Does dropping the normal branch mean not predicting normals and dropping the ConvGRU modules and the GRUSequenceLoss altogether, and instead using the initially regressed depth for the loss calculation? In short, the joint depth and normal optimization should be skipped, right?
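
To make my question concrete, the fine-tuning step I have in mind would look roughly like this (the names are placeholders for whatever depth losses remain after removing the GRU refinement and normal supervision):

```python
def training_step(model, batch, depth_criterion, optimizer):
    """Depth-only fine-tuning step: no ConvGRU refinement, no GRUSequenceLoss,
    loss computed directly on the initially regressed depth map."""
    optimizer.zero_grad()
    pred_depth = model(batch["input"])                          # initial depth regression only
    loss = depth_criterion(pred_depth, batch["target_depth"])   # e.g. a SILog / L1 depth loss
    loss.backward()
    optimizer.step()
    return loss.item()
```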

Please confirm if my understanding is correct. Thanks in advance :)

YvanYin commented 2 months ago

@kishore-greddy Yes.

ahinchoi commented 1 month ago

Hi! First, thank you for the inspiring and insightful research. I'm also looking to fine-tune the model on fisheye images, but with different camera intrinsics. In this case, should I (with the canonical transformation module in place)

  1. undistort the images, and
  2. then use the intrinsics of the undistorted images for training? (One possible OpenCV sketch is below.)

Thank you for your time and consideration in advance.
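
For reference, this is how I would do the undistortion with OpenCV's fisheye module (assuming Kannala-Brandt intrinsics K and distortion coefficients D; the balance and output size are my own choices):

```python
import cv2
import numpy as np

def undistort_fisheye(image, K, D, balance=0.0):
    """Undistort a fisheye image and return it together with its new pinhole intrinsics.

    K: (3, 3) fisheye intrinsic matrix, D: (4,) Kannala-Brandt coefficients.
    balance in [0, 1] trades off cropping black borders vs. keeping the full FoV.
    """
    h, w = image.shape[:2]
    # Estimate the pinhole intrinsics of the undistorted image.
    new_K = cv2.fisheye.estimateNewCameraMatrixForUndistortRectify(
        K, D, (w, h), np.eye(3), balance=balance)
    # Build the remap tables and warp the image into the pinhole view.
    map1, map2 = cv2.fisheye.initUndistortRectifyMap(
        K, D, np.eye(3), new_K, (w, h), cv2.CV_16SC2)
    undistorted = cv2.remap(image, map1, map2, interpolation=cv2.INTER_LINEAR,
                            borderMode=cv2.BORDER_CONSTANT)
    return undistorted, new_K  # train with new_K as the (pinhole) intrinsics
```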