kishore-greddy opened this issue 3 months ago
Thank you for your interest. If you want to fine-tune on a fisheye dataset (with fixed intrinsics), you can train directly while skipping the canonical transformation modules, as they were originally designed for pinhole cameras. In this case, scale_factor should always be set to 1.
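A minimal sketch of where that setting enters, assuming the pinhole pipeline rescales depth labels into the canonical camera space by a `scale_factor` derived from the real vs. canonical focal length; with the canonical transform skipped, the factor stays 1 and the labels pass through unchanged. The function name is illustrative, not the repo's actual API:

```python
def to_canonical_depth(depth_values, scale_factor=1.0):
    """Hypothetical label-scaling step of the canonical transform.

    For a fixed-intrinsic fisheye dataset trained without the canonical
    transformation, scale_factor is kept at 1, so this is an identity.
    """
    return [d * scale_factor for d in depth_values]
```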
Here are some personal suggestions: (1) If you prefer to keep the normal branch, the depth-normal consistency loss should be attached, and the back-projection step used when computing the pseudo-surface normals must be modified for the fisheye geometry. (2) Otherwise, you can drop the normal branch and modify the network accordingly; the pre-trained weights related to the surface normals should then be re-initialized.
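To make suggestion (1) concrete, here is a sketch of the modified back-projection under one common fisheye model (the equidistant projection, r = f·θ), followed by pseudo-normals from finite differences of the back-projected points. This is an assumption about the camera model, not the repo's actual code, and it treats the depth map as distance along the viewing ray; adapt it if your labels are z-depth:

```python
import numpy as np

def backproject_equidistant(depth, fx, cx, cy):
    """Back-project a depth map to 3D points under an equidistant
    fisheye model (r = f * theta). `depth` is distance along the ray."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    du, dv = u - cx, v - cy
    r = np.sqrt(du ** 2 + dv ** 2)
    theta = r / fx                     # equidistant projection: r = f * theta
    r_safe = np.where(r > 0, r, 1.0)   # avoid 0/0 at the principal point
    sin_t = np.sin(theta)
    dirs = np.stack([sin_t * du / r_safe,
                     sin_t * dv / r_safe,
                     np.cos(theta)], axis=-1)
    return depth[..., None] * dirs

def pseudo_normals(points):
    """Pseudo-surface normals via cross products of neighbor differences."""
    dx = points[:-1, 1:] - points[:-1, :-1]
    dy = points[1:, :-1] - points[:-1, :-1]
    n = np.cross(dx, dy)
    return n / np.clip(np.linalg.norm(n, axis=-1, keepdims=True), 1e-8, None)
```

Only `backproject_equidistant` changes relative to the pinhole case; the normal computation downstream can stay as-is.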
In both cases, we recommend that: (1) in the early steps, the majority of the backbone and the DPT decoder be frozen, with only the depth regression parts being tuned; (2) after sufficient steps, i.e. once the network produces reasonable results, the backbone and DPT decoder be unfrozen for training.
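The two-phase schedule above can be sketched as follows. The module names (`backbone`, `dpt_decoder`, `depth_head`) are illustrative stand-ins; match them to the actual model definition:

```python
import torch

class DepthModel(torch.nn.Module):
    """Toy stand-in for the real network; submodule names are illustrative."""
    def __init__(self):
        super().__init__()
        self.backbone = torch.nn.Linear(8, 8)
        self.dpt_decoder = torch.nn.Linear(8, 8)
        self.depth_head = torch.nn.Linear(8, 1)

def set_finetune_phase(model, unfreeze_all=False):
    # Phase 1 (unfreeze_all=False): freeze backbone + DPT decoder,
    # tune only the depth regression head.
    # Phase 2 (unfreeze_all=True): release everything for training.
    for p in model.parameters():
        p.requires_grad = unfreeze_all
    for p in model.depth_head.parameters():
        p.requires_grad = True
```

Remember to rebuild the optimizer (or add parameter groups) when switching phases, so the newly unfrozen parameters are actually updated.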
@JUGGHM Thank you for your valuable insights. I have one additional question about your comments. Does dropping the normal branch mean not predicting normals and removing the ConvGRU modules and the GRUSequenceLoss altogether, and instead using the initially regressed depth for the loss calculation? In short, the joint depth and normal optimization needs to be skipped, right?
Please confirm if my understanding is correct. Thanks in advance :)
@kishore-greddy Yes.
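For readers landing here, a sketch of what the confirmed change amounts to, under the assumption that the model produces an initially regressed depth plus a ConvGRU-refined sequence supervised by the GRUSequenceLoss: with the normal branch dropped, the refinement sequence and that loss go away, and a plain depth loss supervises the initial regression directly. The function name is illustrative:

```python
def depth_only_loss(init_depth, depth_gt):
    """Mean absolute error on the initially regressed depth,
    replacing the GRUSequenceLoss over the ConvGRU refinement steps."""
    return sum(abs(p - g) for p, g in zip(init_depth, depth_gt)) / len(depth_gt)
```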
Hi! First, thank you for the inspiring and insightful research. I'm also looking to fine-tune the model on fisheye images, but with different camera intrinsics. In this case, should I (with the canonical transformation module in place)
Hi, thanks for the amazing work. I have a couple of questions regarding fine-tuning on a custom dataset. I have already checked the open and closed issues and did not find the answers there, hence I am creating a new issue.
Thanks in advance :).