BeileiCui / SurgicalDINO

[IPCAI'2024 (IJCARS special issue)] Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

Difference between SurgicalDINO and EndoDAC #5

Closed leoyala closed 1 month ago

leoyala commented 3 months ago

Hello,

I was wondering what the main differences are between this project and your EndoDAC model. I see that both aim to estimate depth from surgical images, but I am not sure what advantages and disadvantages each has. I would appreciate it if you could clarify that.

BeileiCui commented 3 months ago

Of course

1. SurgicalDINO was meant to be a supervised method (although we also tested the model with an SSL method) that requires depth ground truth for fine-tuning. EndoDAC is meant to be a self-supervised learning (SSL) method in which we estimate depth, ego-motion, and camera intrinsics at the same time. Therefore, EndoDAC technically requires only surgical frames for training.

2. Surgical-DINO uses only vanilla LoRA to fine-tune the model. For EndoDAC, we designed DV-LoRA to fine-tune with fewer LoRA parameters during training. We also added residual necks to address the neglect of high-frequency information. In addition, EndoDAC uses a DPT-like multi-head depth head, while SurgicalDINO has only a simple linear layer as its depth head.
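For readers unfamiliar with the "vanilla LoRA" mentioned in point 2, the general idea can be sketched as follows. This is a minimal, dependency-free illustration of the generic LoRA technique (a frozen weight plus a trainable low-rank update), not code from either repository. The class name `LoRALinear`, the helper `lora_param_count`, and the 768/rank-4 sizes are hypothetical and chosen only for illustration. DV-LoRA is not shown here.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by LoRA: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Hypothetical sketch: frozen weight W plus a low-rank update (alpha / rank) * B @ A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pretrained weight, kept frozen during fine-tuning
        # A gets a small random init; B starts at zero so the layer
        # initially reproduces the pretrained output exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]


# Illustrative sizes only (not taken from either project):
full = 768 * 768                        # parameters in one dense 768 x 768 weight
lora = lora_param_count(768, 768, 4)    # parameters trained by a rank-4 LoRA update
print(full, lora)
```

Only `A` and `B` would be updated during fine-tuning, which is why the number of trainable parameters scales with the rank rather than with the full weight matrix.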

Overall, I would say EndoDAC is more of a complete monocular depth estimation SSL method. EndoDAC has better generalization ability because you only need image frames to fine-tune it. EndoDAC also shows much better qualitative and quantitative results.

leoyala commented 3 months ago

Thank you for the detailed description @BeileiCui! 👍🏼