BeileiCui / SurgicalDINO

[IPCAI'2024 (IJCARS special issue)] Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

Inquiry Regarding Single-Channel Image Processing #4

Closed CYXYZ closed 1 month ago

CYXYZ commented 1 month ago

Dear Beilei Cui,

I hope this email finds you well. I am reaching out to discuss an issue I've encountered while testing my code for image processing on single-channel images. It appears that the code fails to function properly when applied to such images.

I am writing to inquire whether you have considered implementing operations on single-channel images or have encountered similar challenges in your work. Your insights and guidance on this matter would be greatly appreciated.

If possible, could you provide any suggestions or advice on how to adapt image processing algorithms to work effectively with single-channel images? Additionally, I would be interested in learning about any resources or references that could help me address this issue.

Thank you for taking the time to read this message. I look forward to hearing from you soon and eagerly await your response.

Best regards, cyxyz

BeileiCui commented 1 month ago

Thanks for your interest.

The original DINOv2 is a foundation model trained on regular 3-channel RGB inputs. The channel count matters mainly in the patch embedding module, whose projection layer maps a (B,C,H,W) input to a (B,N,D) output. I can therefore think of three quick ways to adapt it to single-channel inputs.

The first two methods keep the pre-trained weights in DINOv2:

  1. Directly expand the single channel to 3 channels, either by copying it or by filling the new channels with zeros.
  2. Add a new projection layer in front of the projection layer in the patch embedding, mapping a (B,1,H,W) input to a (B,3,H,W) output.
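
A minimal PyTorch sketch of these two options, not the repository's actual code. It assumes the patch-embedding projection is an `nn.Conv2d` with 3 input channels; `patch_proj` below is a randomly initialised stand-in for that pretrained layer, and the helper names (`repeat_channels`, `zero_pad_channels`, `ChannelAdapter`) as well as the sizes 384 and 14 are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn


def repeat_channels(x: torch.Tensor) -> torch.Tensor:
    """Option 1a: copy the single channel three times, (B,1,H,W) -> (B,3,H,W)."""
    return x.repeat(1, 3, 1, 1)


def zero_pad_channels(x: torch.Tensor) -> torch.Tensor:
    """Option 1b: keep the data in channel 0 and fill the other two with zeros."""
    zeros = torch.zeros_like(x)
    return torch.cat([x, zeros, zeros], dim=1)


class ChannelAdapter(nn.Module):
    """Option 2: a small learnable 1x1 convolution, (B,1,H,W) -> (B,3,H,W),
    placed in front of the untouched 3-channel projection."""

    def __init__(self) -> None:
        super().__init__()
        self.proj = nn.Conv2d(1, 3, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


# Stand-in for the pretrained 3-channel projection (assumed sizes, random weights).
patch_proj = nn.Conv2d(3, 384, kernel_size=14, stride=14)

x = torch.randn(2, 1, 224, 224)  # a batch of single-channel images
adapter = ChannelAdapter()

# Each variant feeds a 3-channel tensor into the unchanged projection,
# then flattens the patch grid into a token sequence (B,N,D).
tokens_copy = patch_proj(repeat_channels(x)).flatten(2).transpose(1, 2)
tokens_zero = patch_proj(zero_pad_channels(x)).flatten(2).transpose(1, 2)
tokens_adapter = patch_proj(adapter(x)).flatten(2).transpose(1, 2)
```

With option 2 the adapter's parameters are new, so they need to be included in the optimizer even if the rest of the backbone is frozen.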

Alternatively, you can replace the pretrained projection layer in the patch embedding with a new single-channel layer. In that case, make sure the new layer is trainable during training.
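
This third option could look roughly like the sketch below. `TinyPatchEmbed` is a hypothetical stand-in for the real patch-embedding module, assumed to expose its projection as a `proj` attribute of type `nn.Conv2d`. Initialising the new layer from the channel-wise sum of the old weights is an optional extra that is not part of the suggestion above; for a convolution it gives the same output as copying the input channel three times, so training does not start from scratch.

```python
import torch
import torch.nn as nn


def to_single_channel(proj: nn.Conv2d, init_from_pretrained: bool = True) -> nn.Conv2d:
    """Build a 1-channel replacement for a 3-channel patch projection."""
    new_proj = nn.Conv2d(
        1,
        proj.out_channels,
        kernel_size=proj.kernel_size,
        stride=proj.stride,
        padding=proj.padding,
        bias=proj.bias is not None,
    )
    if init_from_pretrained:
        # Optional (assumption): sum the old kernel over its input channels.
        with torch.no_grad():
            new_proj.weight.copy_(proj.weight.sum(dim=1, keepdim=True))
            if proj.bias is not None:
                new_proj.bias.copy_(proj.bias)
    # The new layer must be trainable, even if the backbone is frozen.
    for p in new_proj.parameters():
        p.requires_grad = True
    return new_proj


class TinyPatchEmbed(nn.Module):
    """Hypothetical stand-in for a ViT patch embedding: (B,C,H,W) -> (B,N,D)."""

    def __init__(self, in_chans: int = 3, embed_dim: int = 384, patch: int = 14) -> None:
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x).flatten(2).transpose(1, 2)


patch_embed = TinyPatchEmbed()
old_proj = patch_embed.proj

# Freeze the existing weights, then swap in the trainable 1-channel layer.
for p in patch_embed.parameters():
    p.requires_grad = False
patch_embed.proj = to_single_channel(old_proj)

x = torch.randn(2, 1, 224, 224)
tokens = patch_embed(x)  # token sequence of shape (B,N,D)
```

If the backbone is frozen elsewhere in the training script, it is worth checking `requires_grad` on the new layer's parameters after the swap, since a blanket freeze applied later would disable them again.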