BeileiCui / EndoDAC

[MICCAI'2024] EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera

endodac depth model #5

Open leoyala opened 2 weeks ago

leoyala commented 2 weeks ago

Hi @BeileiCui,

I am looking into your code and would like to use the depth model only. Could you point me to the module definition that corresponds to depth_model.pth? I have looked into the class endodac.endodac, but I suspect that is the wrong one, isn't it?

BeileiCui commented 2 weeks ago

Hi, it is the correct one. If you only want the depth model, you can check the code starting from here. In the class I mentioned above, I first define a modified ViT-base, then add DV-LoRA to the model, and finally define the depth decoder heads.

The detailed definition of the ViT-base backbone is in models/backbones. The detailed definition of the decoder head is here.
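If it helps to picture the adapter idea, the sketch below is a plain, generic LoRA-style wrapper around a frozen linear layer; it only illustrates the concept and is not the DV-LoRA implementation used in this repository:

import torch.nn as nn

class LoRALinear(nn.Module):
    # generic low-rank adapter: y = W x + (alpha / r) * B(A x), with W frozen
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # pretrained weight stays frozen
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)     # adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))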

leoyala commented 2 weeks ago

When I try to load the depth_model.pth that I downloaded from the link in the README, I get an error indicating that there are missing keys or unexpected keys.

So far, I have tried the endodac and DPTHead classes:

model = endodac.endodac()
model.load_state_dict(torch.load("depth_model.pth"))
model = DPTHead(in_channels=3)
model.load_state_dict(torch.load("depth_model.pth"))

I also tried loading the depth_anything_vitb14.pth using the endodac class, but the same error happened. I am not sure if I am using the wrong class to load just the depth_model.pth.
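In case it is useful for diagnosing this, a small generic PyTorch snippet (assuming the .pth file is a plain state dict) can show which keys actually mismatch:

import torch

ckpt = torch.load("depth_model.pth", map_location="cpu")   # assumed to be a plain state dict
model = endodac.endodac()                                   # or DPTHead(...), whichever module is being probed

model_keys = set(model.state_dict().keys())
ckpt_keys = set(ckpt.keys())
print("missing from checkpoint:", sorted(model_keys - ckpt_keys)[:10])
print("unexpected in checkpoint:", sorted(ckpt_keys - model_keys)[:10])

# load_state_dict(strict=False) returns the same two lists directly
result = model.load_state_dict(ckpt, strict=False)
print(result.missing_keys[:10], result.unexpected_keys[:10])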

leoyala commented 1 week ago

Dear @BeileiCui,

I think I managed to load the model using the code below, but the depth estimates look strange for the sample image I am attaching here. Is there a step that I might be missing?

Sample code:

import torch
from skimage import io  # skimage's io submodule has to be imported explicitly

# repo-local imports (module paths assumed here, adjust to the EndoDAC layout)
from models import endodac
from layers import disp_to_depth

x = io.imread("0001_color.png")

depth_dict = torch.load("depth_model.pth")

model = endodac.endodac()
# start from the Depth-Anything pretrained weights, then overwrite the
# parameters that also appear in the fine-tuned checkpoint
model.load_state_dict(torch.load("depth_anything_vitb14.pth"), strict=False)
model_dict = model.state_dict()
model.load_state_dict({k: v for k, v in depth_dict.items() if k in model_dict})
model.eval()

with torch.no_grad():
    # HWC uint8 image -> NCHW float tensor
    input = torch.tensor(x, dtype=torch.float).permute(2, 0, 1).unsqueeze(0)
    disp = model.forward(input)
output_disp = disp[("disp", 0)]
pred_disp, depth = disp_to_depth(output_disp, min_depth=0.1, max_depth=150)
depth_np = depth.squeeze().cpu().numpy()

Depth output: [attached image: depth]

Sample image: [attached image: 0001_color]

BeileiCui commented 1 week ago

Hi @leoyala, sorry for my late response, I have been busy this week.

First of all, this link is the checkpoint we fine-tuned on SCARED, so if you want to directly run an evaluation, you should download this one. The depth_anything_vitb14.pth file is a pretrained weight from Depth-Anything from which we start our fine-tuning, so if you want to fine-tune the model on some other dataset, you should download that one.

The issue you mentioned is probably because the EndoDAC model is not properly defined, so some parts are missing. I suggest you first look at the code in ./evaluate_depth.py to see how EndoDAC is loaded.

The default settings in the ./models/endodac file are not what we proposed; the settings we propose are given as the defaults in ./options.py.

For example, a proper parameter setting for endodac is:

depther = endodac.endodac(
    backbone_size="base", r=4, lora_type="dv_lora",
    image_shape=(224, 280), pretrained_path="./pretrained_model",
    residual_block_indexes=[2, 5, 8, 11],
    include_cls_token=True)

Then you may load the checkpoint and evaluate it.
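A minimal end-to-end sketch, assuming the repo-local import paths below, that disp_to_depth lives in layers.py, and that the input is resized to the (224, 280) shape and normalized to [0, 1], could look like this:

import numpy as np
import torch
from skimage import io
from skimage.transform import resize

# repo-local imports (paths assumed)
from models import endodac
from layers import disp_to_depth

# model configured with the settings proposed in ./options.py
depther = endodac.endodac(
    backbone_size="base", r=4, lora_type="dv_lora",
    image_shape=(224, 280), pretrained_path="./pretrained_model",
    residual_block_indexes=[2, 5, 8, 11],
    include_cls_token=True)

# load the SCARED fine-tuned checkpoint, keeping only keys the model knows about
ckpt = torch.load("depth_model.pth", map_location="cpu")
model_dict = depther.state_dict()
depther.load_state_dict({k: v for k, v in ckpt.items() if k in model_dict})
depther.eval()

# read the image, resize to the expected shape, and normalize to [0, 1]
img = io.imread("0001_color.png")
img = resize(img, (224, 280), anti_aliasing=True).astype(np.float32)  # resize outputs floats in [0, 1]
inp = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)

with torch.no_grad():
    outputs = depther(inp)
pred_disp, depth = disp_to_depth(outputs[("disp", 0)], min_depth=0.1, max_depth=150)
depth_np = depth.squeeze().cpu().numpy()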

leoyala commented 3 days ago

Thank you for the clarification @BeileiCui

I managed to evaluate my sample image; the problem was mostly with the data format. I just had to normalize the image to the range [0, 1] before passing it to the model.
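For reference, with a uint8 RGB image read via skimage, the normalization is just:

import numpy as np
from skimage import io

x = io.imread("0001_color.png")      # uint8, values in [0, 255]
x = x.astype(np.float32) / 255.0     # float32, values in [0, 1]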

I am attaching the result in case someone is interested in trying this out in the future. [attached image: ENDODAC]