leoyala opened this issue 2 weeks ago.
Hi. It is the correct one. If you only want the depth model, you can check the code starting from here. In the class mentioned above, I first define a modified ViT-base, then add DV-LoRA to the model, and then define the depth decoder heads.
The detailed definition of ViT-base is in `models/backbones`. The detailed definition of the decoder head is here.
When I try to load the `depth_model.pth` that I downloaded from the link in the README, I get an error saying that some keys are missing or that there are unexpected keys.
So far, I have tried the `endodac` and `DPTHead` classes:
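```python
import torch
state_dict = torch.load("depth_model.pth")  # load_state_dict expects a dict, not a path
endodac().load_state_dict(state_dict)  # error: missing/unexpected keys
DPTHead(in_channels=3).load_state_dict(state_dict)  # same error
```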
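When `load_state_dict` reports missing or unexpected keys, comparing the two key sets directly often shows which part of the model does not match. The sketch below is framework-agnostic; the helpers `diff_keys` and `common_prefixes` are hypothetical, and in practice you would pass `model.state_dict().keys()` and `torch.load("depth_model.pth").keys()` instead of the toy key names.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring what strict loading reports."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)     # in the model, absent from the checkpoint
    unexpected = sorted(checkpoint_keys - model_keys)  # in the checkpoint, unknown to the model
    return missing, unexpected


def common_prefixes(keys, depth=1):
    """Count key prefixes, e.g. 'backbone' in 'backbone.blocks.0.weight'."""
    counts = {}
    for key in keys:
        prefix = ".".join(key.split(".")[:depth])
        counts[prefix] = counts.get(prefix, 0) + 1
    return counts


# Toy key names for illustration only; they are NOT the real EndoDAC keys.
model_keys = ["backbone.blocks.0.weight", "head.conv.weight"]
ckpt_keys = ["backbone.blocks.0.weight", "depth_head.conv.weight"]
missing, unexpected = diff_keys(model_keys, ckpt_keys)
print(missing)                     # ['head.conv.weight']
print(unexpected)                  # ['depth_head.conv.weight']
print(common_prefixes(ckpt_keys))  # {'backbone': 1, 'depth_head': 1}
```

If many missing keys share one prefix, the model was likely constructed with a different configuration from the one used to save the checkpoint.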
I also tried loading `depth_anything_vitb14.pth` with the `endodac` class, but the same error occurred. I am not sure whether I am using the wrong class to load `depth_model.pth` on its own.
Dear @BeileiCui,
I think I managed to load the model with the code below, but the depth estimates look strange on the sample image I am attaching here. Is there a step I might be missing?
Sample code:
```python
import torch
from skimage import io

# endodac and disp_to_depth come from the EndoDAC repository (import paths depend on the repo layout)
x = io.imread("0001_color.png")
depth_dict = torch.load("depth_model.pth")

model = endodac.endodac()
# Start from the Depth-Anything weights, then overwrite with the fine-tuned weights
model.load_state_dict(torch.load("depth_anything_vitb14.pth"), strict=False)
model_dict = model.state_dict()
model.load_state_dict({k: v for k, v in depth_dict.items() if k in model_dict})
model.eval()

with torch.no_grad():
    # HxWxC image -> 1xCxHxW float tensor (values left in the original 0-255 range)
    image = torch.tensor(x, dtype=torch.float).permute(2, 0, 1).unsqueeze(0)
    disp = model(image)
    output_disp = disp[("disp", 0)]
    pred_disp, depth = disp_to_depth(output_disp, min_depth=0.1, max_depth=150)
    depth_np = depth.squeeze().cpu().numpy()
```
Depth output:
Sample image:
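The snippet above relies on the repository's `disp_to_depth` helper. Assuming it follows the monodepth2 convention (an assumption; check the repository's own definition), it maps the network's sigmoid disparity in [0, 1] linearly into [1/max_depth, 1/min_depth] and then inverts it:

```python
def disp_to_depth(disp, min_depth, max_depth):
    """Map a sigmoid disparity in [0, 1] to (scaled_disp, depth), monodepth2-style (assumed).

    Uses only arithmetic operators, so it works on floats, NumPy arrays, or torch tensors.
    """
    min_disp = 1.0 / max_depth  # disparity of the farthest allowed point
    max_disp = 1.0 / min_depth  # disparity of the nearest allowed point
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth


# disp = 0 maps to max_depth, disp = 1 maps to min_depth
_, far = disp_to_depth(0.0, min_depth=0.1, max_depth=150)
_, near = disp_to_depth(1.0, min_depth=0.1, max_depth=150)
print(round(far, 6), round(near, 6))  # 150.0 0.1
```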
Hi @leoyala, sorry for my late response; I have been busy this week.
First of all, this link is the checkpoint we fine-tuned on SCARED, so if you want to evaluate directly, you should download this one. The `depth_anything_vitb14.pth` file contains the pretrained Depth-Anything weights that our fine-tuning starts from, so if you want to fine-tune the model on another dataset, you should download that one.
The issue you mentioned is probably because EndoDAC is not properly defined, so some parts are missing. I suggest looking at `./evaluate_depth.py` first to see how EndoDAC is loaded.
The default settings in the `./models/endodac` file are not what we proposed; the settings we proposed are the defaults in `./options.py`.
For example, a proper parameter setting for `endodac` is:
```python
depther = endodac.endodac(
    backbone_size="base",
    r=4,
    lora_type="dv_lora",
    image_shape=(224, 280),
    pretrained_path="./pretrained_model",
    residual_block_indexes=[2, 5, 8, 11],
    include_cls_token=True,
)
```
Then you may load the checkpoint and evaluate it.
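A quick arithmetic sanity check on these parameters, assuming (as `vitb14` in the checkpoint name suggests) a ViT-B backbone with 14x14 patches and 12 transformer blocks: `image_shape=(224, 280)` should then be a multiple of the patch size, and `residual_block_indexes=[2, 5, 8, 11]` should pick blocks among those 12. The helper `check_vit_config` is hypothetical and not part of the repository:

```python
def check_vit_config(image_shape, residual_block_indexes, patch_size=14, num_blocks=12):
    """Validate the settings and return the patch grid (rows, cols).

    patch_size=14 and num_blocks=12 are assumptions for a ViT-B/14 backbone.
    """
    height, width = image_shape
    if height % patch_size or width % patch_size:
        raise ValueError(f"image_shape {image_shape} is not a multiple of {patch_size}")
    if any(not 0 <= i < num_blocks for i in residual_block_indexes):
        raise ValueError(f"block indexes must lie in [0, {num_blocks})")
    return height // patch_size, width // patch_size


grid = check_vit_config((224, 280), [2, 5, 8, 11])
print(grid, grid[0] * grid[1])  # (16, 20) 320
```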
Thank you for the clarification, @BeileiCui!
I managed to evaluate my sample image; the problem came mainly from the data format. I just had to normalize the image to the range [0, 1] before passing it to the model.
I am attaching the result in case anyone wants to try this in the future.
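The fix described above (scaling the image to [0, 1] before inference) can be sketched with NumPy as follows. This assumes an 8-bit RGB(A) image as returned by `skimage.io.imread`; the helper name `to_model_input` is hypothetical, and whether the model also expects mean/std normalization or resizing to `image_shape` should be checked against the repository's data loaders.

```python
import numpy as np


def to_model_input(image):
    """Convert an HxWxC uint8 image into a 1xCxHxW float32 array in [0, 1]."""
    if image.ndim != 3:
        raise ValueError("expected an HxWxC image")
    image = image[..., :3]  # drop an alpha channel if the PNG has one
    scaled = image.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    return np.transpose(scaled, (2, 0, 1))[np.newaxis]  # HWC -> 1xCxHxW


dummy = np.full((224, 280, 3), 255, dtype=np.uint8)
batch = to_model_input(dummy)
print(batch.shape, batch.max())  # (1, 3, 224, 280) 1.0
```

`torch.from_numpy(batch)` then gives the tensor to pass to the model.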
Hi @BeileiCui,
I am looking into your code and would like to use only the depth model. Could you point me to the module definition for `depth_model.pth`? I have looked into the class `endodac.endodac`, but I think it is the wrong one, isn't it?