RobvanGastel / dinov2-finetune

Testing adaptation of the DINOv2 encoder for vision tasks with Low-Rank Adaptation (LoRA)
MIT License

Provide an inference example #5

Open eypros opened 3 days ago

eypros commented 3 days ago

I am interested in using your work for a project. The intended use is training on a custom dataset and then using the model to infer masks. I couldn't find any direct information about inference.

My question is: should I follow the original DINOv2 approach for inference, or should I deduce the approach from the evaluation code you provide (inside the training pipeline, that is)?

Can you provide a minimal, functional example for inference?

RobvanGastel commented 2 days ago

Hi!

When you say inference, do you mean using the fine-tuned decoders (+ LoRA) on Pascal VOC and ADE20K with the ViT-L DINOv2 weights? In explanation.ipynb in the root of the project I have some examples of how to use this combination. For example,

import torch
from dino_finetune import DINOV2EncoderLoRA

encoder = torch.hub.load(repo_or_dir="facebookresearch/dinov2", model="dinov2_vitl14_reg").cuda()
dino_lora = DINOV2EncoderLoRA(
    encoder=encoder,
    r=3,                 # These are the same settings used in training
    emb_dim=1024,        # The large ViT embedding dim
    img_dim=(308, 308),  # For ease of use, rescaling to a valid patch dimension
    n_classes=21,        # Number of classes in Pascal VOC
    use_lora=True,
).cuda()

dino_lora.load_parameters("output/base_voc_lora.pt")
dino_lora.eval()

# Dummy input; replace with a preprocessed image tensor
logits = dino_lora(torch.randn(1, 3, 308, 308).cuda().float())
# Sigmoid is monotonic, so argmax over the raw logits yields the same mask
y_hat = torch.argmax(torch.sigmoid(logits), dim=1)
y_hat.shape  # (1, 308, 308)
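A side note on the img_dim argument above: DINOv2 ViT variants embed the image in 14 x 14 patches, so the input height and width must be multiples of 14 (308 = 22 x 14). A minimal sketch of snapping an arbitrary size to a valid patch dimension (the helper name is my own, not part of the repo):

```python
def to_valid_patch_dim(size: int, patch_size: int = 14) -> int:
    """Round down to the nearest multiple of the patch size.

    DINOv2 ViT models use 14x14 patches, so the input
    height/width must be divisible by 14.
    """
    return max(patch_size, (size // patch_size) * patch_size)


print(to_valid_patch_dim(320))  # -> 308 (22 * 14)
print(to_valid_patch_dim(308))  # -> 308, already valid
```

Resizing (or center-cropping) your images to such a dimension before calling the model avoids shape errors inside the encoder.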