NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

DINOv2 - Segmentation #393

Open VINIT777 opened 4 months ago

VINIT777 commented 4 months ago

If I have image embeddings extracted using DINOv2 base, how can I use them for image segmentation?

I referred to your DINOv2 notebooks, but they were more about fine-tuning a linear head for segmentation.

How can I use DINOv2 out of the box for segmentation, the way we use SAM?

NielsRogge commented 4 months ago

Hi,

You cannot use DINOv2 out-of-the-box for segmentation as in SAM. You need to explicitly train DINOv2 + some segmentation head (like a linear layer, or a more complex one like a Mask2Former decoder) to predict a segmentation map.

The authors did release some DINOv2 + linear layer and DINOv2 + Mask2Former decoder checkpoints as shown here: https://github.com/facebookresearch/dinov2/blob/main/notebooks/semantic_segmentation.ipynb.
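
To illustrate what "DINOv2 + a linear layer" means for already-extracted embeddings, here is a minimal PyTorch sketch, not the code behind the released checkpoints. It assumes the embeddings are per-patch tokens of shape `(batch, num_patches, hidden_size)` with the CLS token already removed (for example `last_hidden_state[:, 1:, :]` from `Dinov2Model`), a hidden size of 768, and a 16x16 patch grid for a 224x224 input at patch size 14. The class name `LinearSegmentationHead`, the number of classes, and the random tensors standing in for real embeddings and labels are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearSegmentationHead(nn.Module):
    """Hypothetical head: per-patch linear classifier, upsampled to pixel resolution."""

    def __init__(self, hidden_size: int, num_classes: int):
        super().__init__()
        # A 1x1 convolution applies the same linear layer to every patch independently.
        self.classifier = nn.Conv2d(hidden_size, num_classes, kernel_size=1)

    def forward(self, patch_embeddings, grid_size, image_size):
        batch, num_patches, hidden = patch_embeddings.shape
        height, width = grid_size
        assert num_patches == height * width, "patch count must match the grid"
        # (batch, patches, hidden) -> (batch, hidden, height, width)
        features = patch_embeddings.reshape(batch, height, width, hidden)
        features = features.permute(0, 3, 1, 2)
        logits = self.classifier(features)
        # Upsample the coarse patch-level logits to one prediction per pixel.
        return F.interpolate(
            logits, size=image_size, mode="bilinear", align_corners=False
        )


# Random stand-in for frozen DINOv2 base patch embeddings (assumed shape).
patch_embeddings = torch.randn(2, 256, 768)
head = LinearSegmentationHead(hidden_size=768, num_classes=3)
logits = head(patch_embeddings, grid_size=(16, 16), image_size=(224, 224))

# Random stand-in for ground-truth masks; one training step's loss and gradients.
labels = torch.randint(0, 3, (2, 224, 224))
loss = F.cross_entropy(logits, labels)
loss.backward()
```

Only the head has trainable parameters here, so the backbone can stay frozen and the embeddings can be precomputed once. The head still has to be trained on labeled masks before `logits.argmax(dim=1)` gives a meaningful segmentation map.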