facebookresearch / CutLER

Code release for "Cut and Learn for Unsupervised Object Detection and Instance Segmentation" and "VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation"

backbone change #49

Closed Alexanderisgod closed 10 months ago

Alexanderisgod commented 10 months ago

When I replace the ViT backbone with a ResNet, MaskCut produces incorrect masks. Does that mean the backbone can only be Transformer-based?

frank-xwang commented 10 months ago

Yes. MaskCut produces segmentation masks by applying Normalized Cuts to a patch-wise affinity matrix, so it requires a ViT-based backbone. Moreover, the emergent segmentation properties it relies on are observed in self-supervised vision transformers, as detailed in the DINO paper. MaskCut therefore needs a self-supervised ViT model for effective unsupervised segmentation.
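
To illustrate what "Normalized Cuts on a patch-wise affinity matrix" means, here is a minimal NumPy sketch. It is not the CutLER implementation. The function name `ncut_bipartition`, the threshold `tau`, the floor value `eps`, and the synthetic features are all assumptions made for illustration. In practice the `(N, C)` feature array would hold one vector per image patch from a self-supervised ViT, which is the per-patch structure the method depends on.

```python
import numpy as np

def ncut_bipartition(feats, tau=0.15, eps=1e-5):
    """Split N patches into two groups with a Normalized Cut.

    feats: (N, C) array, one feature vector per patch.
    tau, eps: illustrative values (assumptions, not taken from this thread).
    """
    # Cosine similarity between every pair of patches.
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    # Binarize the affinity; eps keeps the graph fully connected.
    W = np.where(sim >= tau, 1.0, eps)
    d = W.sum(axis=1)
    # Solve (D - W) x = lambda * D x through the symmetric normalized
    # Laplacian I - D^{-1/2} W D^{-1/2}, which has the same eigenvalues.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Map the second-smallest eigenvector back to the generalized problem.
    x = d_inv_sqrt * vecs[:, 1]
    # Bipartition the patches around the mean of the eigenvector.
    return x > x.mean()

# Synthetic stand-in for patch features: 6 patches near one direction,
# 10 patches near another (hypothetical data, not real ViT output).
rng = np.random.default_rng(0)
centers = np.zeros((2, 8))
centers[0, 0] = 1.0
centers[1, 1] = 1.0
labels = np.array([0] * 6 + [1] * 10)
feats = centers[labels] + 0.01 * rng.standard_normal((16, 8))

mask = ncut_bipartition(feats)
print(mask.astype(int))  # the two groups get different labels
```

Which group ends up as `True` depends on the arbitrary sign of the eigenvector, so a real pipeline needs an extra rule to decide which side is foreground.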