NVlabs / GroupViT

Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
https://arxiv.org/abs/2202.11094

Non parametric grouping #33

Open roysubhankar opened 2 years ago

roysubhankar commented 2 years ago

Hi,

Could you please provide more details on how you perform the non-parametric grouping on CLIP's features (obtained from the ViT encoder)?

xvjiarui commented 2 years ago

Hi @roysubhankar

The steps are as follows:

  1. Extract the feature map from the CLIP image encoder and reshape it to [C, H/16, W/16].
  2. Run K-Means (or another clustering algorithm) on the feature map with k=8.
  3. Assign each pixel to its nearest cluster center according to the distance.
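
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the paper: the random array stands in for real CLIP ViT features, the helper name `kmeans_group` is hypothetical, and the choice of squared Euclidean distance, random initialization, and nearest-neighbor upsampling by 16 are all assumptions that the thread does not specify.

```python
import numpy as np

def kmeans_group(feat, k=8, iters=20, seed=0):
    """Cluster a [C, h, w] feature map into k groups with plain K-Means.

    Returns an [h, w] integer label map and the [k, C] cluster centers.
    Squared Euclidean distance is an assumption; the thread only says "distance".
    """
    C, h, w = feat.shape
    x = feat.reshape(C, h * w).T.astype(np.float64)  # one row per location: [h*w, C]
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct locations (fancy indexing returns a copy).
    centers = x[rng.choice(h * w, size=k, replace=False)]
    for _ in range(iters):
        # Distance from every location to every center: [h*w, k]
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    # Step 3: final assignment of each location to its nearest center.
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = dist.argmin(axis=1)
    return labels.reshape(h, w), centers

# Step 1 stand-in: random values in place of CLIP features for a 224x224 image.
feat = np.random.default_rng(0).standard_normal((512, 14, 14))  # [C, H/16, W/16]

# Step 2 and 3: cluster with k=8 and label each feature-map location.
labels, centers = kmeans_group(feat, k=8)

# Assumed way to reach pixel resolution: repeat each label over its 16x16 patch.
pixel_labels = np.repeat(np.repeat(labels, 16, axis=0), 16, axis=1)  # [H, W]
```

With real features, `feat` would be the patch tokens of the CLIP image encoder reshaped to [C, H/16, W/16]; swapping in cosine distance or a library K-Means only changes the body of `kmeans_group`.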