
Segment Anything with Clip

[HuggingFace Space](https://huggingface.co/spaces/curt-park/segment-anything-with-clip) | [COLAB] | [Demo Video]

Meta released Segment Anything (SAM), a new foundation model for segmentation tasks. It aims to solve downstream segmentation tasks through prompting, such as foreground/background points, a bounding box, a mask, or free-form text. However, the text-prompt capability has not been released yet.
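
The released prompt types (points, boxes, and masks) are exposed through SAM's predictor interface. Below is a minimal sketch of point- and box-prompted prediction with the official segment_anything package; the checkpoint path, image file, and prompt coordinates are illustrative placeholders.

# Prompting SAM with a foreground point and a bounding box.
# The checkpoint path and image file below are placeholders.
import numpy as np
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)
predictor.set_image(np.array(Image.open("example.jpg").convert("RGB")))

masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),  # one point on the object
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
    box=np.array([100, 100, 500, 400]),   # XYXY box around the object
)  # masks: (num_masks, H, W) boolean array, one candidate mask per row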

As a workaround, I took the following steps (an end-to-end sketch follows the list):

  1. Get all object proposals generated by SAM (Segment Anything Model).
  2. Crop the object regions with their bounding boxes.
  3. Get the cropped images' features and a query feature from CLIP.
  4. Calculate the similarity between the image features and the query feature.
    # How to compute the similarity (model, preprocess = clip.load("ViT-B/32")).
    preprocessed_img = preprocess(crop).unsqueeze(0)  # crop: a PIL image of one proposal
    tokens = clip.tokenize(texts)                     # texts: a list of query strings
    logits_per_image, _ = model(preprocessed_img, tokens)
    similarity = logits_per_image.softmax(-1)         # probability over the text queries
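
Putting the four steps together, here is a minimal end-to-end sketch. It assumes the official segment_anything and clip packages are installed; the checkpoint path, image file, and query texts are placeholders to adjust.

import clip
import numpy as np
import torch
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

device = "cuda" if torch.cuda.is_available() else "cpu"

# Step 1: generate all object proposals with SAM.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth").to(device)
image = np.array(Image.open("example.jpg").convert("RGB"))
proposals = SamAutomaticMaskGenerator(sam).generate(image)

# Step 2: crop each proposal by its bounding box (XYWH format).
crops = []
for p in proposals:
    x, y, w, h = (int(v) for v in p["bbox"])
    crops.append(Image.fromarray(image[y:y + h, x:x + w]))

# Steps 3-4: score every crop against the text queries with CLIP.
model, preprocess = clip.load("ViT-B/32", device=device)
texts = ["a dog", "something else"]  # hypothetical query set; softmax is over these
tokens = clip.tokenize(texts).to(device)
with torch.no_grad():
    batch = torch.stack([preprocess(c) for c in crops]).to(device)
    logits_per_image, _ = model(batch, tokens)
    similarity = logits_per_image.softmax(-1)  # shape: (num_crops, num_texts)

best = proposals[int(similarity[:, 0].argmax())]  # crop most likely to match texts[0]

The proposal with the highest probability for the query text then provides the matching mask via its "segmentation" key.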

How to run locally

Anaconda is required before starting the setup.

make env
conda activate segment-anything-with-clip
make setup
# This starts the Gradio server.
make run

Open http://localhost:7860/ in your browser.
