IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
https://arxiv.org/abs/2401.14159
Apache License 2.0
15.05k stars 1.39k forks

Text Prompts w/ Multiple Object Categories: Concepts incorrectly combined #217

Open egeozguroglu opened 1 year ago

egeozguroglu commented 1 year ago

Hi, I've been using GroundingDINO + SAM for our research and would like to query for multiple object categories in a single prompt, e.g. "jug . onion . chair . toaster . wire . counter . glass . oil . potato . package ." (the format suggested in this repo).

Unfortunately, when multiple object classes are added to the prompt this way, GroundingDINO returns predictions with some categories combined. I was able to reproduce the same error with your Hugging Face Spaces demo. See below.

Detection Prompt: "jug . onion . chair . toaster . wire . counter . glass . oil . potato . package ."

Input image: image

Prediction Output: image

In this case, "glass" and "oil" were combined into "glass oil," which is not the desired behavior.

Would you have any insights on a quick solution? I will ultimately want to detect 300 object classes with one prompt, so resolving this is essential.
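As a stopgap for the merged labels described above, predicted phrases can be checked against the prompt's category list after inference. The following is a minimal sketch, not a fix for the model itself: `build_prompt` and `split_merged_phrase` are hypothetical helper names, not part of GroundingDINO or this repo, and the sketch assumes the predicted phrases are available as plain strings. Note that splitting the label does not tell you which category the box actually belongs to, so a merged detection would still need manual review or a follow-up query.

```python
def build_prompt(categories):
    """Join category names in the ' . '-separated format used in this thread."""
    return " . ".join(c.strip().lower() for c in categories) + " ."


def split_merged_phrase(phrase, categories):
    """Greedily split a predicted phrase into known category names.

    Returns the list of matched categories, or an empty list if the phrase
    cannot be fully covered by names from `categories`.
    """
    vocab = {c.strip().lower() for c in categories}
    max_words = max(len(c.split()) for c in vocab)
    words = phrase.lower().split()
    matched, i = [], 0
    while i < len(words):
        # Prefer the longest category name starting at position i.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in vocab:
                matched.append(candidate)
                i += n
                break
        else:
            return []  # unknown word: leave the phrase for manual review
    return matched


# Category list taken from the prompt in this issue.
categories = ["jug", "onion", "chair", "toaster", "wire",
              "counter", "glass", "oil", "potato", "package"]
print(build_prompt(categories))
print(split_merged_phrase("glass oil", categories))
```

A phrase that splits into more than one category (here, "glass oil") can be flagged as a merged prediction rather than silently accepted as a new class.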

rentainhe commented 1 year ago

We will look into this problem soon.

egeozguroglu commented 1 year ago

Thanks, please let me know! @rentainhe @SlongLiu This is especially problematic when prompting GroundingDINO + SAM with a full vocabulary (305 nouns in our case). Many groups of nouns get tacked onto each other, e.g. "dough tofu potato peeler biscuit" becomes a single group when we prompt the model for separate predictions with 305 nouns.
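One workaround that could be tried for a large vocabulary is to split it into several shorter prompts and run the detector once per prompt. The sketch below only builds the prompts; `chunk_prompts` and the `chunk_size` of 10 are illustrative assumptions, and the `noun0 … noun304` list is a placeholder for the real 305 nouns. Whether shorter prompts actually reduce merging is not confirmed anywhere in this thread, and the approach costs one forward pass per chunk.

```python
def chunk_prompts(vocabulary, chunk_size=10):
    """Yield ' . '-separated prompts that cover `vocabulary` in order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(vocabulary), chunk_size):
        chunk = vocabulary[start:start + chunk_size]
        yield " . ".join(chunk) + " ."


# Placeholder vocabulary standing in for the 305 nouns mentioned above.
vocabulary = [f"noun{i}" for i in range(305)]
prompts = list(chunk_prompts(vocabulary, chunk_size=10))
print(len(prompts))   # number of detector calls needed
print(prompts[-1])    # the final, shorter prompt
```

Detections from the separate prompts would then have to be merged, and overlapping boxes from different chunks reconciled; that step is not shown here.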

mhyeonsoo commented 1 year ago

I have the same issue with multi-class objects.

Please let us know if there are any updates on this in the future :)

mcosano commented 1 year ago

Hello! :) Nice work! Is there any update on this issue? I am currently having the same problem.