egeozguroglu opened this issue 1 year ago
We will take a look into this problem soon.
Thanks, please let me know! @rentainhe @SlongLiu This is especially problematic when prompting GroundingDINO + SAM with a large vocabulary (305 nouns in our case). Many groups of nouns get merged together. For example, "dough tofu potato peeler biscuit" comes back as a single label, even though we prompt the model for separate predictions for each of the 305 nouns.
I have the same issue with multi-class prompts.
Please let us know if there are any updates on this :)
Hello! :) Nice work! Is there any update on this issue? I am currently having the same problem.
Hi, I've been using GroundingDINO + SAM for our research and would like to query for multiple object categories in my use case, e.g. "jug . onion . chair . toaster . wire . counter . glass . oil . potato . package ." (as suggested on this repo).
Unfortunately, when multiple object classes are added to the prompt as suggested, GroundingDINO combines some categories in its predictions. I was able to reproduce the same error with your Hugging Face Spaces demo. See below.
Detection Prompt: "jug . onion . chair . toaster . wire . counter . glass . oil . potato . package ."
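For reference, a prompt in this " . "-separated format can be built from a plain list of class names. The sketch below is my own helper (`build_prompt` is a hypothetical name, not part of GroundingDINO). It lowercases and de-duplicates names and also records the character span of each category in the caption. Those spans are useful later for mapping a detection back to exactly one class.

```python
def build_prompt(categories):
    """Join category names into a " . "-separated caption.

    Hypothetical helper, not a GroundingDINO API. Names are stripped,
    lowercased and de-duplicated; the caption ends with " ." like the
    prompt above. Also returns {name: (char_start, char_end)} so that
    detections can later be mapped back to a single category.
    """
    spans = {}
    parts = []
    pos = 0  # current character offset in the caption being built
    for name in categories:
        name = name.strip().lower()
        if not name or name in spans:
            continue  # skip blank entries and duplicates
        if parts:
            parts.append(" . ")
            pos += 3
        spans[name] = (pos, pos + len(name))
        parts.append(name)
        pos += len(name)
    caption = "".join(parts) + " ."
    return caption, spans


caption, spans = build_prompt(
    ["jug", "onion", "chair", "toaster", "wire",
     "counter", "glass", "oil", "potato", "package"]
)
print(caption)
```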
Input image and prediction output: (screenshots not included)
In this case, "glass" and "oil" were combined into "glass oil", which is not the desired behavior.
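My understanding (unconfirmed) is that the phrase for each box is assembled from every prompt token whose score passes the text threshold. When tokens from two neighbouring categories both pass, their words get concatenated, producing "glass oil". One post-processing workaround is to assign each box to the single category whose tokens score highest instead of thresholding. The sketch below assumes you can obtain per-token scores for a box together with each token's character offsets in the caption. That `(start, end, score)` input format and the name `assign_category` are hypothetical stand-ins, not the model's actual output or API.

```python
def assign_category(token_scores, spans):
    """Pick exactly one category for a box instead of joining phrases.

    token_scores: list of (char_start, char_end, score), one per prompt
        token. Hypothetical format: per-token logits plus tokenizer
        character offsets, not an actual GroundingDINO return value.
    spans: {category: (char_start, char_end)} in the same caption.
    Returns (category, score) for the category whose best-scoring
    overlapping token is highest; (None, -inf) if nothing overlaps.
    """
    best_name, best_score = None, float("-inf")
    for name, (start, end) in spans.items():
        # Collect scores of tokens that overlap this category's span.
        scores = [s for (ts, te, s) in token_scores if ts < end and te > start]
        if not scores:
            continue
        score = max(scores)
        if score > best_score:  # strict ">" keeps the first category on ties
            best_name, best_score = name, score
    return best_name, best_score


# Caption "glass . oil .": both words pass a 0.25 threshold,
# but only the higher-scoring category is kept.
spans = {"glass": (0, 5), "oil": (8, 11)}
token_scores = [(0, 5, 0.42), (6, 7, 0.05), (8, 11, 0.37)]
print(assign_category(token_scores, spans))
```

Using the maximum token score per category is one simple choice; averaging over a multi-token category name would be an equally reasonable variant.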
Would you have any insights on a quick solution? I ultimately want to detect about 300 object classes with one prompt, so resolving this is essential.
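Until there is a fix, one possible stopgap for a ~300-class vocabulary is to split it into several shorter prompts and run the model once per prompt. As far as I know, the text encoder has a fixed maximum text length (I believe `max_text_len` is 256 tokens in the released configs; please verify), so a single 300-class caption may not fit anyway. The sketch below is a hypothetical helper, not a GroundingDINO function. It uses a character budget as a rough proxy for the token limit, and `max_chars=200` is an arbitrary assumed default.

```python
def chunk_vocabulary(categories, max_chars=200):
    """Split a large vocabulary into several " . "-separated prompts.

    Hypothetical helper. The character budget is only a rough proxy for
    the text encoder's token limit; max_chars=200 is an assumed default,
    not a value taken from GroundingDINO. A single name longer than the
    budget still gets its own prompt.
    """
    prompts, current = [], []
    for name in categories:
        candidate = " . ".join(current + [name]) + " ."
        if current and len(candidate) > max_chars:
            # Adding this name would exceed the budget: close the prompt.
            prompts.append(" . ".join(current) + " .")
            current = [name]
        else:
            current.append(name)
    if current:
        prompts.append(" . ".join(current) + " .")
    return prompts


vocab = ["jug", "onion", "chair", "toaster", "wire",
         "counter", "glass", "oil", "potato", "package"]
for prompt in chunk_vocabulary(vocab, max_chars=40):
    print(prompt)
```

This costs one forward pass per chunk, and detections from different chunks would still need to be merged (for example with class-agnostic NMS). Shorter captions may also reduce how often neighbouring names merge, though I have not confirmed that.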