-
May I ask whether mmdetection's GLIP will implement the "phrase grounding" function from the paper? That function is very attractive: it can automatically label a large amount of image data for object …
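As a sketch of what such an auto-labeling step could look like downstream of a grounding model: the snippet below keeps only confident detections and groups boxes by their grounded phrase. The `(phrase, box, score)` tuple format is an assumption for illustration, not GLIP's or mmdetection's actual output API.

```python
# Hypothetical sketch of auto-labeling from phrase-grounding output.
# The (phrase, box, score) detection format is assumed, not GLIP's API.

def detections_to_labels(detections, score_threshold=0.5):
    """Keep confident detections and group their boxes by grounded phrase."""
    labels = {}
    for phrase, box, score in detections:
        if score < score_threshold:
            continue  # drop low-confidence detections before labeling
        labels.setdefault(phrase, []).append(box)
    return labels

detections = [
    ("dog", (10, 20, 110, 220), 0.92),
    ("dog", (200, 30, 300, 240), 0.48),   # below threshold, dropped
    ("frisbee", (120, 5, 160, 45), 0.81),
]
print(detections_to_labels(detections))
# -> {'dog': [(10, 20, 110, 220)], 'frisbee': [(120, 5, 160, 45)]}
```

The grouped output maps directly onto a per-category annotation file, which is the appeal of phrase grounding for dataset bootstrapping.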
-
## 🚀 Feature
Currently, the project uses `GroundingDINO` as its visual grounding model, which is the best-performing model on several benchmark datasets.
![current benchmarks for zero-shot object dete…
-
For custom video input, does the supported text prompt have to be a single word (i.e. a word naming a certain category), or can it be a sentence?
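For context on how such prompts are usually structured: in the GroundingDINO convention a text prompt can contain multiple phrases (words or short sentences) separated by `" . "`. A minimal sketch of splitting such a caption back into its phrases:

```python
def split_prompt(caption):
    """Split a GroundingDINO-style caption into individual phrases.
    Phrases are separated by '.' in the text-prompt convention."""
    return [p.strip() for p in caption.split(".") if p.strip()]

print(split_prompt("a red car . person riding a bike ."))
# -> ['a red car', 'person riding a bike']
```

So a phrase need not be a single category word; short noun phrases like `"person riding a bike"` fit the same prompt format.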
-
Following the clearly written README, I ran the model successfully.
However, in my case I found some problems.
I used the script `grounded_sam2_local_demo.py` with the prompt `"car . bik…
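For reference, prompts like the one above are typically built by joining lowercase class names with `" . "`. A minimal sketch of that construction (the helper name is hypothetical; the separator convention follows the Grounded-SAM demos):

```python
def classes_to_prompt(classes):
    """Build a GroundingDINO-style text prompt from category names.
    Classes are lowercased and joined with the ' . ' separator."""
    return " . ".join(c.strip().lower() for c in classes) + " ."

print(classes_to_prompt(["Car", "Bike"]))
# -> "car . bike ."
```

Deviating from this format (e.g. comma separators or mixed case) is a common source of missed detections in the demos.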
-
Hello!
The code that detects objects:
```
detections = grounding_dino_model.predict_with_classes(
    image=image,
    classes=enhance_class_name(class_names=CLASSES),
    box_threshold=BOX_TRESHOLD,
    …
```
-
-
Hi,
Thank you so much for developing this impactful and impressive work! It really bridges the gap between multimodal grounding capability and the visual world.
I would like to kindly ask if y…
-
If I use "caption_to_phrase_grounding" with multiple inputs in the text prompt, e.g. `bike, red car`, it highlights the segments in the image and I can get the separate output_mask_select to work. Some…
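One way to pick out the masks for a single phrase from a multi-phrase result is to filter by the per-detection phrase labels. A minimal sketch, assuming the grounding output pairs `phrases[i]` with `masks[i]` (one phrase per mask; the exact output structure varies by demo):

```python
def select_masks_by_phrase(phrases, masks, target):
    """Return only the masks whose detected phrase matches `target`.
    Assumes phrases[i] labels masks[i] (one phrase per mask)."""
    return [m for p, m in zip(phrases, masks) if p == target]

phrases = ["bike", "red car", "bike"]
masks = ["mask0", "mask1", "mask2"]   # stand-ins for real mask arrays
print(select_masks_by_phrase(phrases, masks, "red car"))
# -> ['mask1']
```

The same filter works with boolean mask arrays in place of the string stand-ins used here.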
-
**Describe**
Model I am using: Kosmos-2
Hi! I am working on fine-tuning the Kosmos-2 model for my own application. In short, the target may appear multiple times in the image (e.g., cars in a park…
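For repeated targets, Kosmos-2-style grounding quantizes each box into discrete location tokens, so one phrase can be followed by several token pairs (one pair per instance). The sketch below illustrates the idea; the `<loc_i>` token naming and the 32x32 grid size are assumptions for illustration, not the model's exact vocabulary:

```python
def box_to_loc_tokens(box, num_bins=32):
    """Quantize a normalized box (x0, y0, x1, y1) into a pair of
    Kosmos-2-style location tokens: top-left bin and bottom-right bin.
    num_bins=32 is an assumption for illustration."""
    x0, y0, x1, y1 = box

    def bin_index(x, y):
        # clamp to the last bin so coordinates of exactly 1.0 stay in range
        col = min(int(x * num_bins), num_bins - 1)
        row = min(int(y * num_bins), num_bins - 1)
        return row * num_bins + col

    return f"<loc_{bin_index(x0, y0)}>", f"<loc_{bin_index(x1, y1)}>"

print(box_to_loc_tokens((0.1, 0.2, 0.5, 0.8)))
# -> ('<loc_195>', '<loc_816>')
```

Under this scheme, a fine-tuning target with multiple cars would repeat the token-pair pattern once per car after the grounded phrase.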
-
Thanks for the great work!
When I try the grounding [demo file](https://github.com/IDEA-Research/GroundingDINO/blob/main/demo/inference_on_a_image.py), I find that a sentence (e.g. a man in blue coat) wi…
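Behavior like this often traces back to how the demo recovers phrases from the per-token similarity map: tokens below the threshold are dropped, so a sentence prompt can come back as a fragment. A toy analogue of that step (the real `get_phrases_from_posmap` in GroundingDINO works on tokenizer output, not plain words, so this is a simplified sketch):

```python
def phrases_from_posmap(token_logits, tokens, threshold=0.25):
    """Toy analogue of GroundingDINO's get_phrases_from_posmap:
    keep tokens whose similarity logit exceeds the threshold and
    join the survivors into the predicted phrase."""
    return " ".join(t for t, s in zip(tokens, token_logits) if s > threshold)

tokens = ["a", "man", "in", "blue", "coat"]
logits = [0.10, 0.60, 0.20, 0.55, 0.70]   # illustrative values
print(phrases_from_posmap(logits, tokens))
# -> "man blue coat"
```

This is why a full sentence prompt can yield a partial phrase in the demo output: low-scoring function words ("a", "in") fall under the threshold.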