-
Hello. Could you please advise me on how to properly train a model for 3D visual grounding (VG) on ScanRefer: model, losses, dataset, metrics?
Your current model can predict bounding boxes only as text and only with …
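For reference, ScanRefer-style grounding is usually evaluated with Acc@0.25 and Acc@0.5: the fraction of descriptions whose predicted 3D box has IoU above the threshold with the ground-truth box. A minimal sketch of that metric, assuming axis-aligned boxes in (center, size) format (function names are illustrative, not from any particular codebase):

```python
import numpy as np

def box3d_iou_aabb(box_a: np.ndarray, box_b: np.ndarray) -> float:
    """IoU of two axis-aligned 3D boxes given as (cx, cy, cz, dx, dy, dz)."""
    a_min, a_max = box_a[:3] - box_a[3:] / 2, box_a[:3] + box_a[3:] / 2
    b_min, b_max = box_b[:3] - box_b[3:] / 2, box_b[:3] + box_b[3:] / 2
    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    inter = np.maximum(0.0, np.minimum(a_max, b_max) - np.maximum(a_min, b_min))
    inter_vol = inter.prod()
    union_vol = box_a[3:].prod() + box_b[3:].prod() - inter_vol
    return float(inter_vol / (union_vol + 1e-8))

def scanrefer_accuracy(pred_boxes, gt_boxes, thresholds=(0.25, 0.5)):
    """Acc@kIoU over paired (prediction, ground truth) boxes."""
    ious = np.array([box3d_iou_aabb(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return {f"Acc@{k}": float((ious >= k).mean()) for k in thresholds}
```

On losses, baselines in this space typically pair the detector's box-regression and objectness losses with a cross-entropy term over proposals for selecting the referred object.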
-
Thank you for your outstanding work, but I have run into several problems while reproducing the pre-training results.
I use the following command to pre-train groundingdino_swint:
bash …
-
I'm using 6 GPUs on a single machine. This is my command:
```shell
python -m lamorel_launcher.launch --config-path Absolute/Path/To/Grounding_LLMs_with_online_RL/experiments/configs --config-name lo…
```
-
### Self Checks
- [X] I have searched for existing issues [search for existing issues](https://github.com/langgenius/dify/issues), including closed ones.
- [X] I confirm that I am using English to su…
-
Thanks for the awesome Grounding-DINO! I'd like to share our recent work, 🦖OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion.
* OV-DINO is a novel unified open vocabulary detecti…
-
### Model description
Combines the best practices of CLIP and object detectors.
Allows localization and grounding of text and image content.
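The shared mechanism in this family of models is scoring detector region features against text embeddings in a joint space. A minimal PyTorch sketch of that idea (names are illustrative, not this model's actual API):

```python
import torch
import torch.nn.functional as F

def region_text_logits(region_feats: torch.Tensor,
                       text_feats: torch.Tensor,
                       temperature: float = 0.07) -> torch.Tensor:
    """Cosine-similarity logits between region and phrase embeddings.

    region_feats: (num_regions, d) features from a detector's box head.
    text_feats:   (num_phrases, d) embeddings from a CLIP-style text encoder.
    Returns a (num_regions, num_phrases) matrix; argmax over the region axis
    grounds each phrase to one predicted box.
    """
    region_feats = F.normalize(region_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    return region_feats @ text_feats.T / temperature
```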
### Open source status
- [X] The model implementation i…
-
Aiming to link natural language descriptions to specific regions of a 3D scene represented as 3D point clouds, 3D visual grounding is a fundamental task for human-robot interaction. The recogniti…
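Concretely, each training sample pairs a free-form description with one annotated object in a reconstructed scan. An illustrative record in the spirit of ScanRefer's annotation format (field names are an assumption, shown for clarity):

```python
# Illustrative ScanRefer-style annotation record (field names are an assumption).
sample = {
    "scene_id": "scene0025_00",  # ScanNet scan providing the point cloud
    "object_id": "14",           # ID of the referred object in that scan
    "object_name": "chair",
    "description": "the chair closest to the window, facing the desk",
}
# The model consumes the scene's point cloud plus `description`
# and must output the 3D bounding box of object `object_id`.
```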
-
Hello!
Thanks for the great re-implementation of GroundingDino. I am trying to understand your code.
In the [usage.md](https://github.com/open-mmlab/mmdetection/blob/main/configs/mm_grounding_din…
-
Apologies for the questions about another of your significant works. I really appreciate your work AffordanceLLM: Grounding Affordance from Vision Language Models, and 3DOI, for their groundbreaking contrib…
-
Hello. Your work claims to be "zero-shot," yet it requires training, which is completely inconsistent with ReCLIP's setting. How do you explain this? Did no reviewer raise this during review? The paper also does not clearly explain the training procedure or data, instead deliberately deferring them to the supplementary material.