Try OV-DINO, a more powerful open-vocabulary detector.

Thanks for the awesome Grounding-DINO! I would like to share our recent work, 🦖 OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion.

OV-DINO is a novel unified open-vocabulary detection approach aimed at strong performance in practical real-world applications.
It comprises a Unified Data Integration pipeline, which integrates diverse data sources for end-to-end pre-training, and a Language-Aware Selective Fusion module, which improves the model's vision-language understanding.
OV-DINO shows significant improvements over previous methods on the COCO and LVIS benchmarks: in zero-shot evaluation, it achieves relative improvements of +2.5% AP on COCO and +12.7% AP on LVIS compared to Grounding-DINO.

We have released the evaluation, fine-tuning, and demo code in our project. Feel free to try our model in your own applications.

Everyone is welcome to try it, and if you encounter any problems, feel free to raise an issue.
