Thanks for the awesome Grounding-DINO! I'd like to share our recent work, 🦖OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion.
OV-DINO is a novel unified open-vocabulary detection approach that delivers strong performance and is practical for real-world applications.
OV-DINO consists of two components: a Unified Data Integration pipeline, which integrates diverse data sources for end-to-end pre-training, and a Language-Aware Selective Fusion module, which improves the model's vision-language understanding.
OV-DINO significantly outperforms previous methods on the COCO and LVIS benchmarks. In zero-shot evaluation, it achieves relative improvements of +2.5% AP on COCO and +12.7% AP on LVIS over Grounding-DINO.
We have released the evaluation, fine-tuning, and demo code in our project. Feel free to try our model in your applications.
Project: https://wanghao9610.github.io/OV-DINO
Paper: https://arxiv.org/abs/2407.07844
Code: https://github.com/wanghao9610/OV-DINO
Demo: http://47.115.200.157:7860/
Everyone is welcome to try our model. Feel free to open an issue if you encounter any problems.