IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
https://arxiv.org/abs/2303.05499
Apache License 2.0

GroundingDINO Inference speed #132

Open lsn199603 opened 1 year ago

lsn199603 commented 1 year ago

GroundingDINO's inference results are very good. However, the inference speed is only 5 FPS. Is it possible to improve the inference speed by pre-encoding the text prompt? Looking forward to your reply!
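A minimal sketch of the pre-encoding idea: memoize the text features per prompt string so the text encoder runs once per distinct prompt instead of once per frame. `encode_fn` and `fake_encode` are hypothetical stand-ins for whatever produces the text embedding; they are not GroundingDINO functions. One caveat from the paper's architecture: the text features are fused with image features in the feature enhancer and the cross-modality decoder, so only the prompt-only text-backbone (BERT) output can be cached this way. The cross-modal layers still run for every image.

```python
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


class CachedTextEncoder(Generic[T]):
    """Memoize text features per prompt so each prompt is encoded once."""

    def __init__(self, encode_fn: Callable[[str], T]) -> None:
        self._encode_fn = encode_fn  # hypothetical text-encoder stand-in
        self._cache: Dict[str, T] = {}
        self.calls = 0  # how many times the underlying encoder really ran

    def __call__(self, prompt: str) -> T:
        if prompt not in self._cache:
            self.calls += 1
            self._cache[prompt] = self._encode_fn(prompt)
        return self._cache[prompt]


def fake_encode(prompt: str) -> List[float]:
    # Placeholder "embedding" that depends only on the prompt text.
    return [float(len(prompt))]


encoder = CachedTextEncoder(fake_encode)
for _ in range(100):  # e.g. 100 video frames sharing one prompt
    features = encoder("chair . person . dog .")
print(encoder.calls)  # → 1
```

The cache here is an unbounded dict, which is fine when the prompt is fixed for a whole video; if prompts vary a lot, `functools.lru_cache(maxsize=...)` on the encoding function would bound memory instead.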

SlongLiu commented 1 year ago

It is a good point. I believe we can improve the throughput through engineering optimizations. It would be helpful if you'd like to submit PRs.

kenhuang1964 commented 1 year ago

Hey @lsn199603 , does GroundingDINO work on live video captures?

lsn199603 commented 1 year ago

Hello, I have only tested MP4 video files, not RTSP video streams.

kenhuang1964 commented 1 year ago

Awesome, thanks! Is the implementation for MP4 video files similar to a YOLO video object-detection implementation?

lsn199603 commented 1 year ago

thanks

Yes, the prompt needs to be configured in advance
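The fixed-prompt setup can be sketched as a plain frame loop, much like a YOLO video pipeline: the prompt string is built once and then reused for every decoded frame. The `detect` callable and the `(phrase, score)` output format are hypothetical placeholders. With the real repo, one would decode frames (for example with OpenCV's `cv2.VideoCapture`, which typically accepts both MP4 files and RTSP URLs) and call the `predict` helper from `groundingdino.util.inference`, applying the same preprocessing that its `load_image` helper performs. The `" . "` separator follows the prompt style shown in the repo's README (e.g. `"chair . person . dog ."`).

```python
from typing import Any, Callable, Iterable, Iterator, List, Sequence, Tuple

Detection = Tuple[str, float]                     # (phrase, score): hypothetical
Detector = Callable[[Any, str], List[Detection]]  # (frame, prompt) -> detections


def build_prompt(categories: Sequence[str]) -> str:
    """Join category names in the 'chair . person . dog .' style."""
    return " . ".join(c.strip() for c in categories) + " ."


def detect_video(
    frames: Iterable[Any], detect: Detector, categories: Sequence[str]
) -> Iterator[List[Detection]]:
    """Yield detections per frame, reusing one prompt fixed before the loop."""
    prompt = build_prompt(categories)  # configured once, in advance
    for frame in frames:
        yield detect(frame, prompt)    # same prompt string for every frame
```

Because the prompt never changes inside the loop, this structure is also the natural place to plug in the cached text encoding discussed at the top of the thread.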

kenhuang1964 commented 1 year ago

Thank you!

Nancis1130 commented 1 year ago

Have you made any progress on pre-encoding the text prompt?

farukcankaya commented 2 weeks ago

Hey @lsn199603, if you don’t mind, could you share the specifications you used to achieve 5 FPS? Specifically:

In my test, with a 1200x1800 input image, GroundingDINO detects 5 objects, and the prompt includes 13 categories (e.g., "xxx., yyy., zzz.,...") totaling 133 characters.
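To make FPS figures like these comparable, it may help to time inference the same way on every setup and report input resolution, prompt length, and hardware alongside the number. A minimal timing sketch, assuming `infer` is any callable that runs one forward pass (a hypothetical placeholder): warm-up runs are excluded from the measurement, and an optional `sync` hook lets GPU users pass `torch.cuda.synchronize` so that asynchronously launched CUDA kernels have finished before the clock is read.

```python
import time
from typing import Any, Callable, Sequence


def measure_fps(
    infer: Callable[[Any], Any],
    inputs: Sequence[Any],
    warmup: int = 3,
    sync: Callable[[], None] = lambda: None,
) -> float:
    """Return the average frames per second of `infer` over `inputs`."""
    if not inputs:
        raise ValueError("need at least one input to time")
    # Warm-up passes (CUDA context creation, autotuning, caches) are not timed.
    for x in inputs[:warmup]:
        infer(x)
    sync()  # wait for any queued GPU work before starting the clock
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    sync()  # make sure the last frame has actually finished
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed
```

For reference, a pipeline that takes 200 ms per frame corresponds to 1 / 0.2 = 5 FPS, the figure reported at the top of this thread.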