AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

Deployment Options / Instructions #39

Open aljaroudi opened 7 months ago

aljaroudi commented 7 months ago

Very impressive work! It would be great if this could also be published to HuggingFace as a model, allowing easy API deployments. HuggingFace doesn't allow deploying the current one. Failing that, some instructions on how to deploy the ONNX/PyTorch pre-trained model in a minimal setup would help.

aljaroudi commented 6 months ago

The current Replicate model almost solves this, but it seems to output only images, which limits the potential of this model. If the output were just the raw prediction data, it would be useful to everyone. It would also be more flexible, save processing time (API cost), and save the bandwidth spent re-downloading the image.

Simple JSON output of class names (multi-label classification), bounding boxes (object detection), or polygons (segmentation) would make this a lot more useful.
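As a rough illustration of the kind of payload meant here, the sketch below serializes a list of detections to JSON. It is not YOLO-World's API: the `Detection` class, the `detections_to_json` helper, and all field names are hypothetical, and the box format (`[x_min, y_min, x_max, y_max]` in pixels) is an assumption.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class Detection:
    """Hypothetical container for one prediction (not part of YOLO-World)."""
    label: str          # class name taken from the text prompt
    score: float        # confidence in [0, 1]
    box: List[float]    # assumed format: [x_min, y_min, x_max, y_max] in pixels
    polygon: Optional[List[List[float]]] = None  # [[x, y], ...] when a mask is available


def detections_to_json(detections, image_width, image_height, score_threshold=0.0):
    """Serialize detections to a JSON string, dropping low-confidence ones."""
    kept = [d for d in detections if d.score >= score_threshold]
    payload = {
        "image": {"width": image_width, "height": image_height},
        # Unique class names present in the image (multi-label classification).
        "labels": sorted({d.label for d in kept}),
        # Per-object boxes and optional polygons (detection / segmentation).
        "detections": [asdict(d) for d in kept],
    }
    return json.dumps(payload)
```

A client could then call `json.loads` on the response and draw or post-process the boxes itself, with no annotated image transferred.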

wondervictor commented 6 months ago

Hi @aljaroudi, we believe deployment is a crucial part of YOLO-World, and we will provide the instructions and full code soon. I apologise for not getting back to you sooner; I was away on a long vacation. Thanks for your suggestion!

aljaroudi commented 4 months ago

Are there any updates on this? Are there plans to support instant serverless deployment on platforms like Replicate? The current Replicate model is great, but it only returns an annotated image. That's only useful for demos, not for serious software applications.

wondervictor commented 4 months ago

Hi @aljaroudi, we have now provided several options for deployment and inference, and you can modify the inference code as needed. These may be helpful: