Text prompt - object detection and segmentation of all things

cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.

https://cvat.ai

MIT License

12.2k stars, 2.95k forks

Text prompt - object detection and segmentation of all things #7129

Open KTXKIKI opened 9 months ago

KTXKIKI commented 9 months ago

Actions before raising this issue

[X] I searched the existing issues and did not find anything similar.
[X] I read/searched the docs

Is your feature request related to a problem? Please describe.

Attachment: pytorch.zip

Perhaps the CVAT front end could accept a text prompt or natural-language description of the desired objects and use it for open-vocabulary object detection and segmentation. See https://github.com/IDEA-Research/GroundingDINO/tree/main and https://github.com/autodistill/autodistill-grounded-sam

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Sayanjones commented 6 months ago

Hi @KTXKIKI, I am interested in working on this project. Can we discuss this further?

adkbbx commented 6 months ago

Hey @KTXKIKI, do let me know how I can get started on this enhancement task.

KTXKIKI commented 6 months ago

Hi, I am interested in working on this project. Can we discuss this further?

Hello, I apologize for the late reply; I am in China, so there is a time difference.

Okay, I think we should start with the serverless functions and modify some code in both the front end and the back end so that users can type a text prompt in real time, specify the categories to look for, and have everything matching the prompt annotated automatically.

KTXKIKI commented 6 months ago

Hey, do let me know how I can get started on this enhancement task.

Hello, I apologize for the late reply; I am in China, so there is a time difference.

I think we should start with the serverless functions and modify some code in both the front end and the back end so that users can type a text prompt in real time, specify the categories to look for, and have everything matching the prompt annotated automatically.

The idea is to package the inference code, its runtime environment, and the model into a Docker image, run it as a container, and have that container communicate with the CVAT server to perform automatic annotation. In fact, I have already written some of the serverless functions mentioned above, but the work has been on hold recently and I have not developed them further.
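As a rough illustration of the serverless side of this idea, here is a minimal Python sketch of the request-handling logic such a container might run. It is only a sketch under assumptions: the `text_prompt` request field, the `Detection` type, and the `detect` callable are hypothetical names introduced here, not part of CVAT today. The `image` and `threshold` fields and the returned shape dictionaries are modelled on what existing CVAT detector functions appear to use, and the ` . ` separator follows the prompt convention suggested in the Grounding DINO README. The model itself is left as a pluggable callable so the sketch runs without any weights.

```python
import base64
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    """Hypothetical container for one model prediction."""
    phrase: str        # text phrase the model matched
    score: float       # confidence in [0, 1]
    box: List[float]   # [x_min, y_min, x_max, y_max] in pixels


def build_prompt(labels: List[str]) -> str:
    """Join category names into one caption, separated by ' . '."""
    cleaned = [label.strip().lower() for label in labels if label.strip()]
    return " . ".join(cleaned) + " ." if cleaned else ""


def to_cvat_shapes(detections: List[Detection], threshold: float) -> List[dict]:
    """Keep detections at or above the threshold and convert them to
    rectangle dictionaries (format assumed from existing detectors)."""
    return [
        {
            "confidence": str(d.score),
            "label": d.phrase,
            "points": d.box,
            "type": "rectangle",
        }
        for d in detections
        if d.score >= threshold
    ]


def handle_request(
    body: dict,
    detect: Callable[[bytes, str], List[Detection]],
) -> str:
    """Decode the image, build the prompt, run the model, return JSON."""
    image_bytes = base64.b64decode(body["image"])
    # "text_prompt" is a hypothetical field the front end would need to send.
    prompt = build_prompt(body.get("text_prompt", []))
    threshold = float(body.get("threshold", 0.5))
    detections = detect(image_bytes, prompt)
    return json.dumps(to_cvat_shapes(detections, threshold))
```

In a real function, `handle_request` would presumably be called from the serverless entry point with the request body, and `detect` would wrap an open-vocabulary model such as Grounding DINO. The front-end and back-end changes needed to pass the prompt through are not covered here.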

KTXKIKI commented 6 months ago

Some reference points:
https://github.com/AILab-CVC/YOLO-World
https://docs.autodistill.com/
https://github.com/IDEA-Research/GroundingDINO/tree/main
https://github.com/autodistill/autodistill-grounded-sam
https://github.com/mbzuai-oryx/groundingLMM
https://github.com/hardikdava/cvat_plugins/tree/main

adkbbx commented 6 months ago

@KTXKIKI Thank you for your prompt response and for sharing the resources. Since I am a new contributor to CVAT, I am currently trying to set up the development environment locally on my Windows 11 PC using this documentation link. I will look into your resources as soon as the environment is set up on my machine. Do let me know if you have any suggestions or extra resources I can use for setting up the development environment locally so that I can start contributing to this issue.

Sayanjones commented 6 months ago

@KTXKIKI Thank you for responding. I'm a new contributor to CVAT as well. Thank you for sharing the resources. I'll go through them once I have my Windows system up and running smoothly. Let's work together on this contribution and share any other advice or vent about the project. I'm excited to collaborate on CVAT with you!

ak4721269 commented 6 months ago

Hey @KTXKIKI, I am a new contributor to CVAT. Initially, I used CVAT.ai to manually label plastics for this project. I would like to contribute to this project. Currently, I am setting up the environment on my Windows system. After that, I will refer to the links mentioned above in order to get started with the project.

arch-adi21 commented 6 months ago

Hello @KTXKIKI, I am interested in being part of this journey. I have domain expertise in machine learning and am now moving into deep learning, where I find data augmentation a very interesting area. Please treat me as a beginner and suggest some initial tasks or learning resources that I should go through to get started.

kmh03214 commented 3 months ago

I strongly agree that this project is needed. I am currently developing a CVAT serverless model, and I have come to think that CVAT's automatic annotation needs an interface that can receive text prompts in order to address the open-vocabulary problem.

I hope it gets completed quickly and successfully! Thank you. 😃