Add OCR Support for Text/Business Card/Number Plate Annotation

cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.

https://cvat.ai

MIT License

Add OCR Support for Text/Business Card/Number Plate Annotation #7628

Open 562888698 opened 6 months ago

562888698 commented 6 months ago

Hello Team,

I am an active user of CVAT and have been enjoying the platform's capabilities in annotating images and videos for various computer vision tasks. I recently came across a use case where it would be incredibly beneficial if CVAT could support Optical Character Recognition (OCR) annotations, specifically for scenarios such as recognizing and marking text within images, particularly for number plates on vehicles.

If there is already a way to achieve this through existing plugins or configurations, please let me know. Otherwise, I'd appreciate it if the team could consider adding this functionality in future releases.

zhiltsov-max commented 6 months ago

Hi, thank you for coming to us! Could you please explain what you mean by OCR annotations specifically? Can such annotations be achieved with masks, polygons, or bounding boxes? Would labels (per character?) or label attributes (such as arbitrary text) be helpful in this scenario?

bsekachev commented 6 months ago

I suppose we could start with a good model that returns rectangles with a text attribute.

Manittecool1213 commented 6 months ago

Per my understanding, I think what @562888698 is suggesting is performing OCR within an annotation bounding box / polygon. After a broad region containing text is selected through pre-existing annotation tools, OCR can be performed within said region to either extract the textual content as plaintext, or to automatically generate annotations around textual characters.

gw00295652 commented 6 months ago

Hello, I have the same need. I have implemented a full OCR system and can output its results from within a Docker container, as in the following screenshot: Screenshot from 2024-03-27 10-30-57. In my annotation task, I defined an attribute "content" of type "text". How can I write my recognition result "415" into that field? This is my returned result:
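For reference, a minimal sketch of how a recognized string could be attached to a detection as a label attribute. It assumes the response shape used by CVAT's serverless detector examples (a list of items with `confidence`, `label`, `type`, `points`, and an optional `attributes` list of name/value pairs). The helper name `to_cvat_results`, the label `text`, and the input dictionary keys are hypothetical and chosen only for illustration; the attribute name `content` comes from the comment above. The attribute would probably also have to be declared for that label in the function's spec.

```python
def to_cvat_results(detections, label="text", attribute_name="content"):
    """Convert OCR detections into detector-style result items.

    `detections` is a list of dicts with hypothetical keys:
    "box" (x1, y1, x2, y2), "text" (recognized string), "score" (0..1).
    """
    results = []
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        results.append({
            # The serverless examples return confidence as a string.
            "confidence": str(det["score"]),
            "label": label,
            "type": "rectangle",
            "points": [x1, y1, x2, y2],
            # Assumption: the recognized text travels as a name/value pair.
            "attributes": [{"name": attribute_name, "value": det["text"]}],
        })
    return results


if __name__ == "__main__":
    sample = [{"box": (10, 20, 110, 60), "text": "415", "score": 0.98}]
    print(to_cvat_results(sample))
```

If the task label is not named `text`, the label mapping chosen when running the model in CVAT would have to match it to the task's label and its `content` attribute.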

KTXKIKI commented 6 months ago

I think we could start with a good model that returns rectangles and a text attribute.

Hi @bsekachev, maybe something like #7130? Please take a look at the .zip in my update there. I think that model can handle OCR detection and transcription. It supports returning rectangle or polygon detection boxes, as well as text transcription in multiple languages.

KTXKIKI commented 3 months ago

Hi, any progress? Has the method I provided been helpful?

KTXKIKI commented 3 months ago

PaddleOCR is a general-purpose OCR model that supports both Chinese and English as well as various special symbols. I found some Docker files for building images for it: https://github.com/PaddlePaddle/PaddleOCR/tree/main/deploy/docker I think this should make developing the main.py file easier.
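As a rough illustration of what such a main.py would have to do with the model output, the sketch below flattens OCR results into polygon points plus text. It assumes the nested result shape returned by the `ocr()` method in PaddleOCR 2.x (one list per image, each entry being a four-point quadrilateral followed by a `(text, score)` pair); other versions may return a different structure. The function name `parse_paddle_result` and the output keys are hypothetical, and PaddleOCR itself is not imported here.

```python
def parse_paddle_result(result, min_score=0.5):
    """Flatten a PaddleOCR-2.x-style result into simple detection dicts.

    Assumed input shape: [[ [quad, (text, score)], ... ], ...] where quad is
    four [x, y] points. An image without text may appear as None.
    """
    detections = []
    for page in result or []:
        for quad, (text, score) in page or []:
            if score < min_score:
                continue  # drop low-confidence transcriptions
            # [[x1, y1], [x2, y2], ...] -> [x1, y1, x2, y2, ...]
            flat = [float(c) for point in quad for c in point]
            detections.append(
                {"polygon": flat, "text": text, "score": float(score)}
            )
    return detections


if __name__ == "__main__":
    sample = [[
        [[[10, 20], [110, 20], [110, 60], [10, 60]], ("415", 0.98)],
        [[[0, 0], [1, 0], [1, 1], [0, 1]], ("x", 0.2)],
    ]]
    print(parse_paddle_result(sample))
```

Keeping the four corner points allows either a polygon shape or, by taking the minimum and maximum coordinates, a bounding rectangle.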

KTXKIKI commented 3 months ago

I think we can learn from this example: https://github.com/cvat-ai/cvat/tree/develop/serverless/openvino/omz/intel/face-detection-0205/nuclio
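A minimal sketch of a Nuclio-style main.py in the spirit of that example, assuming the usual `init_context` / `handler` entry points, a base64-encoded `image` field and an optional `threshold` field in the request body, and a JSON list as the response. The `placeholder_ocr` function is a stand-in, not a real model; a real deployment would load an OCR model in `init_context` instead.

```python
import base64
import json


def placeholder_ocr(image_bytes, threshold):
    """Hypothetical stand-in for a real OCR model; returns no detections."""
    return []


def init_context(context):
    # Load the model once per worker and keep it on the shared user_data.
    context.logger.info("Init context...  0%")
    context.user_data.model = placeholder_ocr
    context.logger.info("Init context...100%")


def handler(context, event):
    data = event.body
    # Assumption: the image arrives base64-encoded under the "image" key.
    image_bytes = base64.b64decode(data["image"])
    threshold = float(data.get("threshold", 0.5))

    # The model is expected to return result items that are already
    # JSON-serializable (label, type, points, attributes, ...).
    results = context.user_data.model(image_bytes, threshold)

    return context.Response(
        body=json.dumps(results),
        headers={},
        content_type="application/json",
        status_code=200,
    )
```

Loading the model in `init_context` rather than in `handler` avoids reloading it on every request.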