Request Word Level Bounding Box Detection

PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

https://paddlepaddle.github.io/PaddleOCR/

Apache License 2.0

44.2k stars, 7.82k forks

Request Word Level Bounding Box Detection #9159

Closed: rattling closed this issue 1 year ago

rattling commented 1 year ago

Text detection currently works at the sentence (line) level. The requested feature is detection of boxes at the word level. This would help label text correctly when using PaddleOCR in an annotation pipeline.

For example, the detector (PPOCRSystemv3) returns the following bounding box:

det boxes: [[1096,1652],[1995,1652],[1995,1695],[1096,1695]]
rec text: Pension monies from previous period(s) have been remitted
rec score: 0.983965

We would like it to consistently return boxes for 'Pension', 'monies', 'from' etc.
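As a rough post-processing stopgap, a line box like the one above can be split into word boxes in proportion to character counts. This is a minimal sketch, not a PaddleOCR feature: `split_line_box` is a hypothetical helper, and it assumes the box is axis-aligned and that every character (spaces included) has the same width, so estimates will drift for proportional fonts.

```python
def split_line_box(box, text):
    """Split an axis-aligned line box into one box per space-separated word.

    box:  four [x, y] corners (top-left, top-right, bottom-right, bottom-left).
    text: the recognized text for that box.
    Assumption: every character, including spaces, has the same width.
    """
    if not text:
        return []
    x_left, y_top = box[0]
    x_right, y_bottom = box[2]
    char_width = (x_right - x_left) / len(text)

    word_boxes = []
    offset = 0  # character offset of the current word within the line text
    for word in text.split(" "):
        if word:  # repeated spaces produce empty strings; skip them
            x0 = x_left + offset * char_width
            x1 = x0 + len(word) * char_width
            word_boxes.append(
                (word, [[x0, y_top], [x1, y_top], [x1, y_bottom], [x0, y_bottom]])
            )
        offset += len(word) + 1  # +1 for the space that followed the word
    return word_boxes


# Usage with the box and text from the issue.
box = [[1096, 1652], [1995, 1652], [1995, 1695], [1096, 1695]]
text = "Pension monies from previous period(s) have been remitted"
for word, wbox in split_line_box(box, text)[:3]:
    print(word, [[round(x), y] for x, y in wbox])
```

Splitting on the recognized spaces means the result is only as good as the recognizer's white-space output.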

LDOUBLEV commented 1 year ago

PaddleOCR currently does not support returning word-level detection boxes.

tianchiguaixia commented 1 year ago

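Although it can't give a box for each character, we can use code to estimate the approximate positions.

```python
def addcharinfo(ocr_result):
    """Estimate a box for every character by splitting each line box evenly.

    :param ocr_result: dict whose ocr_result['ocr_result']['words_result'] is a
        list of lines; each line has 'words' (the recognized text) and 'pos'
        (four corners as {'x': ..., 'y': ...}: top-left, top-right,
        bottom-right, bottom-left)
    :return: the same dict, with a 'charInfo' list added to each line
    """
    for line in ocr_result['ocr_result']['words_result']:
        text = line['words']
        if not text:  # avoid division by zero on empty lines
            line['charInfo'] = []
            continue
        width = line['pos'][1]['x'] - line['pos'][0]['x']
        height = line['pos'][3]['y'] - line['pos'][0]['y']
        avg_width = width // len(text)  # assumes every character is equally wide
        start_box_x = line['pos'][0]['x']
        start_box_y = line['pos'][0]['y']
        char_list = []
        for char in text:
            char_list.append({
                'word': char,
                'box': [
                    {'x': start_box_x, 'y': start_box_y},
                    {'x': start_box_x + avg_width, 'y': start_box_y},
                    {'x': start_box_x + avg_width, 'y': start_box_y + height},
                    {'x': start_box_x, 'y': start_box_y + height},
                ],
            })
            start_box_x += avg_width  # next character starts where this one ends
        line['charInfo'] = char_list
    return ocr_result
```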
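`addcharinfo` reads only the x coordinates of the first two corners and the y coordinates of the first and fourth corners, so it treats each line box as axis-aligned. Detection boxes can be quadrilaterals that are slightly rotated; a minimal sketch that interpolates separately along the top and bottom edges keeps each character slice aligned with the slant. `lerp`, `split_quad` and `char_quads` are hypothetical names, corners use the `[[x, y], ...]` order of the det boxes in the original post, and equal character widths are still assumed.

```python
def lerp(p, q, t):
    """Return the point a fraction t of the way from p to q."""
    return [p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t]


def split_quad(quad, t0, t1):
    """Cut the slice between fractions t0 < t1 (along the reading direction)
    out of a quadrilateral given as [top-left, top-right, bottom-right, bottom-left]."""
    tl, tr, br, bl = quad
    # Interpolate the top and bottom edges separately so the slice follows any slant.
    return [lerp(tl, tr, t0), lerp(tl, tr, t1), lerp(bl, br, t1), lerp(bl, br, t0)]


def char_quads(quad, text):
    """One quadrilateral per character, assuming equal character widths."""
    n = len(text)
    return [(ch, split_quad(quad, i / n, (i + 1) / n)) for i, ch in enumerate(text)]


# Usage: a line box tilted downward to the right (hypothetical coordinates).
quad = [[0, 0], [100, 10], [100, 30], [0, 20]]
for ch, q in char_quads(quad, "abcd"):
    print(ch, q)
```

The same `split_quad` call works for words instead of characters if `t0` and `t1` are taken from a word's start and end character offsets divided by the line length.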

rattling commented 1 year ago

@tianchiguaixia - Thanks for that. I guess the post-processing approach you outline relies on two key assumptions: (1) the font size is consistent throughout the box, and (2) text recognition faithfully records the white space between words. We'll have a look at our use cases to see whether this kind of approach works well.
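A different workaround, not proposed in this thread, sidesteps both assumptions by looking at the pixels instead of the recognized text: in a binarized crop of the line, runs of columns with no ink that are wide enough are treated as gaps between words (a vertical projection profile). The sketch below works on a plain 0/1 grid; cropping and binarizing the image (for example with OpenCV) is left out, and `min_gap` is a hypothetical threshold that would need tuning to the image resolution. It trades the two assumptions above for a new one: that gaps between words are wider than gaps between letters.

```python
def find_word_spans(binary, min_gap=3):
    """Return (start_col, end_col) column ranges (end exclusive) of ink runs
    separated by at least min_gap consecutive empty columns.

    binary: 2D list of rows of 0/1 values for a cropped text line, 1 = ink.
    """
    width = len(binary[0])
    # Vertical projection: amount of ink in each column.
    profile = [sum(row[c] for row in binary) for c in range(width)]

    spans = []
    start = None  # first column of the word being scanned
    gap = 0       # length of the current run of empty columns
    for c, ink in enumerate(profile):
        if ink:
            if start is None:
                start = c
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The last ink column was c - gap, so the exclusive end is c - gap + 1.
                spans.append((start, c - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, width - gap))  # close a word that runs to the edge
    return spans


# Usage: two short "words" in a 2-row toy image.
line = [
    [1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
]
print(find_word_spans(line))
```

Adding the crop's left x coordinate to each span turns the column ranges back into image coordinates.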

@LDOUBLEV - Perhaps this query is a feature request rather than an issue. Is this the right way to raise it, or should I raise it through another channel?

danielvfung commented 1 year ago

Word-level bounding boxes are a really important requirement, especially for NLP NER-style labelling and inference tasks. This links to my discussion topic, https://github.com/PaddlePaddle/PaddleOCR/discussions/9281: the DB++ and SAST models seem to produce better word-level bounding boxes than PP-OCRv3, but they are models on the order of 100 MB, compared with the < 4 MB footprint of the PP-OCRv3 model, which matters for bulk inference tasks.
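For NER-style labelling, once each word has an estimated box (by whatever means), an entity found in the line text can be mapped back to the image by taking the union of the boxes of all words that overlap the entity's character span. A minimal sketch; `entity_box` and the word boxes in the usage are hypothetical, and boxes are simplified to axis-aligned `(x0, y0, x1, y1)` tuples.

```python
def entity_box(word_boxes, start, end):
    """Union box of all words that overlap the character span [start, end).

    word_boxes: list of (char_start, char_end, (x0, y0, x1, y1)) per word, with
                character offsets into the line text and axis-aligned boxes.
    Returns (x0, y0, x1, y1), or None if no word overlaps the span.
    """
    hits = [box for s, e, box in word_boxes if s < end and start < e]
    if not hits:
        return None
    return (min(b[0] for b in hits), min(b[1] for b in hits),
            max(b[2] for b in hits), max(b[3] for b in hits))


# Usage: hypothetical word boxes for "Pension monies from".
words = [
    (0, 7, (1096, 1652, 1206, 1695)),    # "Pension"
    (8, 14, (1222, 1652, 1317, 1695)),   # "monies"
    (15, 19, (1333, 1652, 1396, 1695)),  # "from"
]
print(entity_box(words, 0, 14))  # entity "Pension monies"
```

Using an overlap test rather than exact boundary matching keeps partially tokenized entities (for example a span ending mid-word) mapped to the enclosing words.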

tianchiguaixia commented 1 year ago

I agree with you completely. NER in NLP is only possible with word positions. Here is my NER extraction:

[Image: NER extraction result]

LDOUBLEV commented 1 year ago

@danielvfung you can try the detection model for English letters and numbers: https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar

fungdanielv commented 1 year ago

I tested en_PP-OCRv3_det_infer.tar; it behaves the same as the default PP-OCRv3 model used when no det_model_dir is specified.

rattling commented 1 year ago

Does word-level text detection seem viable in PP-OCRv3 or a similarly sized model? If so, how should I raise a feature request?

NaumanHSA commented 1 year ago

@rattling Please check #9485 to see if it helps you. I was stuck on the same issue, and a workaround almost solved the problem for me.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.