PaddleOCR currently does not support returning word-level detection boxes.
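Although PaddleOCR cannot give a box for each character, we can use code to estimate the approximate positions.

```python
def addcharinfo(ocr_result):
    """Estimate a box for each character by splitting every line box evenly.

    :param ocr_result: dict with line results under ['ocr_result']['words_result']
    :return: the same dict, with a "charInfo" list added to each line
    """
    lines = ocr_result['ocr_result']['words_result']
    for line in lines:
        text = line['words']
        if not text:
            # Nothing recognized: avoid dividing by zero below.
            line["charInfo"] = []
            continue
        # As the indexing implies, 'pos' holds the corners in the order
        # top-left, top-right, bottom-right, bottom-left.
        width = line['pos'][1]['x'] - line['pos'][0]['x']
        height = line['pos'][3]['y'] - line['pos'][0]['y']
        avg_width = width // len(text)
        start_box_x = line['pos'][0]['x']
        start_box_y = line['pos'][0]['y']
        char_list = []
        for char in text:
            char_list.append({
                'word': char,
                'box': [
                    {"x": start_box_x, "y": start_box_y},
                    {"x": start_box_x + avg_width, "y": start_box_y},
                    {"x": start_box_x + avg_width, "y": start_box_y + height},
                    {"x": start_box_x, "y": start_box_y + height},
                ],
            })
            # Move one average character width to the right.
            start_box_x += avg_width
        line["charInfo"] = char_list
    return ocr_result
```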
@tianchiguaixia - Thanks for that. I guess the post-processing approach you outline relies on two key assumptions: (1) the font size is consistent throughout the box, and (2) the text recognition faithfully records the white space between words. We'll have a look at our use cases to see if this kind of approach works well.
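Under those two assumptions, a word-level variant of the same idea could look like the sketch below. It is only an illustration: `split_line_into_words` is a hypothetical helper, not part of PaddleOCR, and it assumes an axis-aligned line box given as four `[x, y]` corners (top-left, top-right, bottom-right, bottom-left) with every character, spaces included, taking the same width. The sample box and text are the ones from the example in this issue.

```python
def split_line_into_words(box, text):
    """Approximate word boxes by dividing a line box in proportion to character counts.

    Hypothetical helper (not a PaddleOCR API).
    box:  four [x, y] corners: top-left, top-right, bottom-right, bottom-left.
    text: recognized text for the whole line.
    """
    if not text:
        return []
    left, top = box[0]
    right, bottom = box[2]
    # Assumption: every character (including spaces) has the same width.
    char_width = (right - left) / len(text)
    words = []
    start = None  # index of the first character of the current word
    for i, ch in enumerate(text + " "):  # the trailing space flushes the last word
        if ch.isspace():
            if start is not None:
                x0 = round(left + start * char_width)
                x1 = round(left + i * char_width)
                word_box = [[x0, top], [x1, top], [x1, bottom], [x0, bottom]]
                words.append((text[start:i], word_box))
                start = None
        elif start is None:
            start = i
    return words


line_box = [[1096, 1652], [1995, 1652], [1995, 1695], [1096, 1695]]
line_text = "Pension monies from previous period(s) have been remitted"
for word, word_box in split_line_into_words(line_box, line_text):
    print(word, word_box)
```

The estimate degrades with proportional fonts, skewed or rotated boxes, and missing spaces in the recognized text, so it is a stopgap rather than a substitute for word-level detection.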
@LDOUBLEV - Perhaps this query is a feature request rather than an issue. Is this the right way to raise it or should I raise it through another method?
Word-level bounding boxes are a really important requirement, especially for NLP NER-type labelling and inference tasks. This links to my discussion topic, [https://github.com/PaddlePaddle/PaddleOCR/discussions/9281]: the DB++ and SAST models seem to provide better word-level bounding boxes than PP-OCRv3, but they are roughly 100 MB models, compared with the < 4 MB footprint of the PP-OCRv3 model, which matters for bulk inference tasks.
I agree with you very much. NER in NLP only works when the word positions are available. This is my NER extraction.
@danielvfung You can try the detection model for English letters and numbers: https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar
I tested en_PP-OCRv3_det_infer.tar; it behaves the same as the default PP-OCRv3 with no det_model_dir specified.
Does word-level text detection seem viable in PP-OCRv3 or a model of similar size? If so, how should I raise a feature request?
@rattling Please check whether #9485 helps you. I was stuck on the same issue, and a workaround almost solved the problem for me.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Text detection currently works at the sentence level. The request is to detect boxes at the word level. This helps to label text correctly and supports using PaddleOCR in an annotation pipeline.
E.g. the detector (PPOCRSystemv3) returns the following bounding box: `det boxes: [[1096,1652],[1995,1652],[1995,1695],[1096,1695]]rec text: Pension monies from previous period(s) have been remitted rec score:0.983965`
We would like it to consistently return boxes for 'Pension', 'monies', 'from' etc.