Open becdridan opened 2 years ago
Describe the bug
The bounding boxes returned by, for example, ocr_agent.gather_data(res, agg_level=lp.TesseractFeatureType.BLOCK) don't reflect the block size in the initial data. Looking at the code, I think the step that removes elements where text is NaN (https://github.com/Layout-Parser/layout-parser/blob/0809fa89fef08e34a4c73d5c1285e93ba80dc309/src/layoutparser/ocr/tesseract_agent.py#L146) drops every level except WORD, so the resulting block is only as wide as its longest word.
To Reproduce
Steps to reproduce the behavior:
import layoutparser as lp

# image: the page image to OCR, loaded beforehand
ocr_agent = lp.TesseractAgent()
res = ocr_agent.detect(image, return_response=True)
layout = ocr_agent.gather_data(res, agg_level=lp.TesseractFeatureType.BLOCK)
If you then look at any element more than one word wide, you can see the block is not as wide as would be indicated by the original data in
res["data"].loc[res["data"].level == lp.TesseractFeatureType.BLOCK+1]
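To make the mismatch concrete, here is a minimal sketch in plain Python. It is not layout-parser's actual code: the rows are made-up stand-ins for Tesseract's tabular output (level 2 = block, level 5 = word), None plays the role of NaN, and the helper names are hypothetical. The first helper assumes the aggregation drops text-less rows and then takes the minimum left/top and the maximum width/height, which is what the behavior described above suggests.

```python
# Made-up rows mimicking Tesseract's tabular output for one block.
# Only word rows (level 5) carry text; the block row (level 2) has none.
rows = [
    {"level": 2, "block_num": 1, "left": 10, "top": 10, "width": 200, "height": 20, "text": None},
    {"level": 5, "block_num": 1, "left": 10, "top": 10, "width": 60, "height": 20, "text": "Hello"},
    {"level": 5, "block_num": 1, "left": 80, "top": 10, "width": 130, "height": 20, "text": "layout-parser"},
]


def aggregate_like_issue(rows):
    """Assumed buggy aggregation: drop text-less rows, then min/max per column."""
    words = [r for r in rows if r["text"] is not None]
    return {
        "left": min(r["left"] for r in words),
        "top": min(r["top"] for r in words),
        "width": max(r["width"] for r in words),    # widest single word
        "height": max(r["height"] for r in words),  # tallest single word
    }


def union_of_words(rows):
    """Alternative: the box enclosing every word, computed from the corners."""
    words = [r for r in rows if r["text"] is not None]
    left = min(r["left"] for r in words)
    top = min(r["top"] for r in words)
    right = max(r["left"] + r["width"] for r in words)
    bottom = max(r["top"] + r["height"] for r in words)
    return {"left": left, "top": top, "width": right - left, "height": bottom - top}


def block_row_box(rows):
    """Reference: the box Tesseract itself reports on the block-level row."""
    block = next(r for r in rows if r["level"] == 2)
    return {k: block[k] for k in ("left", "top", "width", "height")}


print(aggregate_like_issue(rows))
print(union_of_words(rows))
print(block_row_box(rows))
```

With this toy data, the first helper reports a width equal to the longest word, while the enclosing box of the words matches the width on the block-level row. Either reading the block-level row directly or taking the union of the word boxes would avoid the shrinkage; which one fits the library best is a design choice for the maintainers.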
Thanks for bringing this up. I totally agree, and yes, I planned to work on this in #81!