Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis
https://layout-parser.github.io/
Apache License 2.0

TesseractAgent.gather_data() calculates bounding boxes incorrectly for levels other than WORD #117

Open becdridan opened 2 years ago

becdridan commented 2 years ago

Describe the bug

The bounding boxes returned by, for example, ocr_agent.gather_data(res, agg_level=lp.TesseractFeatureType.BLOCK) don't match the block sizes in the original Tesseract data. Looking at the code, I think the cause is the filter that drops rows where text is NaN (https://github.com/Layout-Parser/layout-parser/blob/0809fa89fef08e34a4c73d5c1285e93ba80dc309/src/layoutparser/ocr/tesseract_agent.py#L146): it removes every level except WORD, so each block ends up only as wide as its longest word.
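
The suspected mechanism can be sketched with pandas alone. The frame below is hand-built to resemble Tesseract's tabular output (the values are illustrative, not real OCR results), and the aggregation (minimum `left`, maximum `width`, after dropping NaN-text rows) is an assumption about what the linked code does, not a copy of it.

```python
import numpy as np
import pandas as pd

# Hand-built rows shaped like Tesseract's tabular output (illustrative values).
# level 2 = block, level 5 = word; non-word rows carry no text (NaN).
data = pd.DataFrame(
    {
        "level": [2, 5, 5],
        "block_num": [1, 1, 1],
        "left": [10, 10, 70],
        "width": [140, 50, 80],
        "text": [np.nan, "Hello", "everyone"],
    }
)

# Dropping NaN-text rows leaves only the word rows; the block row is gone.
words = data[~data["text"].isna()]

# Assumed aggregation: leftmost x combined with the widest single word.
agg = words.groupby("block_num").agg(x_1=("left", "min"), w=("width", "max"))
agg["x_2"] = agg["x_1"] + agg["w"]

# Right edge according to the block row in the original data.
block_row = data[data["level"] == 2].iloc[0]
true_x_2 = block_row["left"] + block_row["width"]
aggregated_x_2 = agg.loc[1, "x_2"]

print(f"aggregated right edge: {aggregated_x_2}, block row right edge: {true_x_2}")
```

In this toy frame the aggregated right edge stops at the end of the widest word placed at the leftmost position, well short of the right edge recorded in the block row.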

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The bug has not been fixed in the latest version, see the Layout Parser Releases

To Reproduce

Steps to reproduce the behavior:

  1. What command or script did you run?
    import layoutparser as lp
    # `image` is assumed to be an already-loaded page image
    ocr_agent = lp.TesseractAgent()
    res = ocr_agent.detect(image, return_response=True)
    layout = ocr_agent.gather_data(res, agg_level=lp.TesseractFeatureType.BLOCK)

    If you then look at any element more than one word wide, the block is narrower than the original data indicates, which you can check with:

    res["data"].loc[res["data"].level == lp.TesseractFeatureType.BLOCK+1]
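
Until the aggregation is fixed, one possible workaround is to compute the union of the word boxes per group directly from the response frame. The sketch below assumes the frame has Tesseract's usual `left`, `top`, `width`, `height`, and `text` columns; `union_boxes` is a hypothetical helper, not part of layoutparser, and the sample frame is hand-built for illustration.

```python
import pandas as pd


def union_boxes(data: pd.DataFrame, group_cols: list) -> pd.DataFrame:
    """Hypothetical helper: union of word boxes per group (not a layoutparser API)."""
    # Keep only rows that carry text, i.e. the word rows.
    words = data[~data["text"].isna()].copy()
    # Convert width/height into absolute right/bottom edges before aggregating.
    words["right"] = words["left"] + words["width"]
    words["bottom"] = words["top"] + words["height"]
    return (
        words.groupby(group_cols)
        .agg(
            x_1=("left", "min"),
            y_1=("top", "min"),
            x_2=("right", "max"),
            y_2=("bottom", "max"),
            text=("text", " ".join),
        )
        .reset_index()
    )


# Illustrative frame: one block row (level 2) and two word rows (level 5).
sample = pd.DataFrame(
    {
        "level": [2, 5, 5],
        "page_num": [1, 1, 1],
        "block_num": [1, 1, 1],
        "left": [10, 10, 70],
        "top": [20, 20, 22],
        "width": [140, 50, 80],
        "height": [30, 28, 26],
        "text": [None, "Hello", "everyone"],
    }
)

boxes = union_boxes(sample, ["page_num", "block_num"])
print(boxes)
```

Taking the maximum of the right and bottom edges, rather than the maximum width and height, is what makes the result cover every word in the group. With real data, `res["data"]` would take the place of `sample`.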

Environment

  1. Please describe your Platform: Ubuntu Linux
  2. Please show the Layout Parser version: 0.3.2


lolipopshock commented 2 years ago

Thanks for bringing this up. Totally agree, and yes, I planned to work on this in #81!