Layout blocks detected by models do not match blocks detected by Tesseract

Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis

https://layout-parser.github.io/

Apache License 2.0

Layout blocks detected by models do not match blocks detected by Tesseract #210

Open maxycn opened 2 months ago

maxycn commented 2 months ago

Describe the bug

Left: the result from

    model = lp.EfficientDetLayoutModel("lp://PubLayNet/tf_efficientdet_d0/config")
    layout = model.detect(img)
    lp.draw_box(img, layout)

Right: the result from pytesseract.image_to_data()

Clearly the left result is wrong. Is there a way to fix it?
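
To put a number on the mismatch rather than judging it by eye, the two sets of boxes can be compared by overlap. The following is only a minimal sketch, not part of layout-parser or pytesseract: the helper names `tesseract_blocks`, `iou`, and `best_matches` are hypothetical, and it assumes that the Tesseract output was requested as a dict (`output_type=pytesseract.Output.DICT`), that block-level rows carry `level == 2`, and that each model block exposes `(x1, y1, x2, y2)` pixel coordinates in the same image space.

```python
from typing import Dict, List, Optional, Tuple

# A box in pixel coordinates: (x1, y1, x2, y2), with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def tesseract_blocks(data: Dict[str, list]) -> List[Box]:
    """Collect block-level boxes from a pytesseract.image_to_data dict.

    Assumes the dict has the "level", "left", "top", "width" and "height"
    columns, and that level 2 marks a block (1 = page, 5 = word).
    """
    boxes: List[Box] = []
    rows = zip(data["level"], data["left"], data["top"],
               data["width"], data["height"])
    for level, left, top, width, height in rows:
        if int(level) == 2:
            # Convert (left, top, width, height) to corner coordinates.
            boxes.append((left, top, left + width, top + height))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def best_matches(
    model_boxes: List[Box], tess_boxes: List[Box]
) -> List[Tuple[int, Optional[int], float]]:
    """For each model box, find the Tesseract box with the highest IoU.

    Returns (model_index, tesseract_index or None, iou) per model box.
    """
    matches: List[Tuple[int, Optional[int], float]] = []
    for i, m in enumerate(model_boxes):
        best_j: Optional[int] = None
        best_score = 0.0
        for j, t in enumerate(tess_boxes):
            score = iou(m, t)
            if score > best_score:
                best_j, best_score = j, score
        matches.append((i, best_j, best_score))
    return matches
```

With the objects from the snippet above, the inputs could be built as `model_boxes = [b.coordinates for b in layout]` and `tess_boxes = tesseract_blocks(pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT))`, assuming both calls receive the same image at the same resolution. Model blocks whose best IoU is near zero are the ones that do not line up with any Tesseract block.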