Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis
https://layout-parser.github.io/
Apache License 2.0

layoutparser doesn't work well for a very well-structured CV #103

Open ttbuffey opened 2 years ago

ttbuffey commented 2 years ago

Describe the bug

layoutparser doesn't work well for a very well-structured CV. Am I using layoutparser incorrectly? Could you please help check? Thanks very much.

To Reproduce

import ssl
import warnings

import cv2
import layoutparser as lp

# Workaround: disable HTTPS certificate verification for the model download (insecure)
ssl._create_default_https_context = ssl._create_unverified_context
warnings.filterwarnings('ignore')

# Load the image and convert it from BGR (OpenCV's default) to RGB
image = cv2.imread("data/25.png")
image = image[..., ::-1]

# Load the PubLayNet model, keeping only detections with a score of at least 0.8
model = lp.Detectron2LayoutModel('lp://PubLayNet/mask_rcnn_R_50_FPN_3x/config',
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
                                 label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})

# Detect the layout of the input image
layout = model.detect(image)
print(layout)

# Draw the detected boxes on the image
lp.draw_box(image, layout, box_width=3).show()

Environment

  1. macOS
  2. use below command to install layoutparser

Screenshots

[Three screenshots attached: Screen Shot 2021-12-02 at 3 51 32 PM, Screen Shot 2021-12-02 at 3 51 40 PM, Screen Shot 2021-12-02 at 3 42 58 PM]

ruben-as-teixeira commented 2 years ago

I'm facing the same kind of difficulty. When applied to CVs, the results are very poor.

Bergrebell commented 2 years ago

Have you tried working with different models? PrimaLayout, for example, gives me noticeably better results on a similar set of documents.

model = lp.Detectron2LayoutModel('lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config',
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
                                 label_map={1:"TextRegion", 2:"ImageRegion", 3:"TableRegion", 4:"MathsRegion", 5:"SeparatorRegion", 6:"OtherRegion"})

But the results are still not perfect, which is why I came here ;) Are there any options to tweak the text detection?
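
One option that might be worth trying, sketched here rather than confirmed by anyone in this thread: load the model with a lower `MODEL.ROI_HEADS.SCORE_THRESH_TEST` than the 0.8 used above, then filter the detections afterwards by score and type. That lets you see how many regions each cutoff keeps without rerunning the model. The helpers `filter_blocks` and `count_by_threshold` are hypothetical names, not part of layoutparser. They only assume each detected block has `score` and `type` attributes, so the demo below uses stand-in objects with made-up scores instead of real model output.

```python
from types import SimpleNamespace


def filter_blocks(blocks, min_score=0.5, keep_types=None):
    """Keep blocks scoring at least min_score whose type is in keep_types.

    Hypothetical helper: works on any iterable of objects that expose
    `score` and `type` attributes. keep_types=None keeps every type.
    """
    kept = []
    for block in blocks:
        score = getattr(block, "score", None)
        # Drop blocks with no score or a score below the cutoff
        if score is None or score < min_score:
            continue
        # Drop blocks whose type is not in the allowed set
        if keep_types is not None and block.type not in keep_types:
            continue
        kept.append(block)
    return kept


def count_by_threshold(blocks, thresholds):
    """Map each candidate cutoff to the number of blocks it would keep."""
    return {t: len(filter_blocks(blocks, min_score=t)) for t in thresholds}


# Stand-in detections with made-up scores (not real model output)
demo = [
    SimpleNamespace(type="TextRegion", score=0.95),
    SimpleNamespace(type="TextRegion", score=0.62),
    SimpleNamespace(type="TableRegion", score=0.41),
    SimpleNamespace(type="ImageRegion", score=0.88),
]

counts = count_by_threshold(demo, [0.3, 0.5, 0.8])
text_only = filter_blocks(demo, min_score=0.5, keep_types={"TextRegion"})
print(counts)
print([b.score for b in text_only])
```

With a real model, the blocks in the result of `model.detect(image)` would take the place of `demo`, assuming they carry `score` and `type` as described. Filtering like this only changes which detections are kept. It does not make the boxes themselves more accurate, so it may not help if the model misses regions entirely.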