Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis
https://layout-parser.github.io/
Apache License 2.0
4.64k stars 449 forks

Any idea why Detectron produces overlapping blocks and sometimes misses some blocks? #167

Closed rrrokhtar closed 1 year ago

rrrokhtar commented 1 year ago

The problem: I am currently using layout-parser to detect the blocks in scanned book pages, and I am trying to extract each block from the page separately so I can process the blocks individually.

To Reproduce

import layoutparser as lp
import cv2

image = cv2.imread("/content/image_0.jpg")
# Convert the image from BGR (cv2 default loading style) to RGB
image = image[..., ::-1]

model = lp.Detectron2LayoutModel("lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config",
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
                                 label_map={1:"TextRegion", 2:"ImageRegion", 3:"TableRegion", 4:"MathsRegion", 5:"SeparatorRegion", 6:"OtherRegion"})

# Detect the layout of the input image
layout = model.detect(image)

# Show the detected layout of the input image
lp.draw_box(image, layout, box_width=3)

Environment

  1. Platform [Linux] (on colab)
  2. Installation commands
    !sudo apt-get update
    !sudo apt-get install libleptonica-dev tesseract-ocr libtesseract-dev python3-pil tesseract-ocr-eng tesseract-ocr-script-latn
    !pip install layoutparser   
    !pip install layoutparser torchvision && pip install "git+https://github.com/facebookresearch/detectron2.git@v0.5#egg=detectron2"   
    !pip install "layoutparser[ocr]"    
    !pip install "layoutparser[layoutmodels]" # Install DL layout model toolkit 

Screenshots

1- Overlapping 3 (image: image_3)
2- Missing 7 (image: image_7)

I know this may not be the right place to raise this issue, but I think you may have an idea about the problem.

prasum commented 1 year ago

@rrrokhtar the pretrained model was trained on scientific and academic research papers. For a corpus of scanned book pages like the one above, you can train a custom model with https://github.com/Layout-Parser/layout-model-training.
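
Short of retraining, one possible stopgap for the overlapping detections is to filter them after the fact. The sketch below is not part of layout-parser: `iou` and `suppress_overlaps` are hypothetical helper names, and it assumes each detection has been reduced to a plain `(x1, y1, x2, y2, score)` tuple (for example, built from each block's coordinates and score). It keeps the highest-scoring box and drops any later box whose intersection-over-union with an already kept box exceeds a threshold.

```python
from typing import List, Tuple

# Assumed representation of one detection: (x1, y1, x2, y2, score).
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def suppress_overlaps(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Greedy filter: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    detections = [
        (0, 0, 100, 100, 0.90),
        (10, 10, 110, 110, 0.85),    # heavily overlaps the first box
        (200, 200, 300, 300, 0.95),
    ]
    for box in suppress_overlaps(detections):
        print(box)
```

The 0.5 threshold is an arbitrary starting point and would need tuning on your pages. For the missed blocks, lowering `MODEL.ROI_HEADS.SCORE_THRESH_TEST` below 0.8 may surface more candidates, probably at the cost of more overlaps to filter.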