Open Shravan-Ganji opened 10 months ago
Have you tried preprocessing the scanned images so that the background is plain white? Here's a robust-looking example using OpenCV:
https://www.freedomvc.com/index.php/2022/01/17/basic-background-remover-with-opencv/
I have been using Layout Parser to analyze different types of documents. I get the expected results on true PDFs but not on scanned PDFs: the contents of a scanned PDF image are detected as a Figure, or otherwise not as expected.
This issue occurs only with scanned PDFs.
Checklist
To Reproduce
import layoutparser as lp
import cv2
image = cv2.imread("test.png")
image = image[..., ::-1]  # convert OpenCV's BGR channel order to RGB
model = lp.models.Detectron2LayoutModel(
    'lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],  # keep detections scoring at least 0.8
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})
color_map = {'Text': 'red', 'Title': 'blue', 'List': 'green', 'Table': 'purple', 'Figure': 'pink'}
layout = model.detect(image)
lp.draw_box(image, layout, box_width=3, color_map=color_map)
Environment
Contains 2 images:
1. Scanned PDF image result
2. Proper PDF image result
![positive](https://github.com/Layout-Parser/layout-parser/assets/88659756/bc8655da-d478-4b67-be44-4864cd4f79ba)