Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis
https://layout-parser.github.io/
Apache License 2.0

bad result detected #114

Closed: DamonsJ closed this issue 2 years ago

DamonsJ commented 2 years ago

I got a bad result using layout-parser. Here is the image I used: [image: 1]

Here is the code I ran in Python:

import cv2
import layoutparser as lp


def drawRectangleInImage(img, blocks, color):
    # The helper was not included in the issue; this is an assumed
    # equivalent that outlines each block with cv2.rectangle
    for block in blocks:
        x1, y1, x2, y2 = (int(v) for v in block.coordinates)
        cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)


# cv2 loads images in BGR order; keep this copy for drawing and display
origin_image = cv2.imread("1.png")
# Convert the image from BGR (cv2 default loading style) to RGB for the model
image = cv2.cvtColor(origin_image, cv2.COLOR_BGR2RGB)

# Load the deep layout model from the layoutparser API.
# For all the supported models, please check the Model
# Zoo page: https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html
model = lp.Detectron2LayoutModel('lp://PubLayNet/mask_rcnn_R_50_FPN_3x/config',
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
                                 label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})

# Detect the layout of the input image
layout = model.detect(image)

# Outline each block type in its own color (BGR, since origin_image is BGR)
colors = {
    "Text": (36, 255, 12),
    "Title": (76, 155, 175),
    "Figure": (122, 96, 216),
    "List": (176, 155, 175),
    "Table": (76, 255, 75),
}
for block_type, color in colors.items():
    blocks = lp.Layout([b for b in layout if b.type == block_type])
    drawRectangleInImage(origin_image, blocks, color)

cv2.imshow('image', origin_image)
cv2.waitKey()
cv2.destroyAllWindows()
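With SCORE_THRESH_TEST set to 0.8, every block the model scores below 0.8 is discarded before drawing, which may be part of why regions go missing. A minimal sketch for comparing cutoffs without reloading the model: run detection once with a lower threshold, then count how many blocks of each type survive at several cutoffs. It assumes each block exposes `.type` and `.score` (layoutparser's TextBlock does); `counts_by_threshold` is a hypothetical helper, and the sample blocks are stand-ins, not output from the image in this issue.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Dict, Iterable, Sequence


def counts_by_threshold(blocks: Iterable, thresholds: Sequence[float]) -> Dict[float, Counter]:
    """For each threshold, count blocks of each type whose score is >= that threshold."""
    blocks = list(blocks)  # materialize so the blocks can be scanned once per threshold
    return {
        t: Counter(b.type for b in blocks if b.score is not None and b.score >= t)
        for t in thresholds
    }


# Stand-in blocks for illustration; with layoutparser you would pass the
# Layout returned by model.detect(image) instead
sample = [
    SimpleNamespace(type="Text", score=0.9),
    SimpleNamespace(type="Text", score=0.6),
    SimpleNamespace(type="Figure", score=0.4),
]
summary = counts_by_threshold(sample, [0.3, 0.5, 0.8])
for t, counts in summary.items():
    print(t, dict(counts))
```

To keep only the blocks for a chosen cutoff, filter the layout the same way the drawing code filters by type, e.g. `lp.Layout([b for b in layout if b.score >= 0.5])`.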

Here is the result:

[Screenshot 2022-01-18 11 45 06]

By the way, some warnings are generated:

/usr/local/lib/python3.9/site-packages/detectron2/structures/image_list.py:99: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  max_size = (max_size + (stride - 1)) // stride * stride
/usr/local/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
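Both messages are PyTorch deprecation notices about future behavior, raised inside detectron2 and torch; they are very unlikely to be the cause of the poor detections. If they clutter the output, one option is to filter them with Python's standard `warnings` module before running the model. A minimal sketch; `silence_known_warnings` and `KNOWN_HARMLESS` are hypothetical names, and the self-check uses shortened stand-in messages.

```python
import warnings

# Regexes matched against the start of each warning message
KNOWN_HARMLESS = (
    r"floordiv is deprecated",
    r"torch\.meshgrid: in an upcoming release",
)


def silence_known_warnings() -> None:
    """Ignore the two UserWarnings above; all other warnings are unaffected."""
    for pattern in KNOWN_HARMLESS:
        warnings.filterwarnings("ignore", message=pattern, category=UserWarning)


# Self-check with stand-in messages: only the unrelated warning should get through.
# catch_warnings restores the previous filters when the block exits.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    silence_known_warnings()
    warnings.warn("floordiv is deprecated, and its behavior will change", UserWarning)
    warnings.warn("torch.meshgrid: in an upcoming release, it will be required", UserWarning)
    warnings.warn("an unrelated warning", UserWarning)
remaining = [str(w.message) for w in caught]
print(remaining)
```

In the script from the issue, calling `silence_known_warnings()` once before `model.detect(image)` would be enough.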

lolipopshock commented 2 years ago

Thank you for reporting this. It can be easily resolved by reconfiguring the model's hyperparameters; one example is: https://github.com/allenai/VILA/blob/96cafe591ae6ee8a70f941a52dd37bbe0a60b243/datasets/s2-vl-utils/vision_model_loader.py#L140 .
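The hyperparameters are passed to `Detectron2LayoutModel` through `extra_config` as a flat list of alternating Detectron2 config keys and values, the same form as the SCORE_THRESH_TEST override in the code above. A minimal sketch that builds that list from a dict: the two keys are standard Detectron2 test-time settings, but the values are placeholders and are not copied from the linked VILA file. `build_extra_config` and `load_publaynet_model` are hypothetical helpers.

```python
from typing import Any, Dict, List


def build_extra_config(overrides: Dict[str, Any]) -> List[Any]:
    """Flatten {key: value} into [key1, value1, key2, value2, ...]."""
    flat: List[Any] = []
    for key, value in overrides.items():
        flat.extend([key, value])
    return flat


# Placeholder values for illustration; tune them on your own documents
overrides = {
    # Lower than the 0.8 used above, so lower-confidence boxes are kept
    "MODEL.ROI_HEADS.SCORE_THRESH_TEST": 0.5,
    # Lower than Detectron2's default of 0.5, so overlapping boxes are
    # suppressed more aggressively
    "MODEL.ROI_HEADS.NMS_THRESH_TEST": 0.4,
}
extra_config = build_extra_config(overrides)


def load_publaynet_model(extra_config: List[Any]):
    # Hypothetical helper (not called here); needs layoutparser with the
    # Detectron2 backend installed
    import layoutparser as lp
    return lp.Detectron2LayoutModel(
        "lp://PubLayNet/mask_rcnn_R_50_FPN_3x/config",
        extra_config=extra_config,
        label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    )
```

Usage would be `model = load_publaynet_model(extra_config)` followed by `layout = model.detect(image)`. Keeping the overrides in a dict makes it easy to try several settings without editing the positional list by hand.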

DamonsJ commented 2 years ago

Hi, thanks very much for replying. I just want to recognize text, figures, and tables in published documents. How should I adjust the parameters? When I use the extra config in https://github.com/allenai/VILA/blob/96cafe591ae6ee8a70f941a52dd37bbe0a60b243/datasets/s2-vl-utils/vision_model_loader.py#L140 , I can recognize text and figures, but math equations are not recognized.

Thanks!

lolipopshock commented 2 years ago

There's a separate model (https://github.com/Layout-Parser/platform/issues/20) that can be used for detecting equation regions. Also see the code here: https://github.com/allenai/VILA/blob/96cafe591ae6ee8a70f941a52dd37bbe0a60b243/datasets/s2-vl-utils/vision_model_loader.py#L150
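A minimal sketch of combining the two models discussed here: the PubLayNet model for text, figures, and tables, and a separate equation detector. PubLayNet has no equation class, so displayed equations may otherwise be labeled Text or missed. The merge rule is a simple heuristic assumed for this sketch, not something from the thread: keep every equation box, and drop a PubLayNet block when more than half of its area lies inside a single equation box. `Box`, `merge_with_equations`, and `to_boxes` are hypothetical names; the equation model path in the usage comments, `lp://MFD/faster_rcnn_R_50_FPN_3x/config` with `label_map={1: "Equation"}`, is an assumption based on the layoutparser model zoo and should be checked against the linked issue.

```python
from typing import Iterable, List, NamedTuple


class Box(NamedTuple):
    # Plain stand-in for a detected block: corner coordinates plus a type label
    x1: float
    y1: float
    x2: float
    y2: float
    type: str


def area(b: Box) -> float:
    return max(0.0, b.x2 - b.x1) * max(0.0, b.y2 - b.y1)


def intersection_area(a: Box, b: Box) -> float:
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def merge_with_equations(base: Iterable[Box], equations: Iterable[Box],
                         max_covered: float = 0.5) -> List[Box]:
    """Keep all equation boxes; drop a base box if more than max_covered
    of its area lies inside any single equation box."""
    eqs = list(equations)
    kept = []
    for blk in base:
        a = area(blk)
        covered = max((intersection_area(blk, eq) for eq in eqs), default=0.0)
        if a > 0 and covered / a > max_covered:
            continue  # most of this block is an equation region
        kept.append(blk)
    return kept + eqs


def to_boxes(layout) -> List[Box]:
    # Hypothetical adapter for layoutparser blocks (uses .coordinates and .type)
    return [Box(*map(float, b.coordinates), b.type) for b in layout]


# Usage with layoutparser (not run here; the MFD path is an assumption):
# eq_model = lp.Detectron2LayoutModel("lp://MFD/faster_rcnn_R_50_FPN_3x/config",
#                                     label_map={1: "Equation"})
# merged = merge_with_equations(to_boxes(model.detect(image)),
#                               to_boxes(eq_model.detect(image)))
```

Comparing against one equation box at a time keeps the rule simple; a block split across several small equation boxes would need a union-based coverage check instead.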