Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis
https://layout-parser.github.io/
Apache License 2.0

Handling skewed text #31

Closed hallvagi closed 3 years ago

hallvagi commented 3 years ago

Hi, and thanks for the nice work!

I'm using the PrimaLayout model to detect layout in scanned documents. Most of the documents were scanned at a slight angle, so the text is a bit skewed, and the model's effectiveness varies a lot between images. When I test the model on rotated copies of a single document, a single extra degree of rotation can change the result a lot once the skew passes a certain threshold. So I'm curious: was the PrimaLayout model trained with image rotations as part of the augmentation pipeline? If not, could such augmentations make the model more robust to skewed text? Or is the simplest hack for my current project to deskew the images upfront?

lolipopshock commented 3 years ago

Thanks! To clarify:

  1. The PrimaLayout model is not trained with rotation augmentations.
  2. The easiest option for now might be to de-skew the image, which is probably a necessary step in most image processing pipelines. To do that, you just need to:
    1. On the original image, find the quadrilateral box of the region you want to unskew.
    2. Use the lp.Quadrilateral.crop_image API, which rectifies the skew automatically - it uses a warp transformation.
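
The two steps above could be sketched roughly as follows. This is a minimal illustration, not the maintainers' code: it assumes the skew angle is already known (for example from a separate skew estimator), and the helper names `skewed_region_corners` and `crop_deskewed` are hypothetical. It also assumes `lp.Quadrilateral` accepts a 4x2 array of corner points in top-left, top-right, bottom-right, bottom-left order; check the layoutparser documentation for your version.

```python
import math


def skewed_region_corners(center, width, height, angle_deg):
    """Return the corners (TL, TR, BR, BL) of a width x height rectangle
    rotated by angle_deg around center.

    Uses image coordinates (y grows downward), so a positive angle
    looks like a clockwise rotation on screen.
    """
    cx, cy = center
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2.0, height / 2.0
    # Corner offsets from the center before rotation.
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]


def crop_deskewed(image, corners):
    """Hypothetical wrapper: crop and rectify the region given by corners.

    Imports are local so the geometry helper above stays usable
    without numpy or layoutparser installed.
    """
    import numpy as np
    import layoutparser as lp

    # Assumption: points are a 4x2 array ordered TL, TR, BR, BL.
    quad = lp.Quadrilateral(np.array(corners, dtype=float))
    return quad.crop_image(image)
```

For a page whose text block is centered at (500, 700), measures 800 x 1200 pixels, and is skewed by 2 degrees, `skewed_region_corners((500, 700), 800, 1200, 2.0)` would give the corners to pass to `crop_deskewed`.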

hallvagi commented 3 years ago

Thanks! I already tried https://github.com/sbrunner/deskew, but I'll give the built-in version a go too. I'm also considering training a model with various augmentations that simulate scanning artefacts, to see if that helps generalization on scanned docs.
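
As a rough sketch of what such a rotation augmentation might involve on the annotation side: when an image is rotated by a small random angle, the layout boxes have to be rotated with it. The helpers below (`sample_skew_angle`, `rotate_point`, `rotate_box`) are hypothetical names, the 3 degree default range is an arbitrary assumption, and none of this reflects how PrimaLayout was actually trained. The image itself would need to be rotated by the same angle with an imaging library, taking care that its sign convention matches.

```python
import math
import random


def sample_skew_angle(max_abs_deg=3.0, rng=random):
    """Draw a small random skew angle in degrees (range is an assumption)."""
    return rng.uniform(-max_abs_deg, max_abs_deg)


def rotate_point(x, y, angle_deg, center):
    """Rotate (x, y) around center by angle_deg in image coordinates."""
    cx, cy = center
    theta = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(theta) - dy * math.sin(theta),
            cy + dx * math.sin(theta) + dy * math.cos(theta))


def rotate_box(box, angle_deg, center):
    """Rotate an axis-aligned box (x1, y1, x2, y2) and return the
    axis-aligned box that encloses the rotated corners."""
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    rotated = [rotate_point(x, y, angle_deg, center) for x, y in corners]
    xs = [p[0] for p in rotated]
    ys = [p[1] for p in rotated]
    return (min(xs), min(ys), max(xs), max(ys))
```

Note that the enclosing box grows as the angle increases, so for larger rotations the labels get looser; for the small skews typical of scanned pages this is usually a minor effect.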