🔥 2024.4.28: Good news! The code and pre-trained model of DocScanner are now released!
🚀 Good news! The online demo for DocScanner is now live, allowing for easy image upload and correction.
🔥 Good news! Our new work DocTr++: Deep Unrestricted Document Image Rectification is out, capable of rectifying various distorted document images in the wild.
🔥 Good news! A comprehensive list of Awesome Document Image Rectification methods is available.
This is a PyTorch/GPU re-implementation of the paper DocScanner: Robust Document Image Rectification with Progressive Learning.
Note: The model used in the demo is "DocScanner-L" as described in the paper.
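Put the pre-trained model in `$ROOT/model_pretrained/`.
Put the distorted images in `$ROOT/distorted/`.
Run the rectification script below. The rectified images are saved in `$ROOT/rectified/` by default.
`python inference.py`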
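Before running inference, it can help to confirm that the expected directory layout is in place. The following is a minimal sketch and not part of the repository: the helper name `check_layout` and the set of image extensions are assumptions, and only the three directory names come from the description above.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}


def check_layout(root):
    """Return the distorted images found under root, creating rectified/ if absent.

    Hypothetical helper: only the directory names come from the README.
    """
    root = Path(root)
    model_dir = root / "model_pretrained"
    in_dir = root / "distorted"
    out_dir = root / "rectified"

    # The pre-trained weights must be downloaded manually beforehand.
    if not model_dir.is_dir() or not any(model_dir.iterdir()):
        raise FileNotFoundError(f"no pre-trained model found in {model_dir}")
    if not in_dir.is_dir():
        raise FileNotFoundError(f"missing input directory {in_dir}")

    out_dir.mkdir(exist_ok=True)
    # Keep only files whose extension looks like an image, in a stable order.
    return sorted(p for p in in_dir.iterdir() if p.suffix.lower() in IMAGE_EXTS)
```

Calling `check_layout(".")` from the repository root returns the list of input images that would be processed, or raises an error naming the missing directory.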
The Matlab evaluation script for MS-SSIM and LD is provided at `$ROOT/ssim_ld_eval.m`.
The index of the DocUNet Benchmark images used for our OCR evaluation is listed in `$ROOT/ocr_img.txt` (Setting 1). Please refer to DewarpNet for the index of the 25 documents (50 images) of the DocUNet Benchmark used for their OCR evaluation (Setting 2). We provide the OCR evaluation code at `$ROOT/OCR_eval.py`. The version of pytesseract is 0.3.8, and the version of Tesseract on Windows is 5.0.1.20220118. Note that the calculated performance differs slightly across operating systems.

Method | MS-SSIM | LD | Li-D | ED (Setting 1) | CER (Setting 1) | ED (Setting 2) | CER (Setting 2) | Para. (M) |
---|---|---|---|---|---|---|---|---|
DocScanner-T | 0.5123 | 7.92 | 2.04 | 501.82 | 0.1823 | 809.46 | 0.2068 | 2.6 |
DocScanner-B | 0.5134 | 7.62 | 1.88 | 434.11 | 0.1652 | 671.48 | 0.1789 | 5.2 |
DocScanner-L | 0.5178 | 7.45 | 1.86 | 390.43 | 0.1486 | 632.34 | 0.1648 | 8.5 |
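
The ED and CER columns above are OCR metrics: ED is the edit (Levenshtein) distance between the recognized text and a reference text, and CER is that distance divided by the number of reference characters. The sketch below illustrates these standard definitions only. It is not the code in `$ROOT/OCR_eval.py`, and the function names `edit_distance`, `cer`, and `ocr_text` are assumptions made for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def ocr_text(image_path):
    """Recognize text with Tesseract (requires pytesseract and Pillow)."""
    # Imported lazily so the metric functions work without OCR installed.
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.open(image_path))
```

In a typical setup, `ref` would be the OCR output of the flat ground-truth scan and `hyp` the OCR output of the rectified image, with ED and CER averaged over the evaluated images. That pairing is an assumption about the protocol, not something confirmed by the text above.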
Please cite the related works in your publications if they help your research:
@inproceedings{feng2021doctr,
title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
pages={273--281},
year={2021}
}
@inproceedings{feng2022docgeonet,
title={Geometric Representation Learning for Document Image Rectification},
author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Wang, Yuechen and Li, Houqiang},
booktitle={Proceedings of the European Conference on Computer Vision},
year={2022}
}
@article{feng2021docscanner,
title={DocScanner: robust document image rectification with progressive learning},
author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
journal={arXiv preprint arXiv:2110.14968},
year={2021}
}
The code is largely based on DocUNet and DewarpNet. We thank the authors for their excellent work.
For commercial usage, please contact Professor Wengang Zhou (zhwg@ustc.edu.cn) and Hao Feng (haof@mail.ustc.edu.cn).