🚀 Exciting update! We have created a demo for our paper on Hugging Face Spaces, showcasing the capabilities of DocTr. Check it out here!

🔥 Good news! Our new work, DocTr++: Deep Unrestricted Document Image Rectification, is out. It can rectify a wide range of distorted document images in the wild.

🔥 Good news! Our new work, DocScanner: Robust Document Image Rectification with Progressive Learning, achieves state-of-the-art performance on the DocUNet Benchmark dataset. A repo is available.

🔥 Good news! A comprehensive list of Awesome Document Image Rectification methods is available.

DocGeoNet

Geometric Representation Learning for Document Image Rectification
ECCV 2022

Questions and discussion are welcome!

🚀 Demo (Link)

Upload the distorted document image to be rectified in the left box.
Click the "Submit" button.
The rectified image will be displayed in the right box.
The demo runs on CPU infrastructure and images are transmitted over the network, so you may experience some latency before the result is displayed.

Training

We train the network using the Doc3D dataset.

Inference

Download the pretrained models from Google Drive and put them in $ROOT/model_pretrained/.
Unwarp the distorted images in $ROOT/distorted/ and write the rectified images to $ROOT/rec/:
```
python inference.py
```
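Before running inference, it can help to confirm that the folders named above are in place. The following is a minimal sketch, not part of the repository: the helper name `check_layout` is hypothetical, and it only assumes the three folder names given in the steps above. Whether `inference.py` creates the output folder on its own is not stated here, so the sketch creates it if it is missing.

```python
from pathlib import Path


def check_layout(root):
    """Report missing inputs under `root` and make sure the output folder exists.

    Hypothetical helper: it relies only on the folder names mentioned in the
    inference steps (model_pretrained/, distorted/, rec/).
    """
    root = Path(root)
    model_dir = root / "model_pretrained"
    input_dir = root / "distorted"
    output_dir = root / "rec"

    problems = []
    if not model_dir.is_dir() or not any(model_dir.iterdir()):
        problems.append(f"no pretrained models found in {model_dir}")
    if not input_dir.is_dir() or not any(input_dir.iterdir()):
        problems.append(f"no distorted images found in {input_dir}")

    # Creating the output folder is harmless if it already exists.
    output_dir.mkdir(parents=True, exist_ok=True)
    return problems
```

Calling `check_layout(".")` from $ROOT returns an empty list when both input folders are populated, and a list of human-readable warnings otherwise.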

DIR300 Test Set

We release the DIR300 test set for evaluating rectification algorithms.

Evaluation

Important. In the DocUNet Benchmark dataset, the distorted images '64_1.png' and '64_2.png' are rotated by 180 degrees and therefore do not match the GT documents. Most existing works ignore this. Please check these two images before running the evaluation.
Use the rectified images available from Baidu Cloud to reproduce the quantitative performance on the DocUNet Benchmark reported in the paper and for further comparison. We show the performance results of our method in the following table. For the performance of other methods, please refer to DocScanner and our paper.
Use the rectified images available from Google Drive to reproduce the quantitative performance on the DIR300 Test Set. For the performance of other methods, please refer to the paper.
Image Metrics: We use the same evaluation code for MS-SSIM and LD as the DocUNet Benchmark dataset, run on Matlab 2019a. Scores may differ across Matlab versions, so take your Matlab version into account when comparing. We provide our Matlab interface files at $ROOT/ssim_ld_eval_DocUNet.m and $ROOT/ssim_ld_eval_DIR300.m for the DocUNet and DIR300 Benchmark, respectively.
OCR Metrics: The index of the 30 documents (60 images) of the DocUNet Benchmark used for our OCR evaluation is in $ROOT/ocr_img_DocUNet.txt (Setting 1, following DocTr). Please refer to DewarpNet for the index of the 25 documents (50 images) of the DocUNet Benchmark used for their OCR evaluation (Setting 2). We provide the OCR evaluation code at $ROOT/OCR_eval_DocUNet.py and $ROOT/OCR_eval_DIR300.py for the DocUNet and DIR300 Benchmark, respectively. The version of pytesseract is 0.3.8, and the version of Tesseract on Windows is 5.0.1.20220118. Note that the calculated performance differs slightly across operating systems.
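The 180-degree rotation of '64_1.png' and '64_2.png' mentioned in the first note above can be corrected with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical, Pillow is assumed to be installed, and the folder you point it at (distorted inputs or rectified outputs) depends on your workflow. File names in your copy of the benchmark may differ, so inspect the images visually before and after.

```python
from pathlib import Path

# File names taken from the note above; everything else is an assumption.
ROTATED_NAMES = ("64_1.png", "64_2.png")


def find_rotated_images(image_dir):
    """Return the paths of the flagged images that actually exist in `image_dir`."""
    image_dir = Path(image_dir)
    return [image_dir / name for name in ROTATED_NAMES if (image_dir / name).is_file()]


def rotate_180(src, dst):
    """Write a copy of `src`, rotated by 180 degrees, to `dst` (requires Pillow)."""
    from PIL import Image  # imported lazily so the lookup above works without Pillow

    with Image.open(src) as img:
        img.rotate(180).save(dst)
```

A typical use would loop over `find_rotated_images("distorted")` and call `rotate_180(path, path)` on each result, after keeping a backup of the originals.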

Benchmark Dataset	Method	MS-SSIM	LD	ED (Setting 1)	CER (Setting 1)	ED (Setting 2)	CER (Setting 2)
DocUNet	DocGeoNet	0.5040	7.71	379.00	0.1509	713.94	0.1821

Benchmark Dataset	Method	MS-SSIM	LD	ED	CER
DIR300	DocGeoNet	0.6380	6.40	664.96	0.2189
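For readers unfamiliar with the ED and CER columns above, the sketch below illustrates the two metrics in pure Python. It assumes the common definitions: ED is the Levenshtein edit distance between the OCR output and the reference text, and CER is that distance divided by the number of reference characters. The function names are illustrative only. The repository's OCR_eval scripts remain the authoritative implementation, and how per-image scores are aggregated into the table values is not shown here.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    # previous[j] holds the distance between the processed prefix of
    # `reference` and the first j characters of `hypothesis`.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete ref_char
                current[j - 1] + 1,      # insert hyp_char
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `edit_distance("kitten", "sitting")` is 3, and `character_error_rate("abcd", "abxd")` is 0.25 because one of four reference characters is wrong.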

Citation

If you find this code useful for your research, please cite it using the following BibTeX entries.

@inproceedings{feng2022docgeonet,
  title={Geometric Representation Learning for Document Image Rectification},
  author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Wang, Yuechen and Li, Houqiang},
  booktitle={Proceedings of the European Conference on Computer Vision},
  year={2022}
}

@inproceedings{feng2021doctr,
  title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
  author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={273--281},
  year={2021}
}

@article{feng2021docscanner,
  title={DocScanner: Robust Document Image Rectification with Progressive Learning},
  author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
  journal={arXiv preprint arXiv:2110.14968},
  year={2021}
}

Acknowledgement

The code is largely based on DocUNet and DewarpNet. We thank the authors for their wonderful work.

Contact

For commercial use, please contact haof@mail.ustc.edu.cn.
