fh2019ustc / DocTr

The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

🚀 Exciting update! We have created a demo for our paper on Hugging Face Spaces that showcases the capabilities of DocTr. Check it out here!

🔥 Good news! Our new work, DocTr++: Deep Unrestricted Document Image Rectification, is now available. It can rectify various distorted document images in the wild.

🔥 Good news! Our new work, DocScanner: Robust Document Image Rectification with Progressive Learning, achieves state-of-the-art performance on the DocUNet Benchmark dataset. Its code repository is also available.

🔥 Good news! A comprehensive list of Awesome Document Image Rectification methods is available.

DocTr

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction
ACM MM 2021 Oral

Questions and discussion are welcome!

🚀 Demo (Link)

  1. Upload the distorted document image to be rectified in the left box.
  2. Click the "Submit" button.
  3. The rectified image will be displayed in the right box.
  4. The demo runs on CPU, and images are transferred over the network, so some display latency is expected.

Training

DocTr consists of two main components: a geometric unwarping transformer (GeoTr) and an illumination correction transformer (IllTr).

Inference

  1. Download the pretrained models from Google Drive or Baidu Cloud, and put them in $ROOT/model_pretrained/.
  2. Put the distorted images in $ROOT/distorted/.
  3. Geometric unwarping. The rectified images are saved in $ROOT/geo_rec/ by default.
    python inference.py
  4. Geometric unwarping and illumination rectification. The rectified images are saved in $ROOT/ill_rec/ by default.
    python inference.py --ill_rec True
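The inference steps above can be wrapped in a small pre-flight check. The sketch below is not part of the repository: the helper names `missing_inputs` and `inference_command` are hypothetical, and the only things taken from the steps are the directory names `model_pretrained/` and `distorted/` and the two documented command lines.

```python
import sys
from pathlib import Path


def missing_inputs(root):
    """Return a list of problems with the expected layout under `root`.

    Hypothetical helper: it only checks that the two input directories
    named in the steps above exist and are non-empty.
    """
    root = Path(root)
    problems = []
    model_dir = root / "model_pretrained"
    if not model_dir.is_dir() or not any(model_dir.iterdir()):
        problems.append(f"no pretrained models found in {model_dir}")
    image_dir = root / "distorted"
    if not image_dir.is_dir() or not any(image_dir.iterdir()):
        problems.append(f"no distorted images found in {image_dir}")
    return problems


def inference_command(ill_rec=False):
    """Build the argument list for the documented inference call.

    Without `ill_rec` this is `python inference.py`; with it, the
    `--ill_rec True` flag from step 4 is appended.
    """
    cmd = [sys.executable, "inference.py"]
    if ill_rec:
        cmd += ["--ill_rec", "True"]
    return cmd
```

Once `missing_inputs(root)` returns an empty list, the command can be run from the repository root, for example with `subprocess.run(inference_command(ill_rec=True), cwd=root, check=True)`.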

Evaluation

| Method | MS-SSIM | LD | ED (Setting 1) | CER (Setting 1) | ED (Setting 2) | CER (Setting 2) |
|---|---|---|---|---|---|---|
| GeoTr | 0.5105 | 7.76 | 464.83 | 0.1746 | 724.84 | 0.1832 |
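The table does not define its OCR metrics. As a rough illustration, the sketch below assumes that ED is the Levenshtein edit distance between the text recognized from a rectified image and the reference text, and that CER is that distance divided by the number of reference characters. It is not the repository's evaluation script, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning `ref` into `hyp`."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalized by the reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under these assumptions, lower values are better for both metrics: a perfectly recognized page gives an ED of 0 and a CER of 0.0.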

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{feng2021doctr,
  title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
  author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={273--281},
  year={2021}
}
@article{feng2021docscanner,
  title={DocScanner: Robust Document Image Rectification with Progressive Learning},
  author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
  journal={arXiv preprint arXiv:2110.14968},
  year={2021}
}
@article{feng2023doctrp,
  title={Deep Unrestricted Document Image Rectification},
  author={Feng, Hao and Liu, Shaokai and Deng, Jiajun and Zhou, Wengang and Li, Houqiang},
  journal={IEEE Transactions on Multimedia},
  year={2023}
}

Acknowledgement

The code is largely based on DocUNet, DewarpNet, and DocProj. Thanks for their wonderful work.

Contact

For commercial use, please contact Professor Wengang Zhou (zhwg@ustc.edu.cn) and Hao Feng (haof@mail.ustc.edu.cn).