Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis
https://layout-parser.github.io/
Apache License 2.0
4.82k stars 464 forks

Support saving of layouts to open-standard hOCR file format. #188

Open Blue-PCB opened 1 year ago

Blue-PCB commented 1 year ago

Motivation: I hope layout-parser can support hOCR (HTML OCR), an open-standard file format for representing document layouts. This would make it easier to create OCR'ed PDFs and would allow interoperability with other tools.

Related resources: hOCR Specification v1.2

Additional context: Ocropus hOCR-Tools supports the hOCR format, but it hasn't been updated in a while.
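
To make the request concrete, here is a rough sketch of what an export could look like. It is not layout-parser's API: `Block` and `blocks_to_hocr` are hypothetical names invented for illustration, and the sketch writes each block as an `ocr_carea` directly inside one `ocr_page`, skipping the paragraph, line, and word levels a full hOCR writer would emit.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in for a detected layout block (not a layout-parser class)."""
    x1: int
    y1: int
    x2: int
    y2: int
    text: str = ""


def blocks_to_hocr(blocks: List[Block], page_width: int, page_height: int) -> str:
    """Serialize blocks as a single-page hOCR document (simplified sketch)."""
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        " <head>",
        '  <meta charset="utf-8"/>',
        # Hypothetical system name; hOCR uses these meta tags to describe the producer.
        '  <meta name="ocr-system" content="layout-export-sketch"/>',
        '  <meta name="ocr-capabilities" content="ocr_page ocr_carea"/>',
        " </head>",
        " <body>",
        f'  <div class="ocr_page" id="page_1" title="bbox 0 0 {page_width} {page_height}">',
    ]
    for i, b in enumerate(blocks, start=1):
        # hOCR stores geometry in the title attribute as "bbox x0 y0 x1 y1".
        bbox = f"bbox {b.x1} {b.y1} {b.x2} {b.y2}"
        lines.append(
            f'   <div class="ocr_carea" id="block_1_{i}" title="{bbox}">'
            f"{escape(b.text)}</div>"
        )
    lines += ["  </div>", " </body>", "</html>"]
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [Block(10, 20, 110, 60, "Title"), Block(10, 80, 190, 95, "Body text")]
    print(blocks_to_hocr(demo, page_width=200, page_height=100))
```

A real implementation would need to map layout-parser's block types and coordinates onto the hOCR element classes defined in the specification, which this sketch leaves out.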