OCR-D / quiver-benchmarks

Benchmarking OCR-D workflows in Docker
MIT License

QuiVer Benchmarks

QuiVer Benchmarks is a tool that helps you decide which OCR-D workflows are most suitable for your data. It executes preset workflows on different kinds of Ground Truth and evaluates the results. Results for the most recent version of ocrd_all can be viewed at https://ocr-d.de/quiver-frontend.

This repository holds everything needed to automatically execute different OCR-D workflows on images and evaluate the outcomes, creating benchmarks for OCR-D data in a containerized environment. QuiVer Benchmarks currently runs as part of an automated CI/CD workflow.

QuiVer Benchmarks is based on the ocrd/all:maximum Docker image, so every OCR-D processor a workflow might use is already available.

Requirements

To speed up QuiVer Benchmarks, you can mount already downloaded text recognition models to /usr/local/share/ocrd-resources/ in docker-compose.yml by adding

- /path/to/your/models:/usr/local/share/ocrd-resources/

to the volumes section. Otherwise, the tool downloads all ocrd-tesserocr-recognize models as well as the ocrd-calamari-recognize model qurator-gt4histocr-1.0 on each run.
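A minimal sketch of where that line could sit in docker-compose.yml. The service name quiver-benchmarks is hypothetical (keep whatever service the repository's file already defines), and /path/to/your/models is a placeholder. In Compose's short volume syntax, a host path must be absolute or start with ./ or ../; otherwise Compose interprets it as the name of a named volume rather than a bind mount.

```yaml
services:
  quiver-benchmarks:  # hypothetical service name; use the one already in docker-compose.yml
    volumes:
      # Bind-mount a host directory of already downloaded models (placeholder path)
      # so the models are not downloaded again on every run
      - /path/to/your/models:/usr/local/share/ocrd-resources/
```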

Usage (For Development)

Benchmarks Considered

The relevant benchmarks gathered by QuiVer Benchmarks are defined in OCR-D's Quality Assurance specification and comprise:

Ground Truth Used

QuiVer Benchmarks currently uses the following Ground Truth:

A detailed list of images used for the Reichsanzeiger GT sets can be found in the data_src directory.

Adding New OCR-D Workflows (For Development)

Add new OCR-D workflows to the directory workflows/ocrd_workflows according to the following conventions:

You can then either rebuild the Docker image via docker compose build or mount the directory into the container by adding

- ./workflows/ocrd_workflows:/app/workflows/ocrd_workflows

to the volumes section and start a new run with docker compose up.
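As a sketch, the mounted workflow directory could look like this in docker-compose.yml. The service name quiver-benchmarks is again a hypothetical placeholder; the paths are the ones given above. With this bind mount in place, edits to the local workflow files are visible inside the container on the next docker compose up, without running docker compose build first.

```yaml
services:
  quiver-benchmarks:  # hypothetical service name; use the one already in docker-compose.yml
    volumes:
      # Bind-mount the local workflow definitions into the container
      - ./workflows/ocrd_workflows:/app/workflows/ocrd_workflows
```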

Removing OCR-D Workflows

Delete the respective TXT files from workflows/ocrd_workflows and either rebuild the image or mount the directory as a volume as described above.

Outlook

License

See LICENSE