QuiVer Benchmarks is a tool that helps you decide which OCR-D workflows are most suitable for your data. It executes preset workflows on different kinds of Ground Truth and evaluates the result. The results with the most recent version of ocrd_all can be viewed at https://ocr-d.de/quiver-frontend.
This repository holds everything needed to automatically execute different OCR-D workflows on images and evaluate the outcomes. It creates benchmarks for OCR-D data in a containerized environment. QuiVer Benchmarks currently runs in an automated workflow (CI/CD).
QuiVer Benchmarks is based on ocrd/all:maximum and has all OCR-D processors at hand that a workflow might use.
To speed up QuiVer Benchmarks, you can mount already downloaded text recognition models to /usr/local/share/ocrd-resources/ in docker-compose.yml by adding
- path/to/your/models:/usr/local/share/ocrd-resources/
to the volumes section.
Otherwise, the tool will download all ocrd-tesserocr-recognize models as well as ocrd-calamari-recognize qurator-gt4histocr-1.0 on each run.
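Before adding the mount, it can help to confirm that the host directory actually exists: for bind mounts written in this short form, Docker generally creates a missing host path as an empty directory, in which case the models would still be downloaded. The following is a minimal sketch, not part of QuiVer Benchmarks; the function name `models_volume_entry` is hypothetical, and only the container path is taken from the text above.

```shell
# Hypothetical helper (not part of QuiVer Benchmarks): print the volumes
# entry for a host model directory, or fail if the directory is missing.
models_volume_entry() {
  host_dir=$1
  if [ ! -d "$host_dir" ]; then
    echo "not a directory: $host_dir" >&2
    return 1
  fi
  # Resolve the directory to an absolute path so the entry is unambiguous.
  abs_dir=$(cd "$host_dir" && pwd)
  printf '%s\n' "- $abs_dir:/usr/local/share/ocrd-resources/"
}
```

Calling `models_volume_entry path/to/your/models` prints a line that can be pasted into the volumes section of docker-compose.yml.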
make build
make start
make prepare-default-gt
make run
The results are then available in data/workflows.json on your host system.
Run make stop to shut down and remove the Docker container you created previously.

The relevant benchmarks gathered by QuiVer Benchmarks are defined in OCR-D's Quality Assurance specification and comprise
QuiVer Benchmarks currently uses the following Ground Truth:
A detailed list of images used for the Reichsanzeiger GT sets can be found in the data_src directory.
Add new OCR-D workflows to the directory workflows/ocrd_workflows according to the following conventions:
File names of OCR workflows end with _ocr.txt, those of evaluation workflows with _eval.txt. The files will be converted by OtoN to Nextflow files after the container has started.
ocrd process
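The naming convention above can be checked before a run. The sketch below is an illustration only and not part of the repository; the function names `classify_workflow` and `list_invalid_workflows` are hypothetical, and it inspects file names only, not file contents.

```shell
# Classify a workflow file by its suffix; prints "ocr", "eval", or "invalid".
classify_workflow() {
  case "$1" in
    *_ocr.txt)  echo ocr ;;
    *_eval.txt) echo eval ;;
    *)          echo invalid ;;
  esac
}

# Print every file in the given directory whose name matches neither suffix.
# Defaults to the workflow directory named in the text above.
list_invalid_workflows() {
  dir=${1:-workflows/ocrd_workflows}
  for f in "$dir"/*; do
    [ -e "$f" ] || continue   # skip the unexpanded glob in an empty directory
    if [ "$(classify_workflow "$f")" = invalid ]; then
      echo "$f"
    fi
  done
}
```

Running `list_invalid_workflows` from the repository root prints nothing when every file follows the convention.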
You can then either rebuild the Docker image via docker compose build or mount the directory to the container via
- ./workflows/ocrd_workflows:/app/workflows/ocrd_workflows
in the volumes section and spin up a new run with docker compose up.
Delete the respective TXT files from workflows/ocrd_workflows and either rebuild the image or mount the directory as a volume as described above.
See LICENSE