-
OCR-D processors are required to respect the `AlternativeImage` annotation of the METS/PAGE pair, cf. [spec (ff.)](https://ocr-d.github.io/page#alternativeimage-for-derived-images). That implies,
- o…
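As a rough illustration of what respecting `AlternativeImage` can mean on the page level, here is a minimal standard-library sketch. It assumes that the most recently derived image is the last `AlternativeImage` child of `Page` in document order, and that the `comments` attribute holds a comma-separated list of image features (such as `binarized` or `deskewed`). The helper `pick_alternative_image`, the feature-matching rule and the file names are hypothetical, not the OCR-D/core API, and region- or line-level images are not handled.

```python
import xml.etree.ElementTree as ET


def pick_alternative_image(page_xml, required_features=()):
    """Return the image filename a processor should read for this page.

    Hypothetical helper (not OCR-D/core API): assumes the last matching
    AlternativeImage is the most recent derivation and that @comments
    lists image features as comma-separated names.
    """
    root = ET.fromstring(page_xml)
    # "{*}" matches the PAGE namespace regardless of schema version (Python 3.8+).
    page = root.find("{*}Page")
    if page is None:
        raise ValueError("no Page element found")
    # Fall back to the original image if no AlternativeImage qualifies.
    chosen = page.get("imageFilename")
    for alt in page.findall("{*}AlternativeImage"):
        features = {f.strip() for f in alt.get("comments", "").split(",") if f.strip()}
        # Keep overwriting, so the last qualifying image in document order wins.
        if set(required_features) <= features:
            chosen = alt.get("filename")
    return chosen


# Made-up example document.
PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="OCR-D-IMG/page1.tif" imageWidth="100" imageHeight="100">
    <AlternativeImage filename="OCR-D-BIN/page1.png" comments="binarized"/>
    <AlternativeImage filename="OCR-D-DESKEW/page1.png" comments="binarized,deskewed"/>
  </Page>
</PcGts>"""

print(pick_alternative_image(PAGE_XML))                # OCR-D-DESKEW/page1.png
print(pick_alternative_image(PAGE_XML, ("cropped",)))  # OCR-D-IMG/page1.tif
```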
-
After installing `libcgal-dev` and its dependencies, running `pip install -e scikit-geometry` fails at compilation like this:
```
Obtaining file:///data/ocr-d/origami/scikit-geometry
Coll…
```
-
Calamari 2.0 is out.
I don't see any benefit in updating the dependency other than staying up to date and compatible.
-
The related workflows all end with a CER/WER of 1.0, so Calamari recognizes no text at all.
A manual run on a single GT terminates in less than 1 second without an error message, but also without a usabl…
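A CER/WER of exactly 1.0 is consistent with empty output. Under the common definition (edit distance divided by the length of the ground truth), an empty recognition result costs one edit per ground-truth character or word, so the ratio is exactly 1. The following minimal sketch implements that definition. It counts Unicode code points rather than grapheme clusters and uses naive whitespace tokenisation for words, so it is not necessarily what the workflows' evaluation step does.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def error_rate(ref, hyp):
    # Normalised by the reference length; assumes a non-empty ground truth.
    return levenshtein(ref, hyp) / len(ref)


def cer(gt_text, ocr_text):
    return error_rate(gt_text, ocr_text)


def wer(gt_text, ocr_text):
    # Naive whitespace tokenisation, for illustration only.
    return error_rate(gt_text.split(), ocr_text.split())


print(cer("Hallo Welt", ""), wer("Hallo Welt", ""))  # 1.0 1.0
```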
-
The current implementation seems to assume that `ocrd process` itself is included on the input side. But what about the options it takes?
- `--page-id`: should be part of the NF result, right?
- `--ov…
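One way to frame the question is that a converter first has to separate the options of `ocrd process` itself from its task strings before it can decide where each one goes. The sketch below does this with `argparse` and `shlex` and models only `--page-id`. The helper name, the example task strings and the idea of returning the page ID separately (so it could be carried into the generated workflow) are assumptions for illustration. They do not describe how the current implementation works.

```python
import argparse
import shlex


def split_ocrd_process(cmdline):
    """Hypothetical helper: separate --page-id from the task strings of an
    `ocrd process` call. Only --page-id is modelled; other options are not."""
    parser = argparse.ArgumentParser(prog="ocrd process", add_help=False)
    parser.add_argument("--page-id", default=None)
    parser.add_argument("tasks", nargs="+")
    tokens = shlex.split(cmdline)
    # Accept the command with or without its leading "ocrd process".
    if tokens[:2] == ["ocrd", "process"]:
        tokens = tokens[2:]
    args = parser.parse_args(tokens)
    return args.page_id, args.tasks


# Example call with made-up file groups.
page_id, tasks = split_ocrd_process(
    "ocrd process --page-id PHYS_0001 "
    "'cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN' "
    "'tesserocr-recognize -I OCR-D-BIN -O OCR-D-OCR'"
)
print(page_id)  # PHYS_0001
print(tasks)
```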
-
## Is your feature request related to a problem? Please describe.
I want to force the search service to rebuild the search index for a space / all spaces.
This can currently only be done by deleti…
-
**Area of Concern**
- [ ] Server
- [x] Behaviour of one or more Modules [provide name(s), e.g. ObjectDetectionYolo]
- [ ] Installer
- [ ] Runtime [e.g. Python3.7, .NET]
- [ ] Module packages [e.g…
-
## Current situation
When implementing processors using the **bashlib** shell library provided by OCR-D/core, developers have to write their own routines, based on tools like ImageMagick, to extract …
-
Tesseract and poppler currently process pages one at a time, which is slow for documents with dozens of pages.
Can we add multithreaded (parallel) page processing?
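Because pages are independent, page-level parallelism is a natural fit. Several pages can be handed to separate worker threads, each of which starts its own external process. The sketch below uses `concurrent.futures.ThreadPoolExecutor`. As an example worker, it calls the `tesseract` CLI with `stdout` as the output base. A poppler-based worker could be plugged in the same way. `process_pages` and `ocr_page` are hypothetical names. Threads are enough here because the heavy work runs in child processes, not in Python.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ocr_page(image_path):
    """Hypothetical per-page step: OCR one page image with the tesseract CLI."""
    # "stdout" as output base makes tesseract print the text instead of writing a file.
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def process_pages(pages, worker=ocr_page, max_workers=4):
    """Run `worker` on all pages concurrently; results keep the input page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, pages))
```

For example, `process_pages(["p1.png", "p2.png"], max_workers=8)` would OCR both pages concurrently. When several Tesseract processes run at once, Tesseract's own OpenMP threading can oversubscribe the CPU. Setting `OMP_THREAD_LIMIT=1` in the environment is a commonly recommended way to avoid that.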
-
Currently, `dinglehopper` extracts text from PAGE XML files on the region level (https://github.com/qurator-spk/dinglehopper/blob/master/qurator/dinglehopper/ocr_files.py#L50). It would be wonderful i…
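Extracting text on the line level essentially means reading each `TextLine`'s own `TextEquiv/Unicode` instead of the region's. Here is a minimal standard-library sketch, assuming lines are taken in document order (the PAGE `ReadingOrder` is ignored) and that the first `TextEquiv` of a line is the wanted one. `extract_line_texts` is a hypothetical name, and this is not dinglehopper's code.

```python
import xml.etree.ElementTree as ET


def extract_line_texts(page_xml):
    """Return the text of every TextLine, in document order.

    Only the line's own TextEquiv is read; Word/Glyph-level TextEquiv
    nested deeper inside the line is ignored.
    """
    root = ET.fromstring(page_xml)
    texts = []
    # "{*}" matches any PAGE namespace version (Python 3.8+).
    for line in root.iterfind(".//{*}TextLine"):
        # Direct-child path only, so nested Word elements do not match.
        unicode_el = line.find("{*}TextEquiv/{*}Unicode")
        if unicode_el is None or unicode_el.text is None:
            texts.append("")
        else:
            texts.append(unicode_el.text)
    return texts


# Made-up example document (Coords omitted for brevity).
PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page1.tif" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="r1l1">
        <Word id="r1l1w1"><TextEquiv><Unicode>Hello</Unicode></TextEquiv></Word>
        <TextEquiv><Unicode>Hello world</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <TextEquiv><Unicode>second line</Unicode></TextEquiv>
      </TextLine>
      <TextEquiv><Unicode>Hello world second line</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""

print(extract_line_texts(PAGE_XML))  # ['Hello world', 'second line']
```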