-
Hi,
Is it possible to convert PAGE XML into a searchable PDF? Although I got a PDF after running `ocrd-pagetopdf -I OCR-D-OCR -O OCR-D-PDF -P textequiv_level word`, it is not searchable. The workflow I u…
-
In https://github.com/OCR-D/quiver-benchmarks/issues/22, @stweil mentions 118 GB being used for newspaper pages.
- [ ] Reproduce
- [ ] Can we test for this somehow?
-
The tool should display an image corresponding to the text line/OCR error selected.
I'll probably display the local images rather than use IIIF, as this seems more general.
- [x] Explore ~JS~CSS poss…
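
As a rough sketch of the lookup this would need, assuming the line geometry is taken from the `points` attribute of a PAGE XML `Coords` element (space-separated `x,y` pairs); the function name `bbox_from_points` is hypothetical and not part of any existing tool:

```python
def bbox_from_points(points: str) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for a PAGE XML `points` string.

    Assumes integer pixel coordinates such as "10,20 110,20 110,60 10,60".
    """
    pairs = [pair.split(",") for pair in points.split()]
    if not pairs:
        raise ValueError("empty points string")
    xs = [int(x) for x, _ in pairs]
    ys = [int(y) for _, y in pairs]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical text line polygon, used only for illustration
print(bbox_from_points("10,20 110,20 110,60 10,60"))
```

The resulting box could then be used to cut the line out of the local page image, for example with Pillow's `Image.crop`, which takes the same (left, upper, right, lower) order.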
-
`ocrd-segment-repair` has the optional operations "plausibilize" and "sanitize" – I have no idea what exactly they do :) I would prefer something like this:
* shrink-regions-to-hull-of-lines
* wha…
-
We have…
```
ocrd process \
"cis-ocropy-binarize -I DEFAULT -O OCR-D-BIN"
"anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP"
"skimage-binarize -I OCR-D-CROP -O OCR-D-BIN2 -P method li"
"skimage-denois…
```
-
While trying to [compare GT4HistOCR model performance](https://hackmd.io/@bertsky/Skd2DPyi_) between Tesseract and Calamari, I stumbled over a few peculiarities of Calamari's (superb!) text pre/postpr…
-
Here is the original image:
https://digi.ub.uni-heidelberg.de/diglitData/v/blaeu1655bd6_-_00_129.tif
Here is the image fed into sbb-textline (binarized, etc.):
https://digi.ub.uni-heidelberg.de/diglitDat…
-
The related workflows all end with a CER / WER of 1.0, so no text is recognized by Calamari.
A manual run for a single GT terminates in less than 1 second without error message, but also without a usabl…
-
- [x] Migrate to pyproject.toml
- [x] Remove qurator namespace
- [x] Consider using setuptools_scm
-
I'm not sure whether this is the right place to ask, as `sbb-textline-detector` itself worked perfectly in our OCR-D workflows and the produced segmentation results look good as well, but running any re…