-
**Describe the bug**
The Microsoft OCR models are now producing very poor results (hallucinated text), as seen in this GIF from https://github.com/CDCgov/ReportVision/pull/316:
![Kapture 2024-10-11 at 13 48 …
-
Sometimes a word on parameter choices would be helpful. For example,
- `threshold` (ocrd-cis-ocropy-binarize) or `k` (ocrd-olena-binarize) parameter for binarization,
- `maxskew` (`ocrd-cis-ocropy…
-
Running `make all` for `ocrd_all` or `pip install .` for `ocrd_segment` fails on macOS with Homebrew:
```
Compiling pycocotools/_mask.pyx because it changed.
[1/1] Cythonizing pycocot…
-
One of the most inherently difficult OCR tasks is segmenting a String into Glyphs. Because of ink or wearing problems, two glyphs can be merged on the page without any separating white space, or a sin…
-
`ocrd-segment-repair` has the optional operations "plausibilize" and "sanitize", but I have no idea what exactly they do :) I would prefer something like this:
* shrink-regions-to-hull-of-lines
* wha…
-
Optimize openrecall to consume as few resources as possible. Take snapshots only when something changes on the screen, not when it is idle. Consume as little battery and CPU as possible, make it effici…
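
One way the "only when something changes" idea could look is sketched below. This is not openrecall's actual code: `changed_fraction`, `SnapshotGate`, and both thresholds are hypothetical names and values chosen for illustration, and the frames are assumed to arrive as raw 8-bit grayscale bytes from whatever capture routine is already in use.

```python
from typing import Optional


def changed_fraction(prev: bytes, curr: bytes, tolerance: int = 8) -> float:
    """Return the fraction of pixels that differ by more than `tolerance`.

    Both frames are assumed to be raw 8-bit grayscale buffers (an assumption
    of this sketch, not something the capture code guarantees).
    """
    if len(prev) != len(curr):
        return 1.0  # resolution changed: treat it as a full change
    if not curr:
        return 0.0
    differing = sum(1 for a, b in zip(prev, curr) if abs(a - b) > tolerance)
    return differing / len(curr)


class SnapshotGate:
    """Decide whether a freshly captured frame is worth storing."""

    def __init__(self, min_changed: float = 0.01):
        # Hypothetical default: store when at least 1% of pixels changed.
        self.min_changed = min_changed
        self._last_stored: Optional[bytes] = None

    def should_store(self, frame: bytes) -> bool:
        # Compare against the last *stored* frame, so that slow drift
        # accumulates and eventually triggers a snapshot.
        if (
            self._last_stored is None
            or changed_fraction(self._last_stored, frame) >= self.min_changed
        ):
            self._last_stored = frame
            return True
        return False
```

With a gate like this, an idle screen would cost only the comparison itself: the expensive steps (OCR, embedding, writing to disk) run only when `should_store` returns `True`. Downscaling frames before comparing would reduce the cost of the comparison further.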
-
I'm running `ocrd-cis-postcorrect` on the aligned OCR output of Calamari and Tesserocr. So far, the output seems to be completely identical to the input even though there are quite a few differences …
-
CCExtractor version: 0.94 built on macos 13.5.2
# In raising this issue, I confirm the following:
- [ x] I have read and understood the [contributors guide](https://github.com/CCExtractor/c…
-
AFAICS, the existing implementations for all versions of PAGE-XML ignore `(OrderedGroup|OrderedGroupIndexed)/@index` when parsing the XML.
This is how it looks:
https://github.com/PRImA-Researc…
-
For implementations using the bashlib API, it would be useful to have a command to extract a specific block/line from a PAGE, e.g.
```
ocrd workspace extract --element-id="line123" --page-id="page…