-
The memory usage of `ocrd-tesserocr-segment-region` increases for each page, resulting in a total of about 7 GB for 200 pages, 8 GB for 248 pages, 10 GB for 282 pages, 11 GB for 313 pages (observed fo…
-
Hello,
I hope this is the right place for my question.
I've got huge lists of part assignments I plan to import into a database. These lists are placed on microfiches, so that I had to scan them …
-
==22257== Conditional jump or move depends on uninitialised value(s)
==22257== at 0x428EC5: compare_rect_by_ypos (ocr.c:705)
==22257== by 0x454991: shell_sort (ccx_encoders_helpers.c:459)
==2…
-
The OCR-D processor silently overwrites existing annotations in the PAGE XML. It should either be made clear via documentation and/or logging that this happens, or maybe the processor should refus…
wrznr updated
4 years ago
-
Please debug your ocrd-tool.json file.
I found an error:
``` xml
[tools.ocrd-segment-evaluate] 'output_file_grp' is a required property
```
You can find the ocrd-tool.json documentation…
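A check of this kind can be approximated locally before publishing the file. The following is only a minimal sketch, not the validator that produced the message above: it assumes the tools are listed under a top-level `tools` object keyed by executable name, and the helper `find_missing_properties`, the `REQUIRED_KEYS` subset, and the file group name in the sample are all hypothetical.

```python
import json

# Hypothetical subset of required keys, chosen for illustration only;
# the authoritative list is defined by the ocrd-tool.json schema.
REQUIRED_KEYS = ("executable", "input_file_grp", "output_file_grp")


def find_missing_properties(tool_json, required=REQUIRED_KEYS):
    """Return one message per required key that a tool entry lacks."""
    problems = []
    # Assumption: tools live under a top-level "tools" object.
    for name, spec in tool_json.get("tools", {}).items():
        for key in required:
            if key not in spec:
                problems.append(f"[tools.{name}] '{key}' is a required property")
    return problems


# Illustrative input that omits output_file_grp on purpose.
example = json.loads("""
{
  "tools": {
    "ocrd-segment-evaluate": {
      "executable": "ocrd-segment-evaluate",
      "input_file_grp": ["OCR-D-GT-SEG-BLOCK"]
    }
  }
}
""")

for message in find_missing_properties(example):
    print(message)
```

Adding an `output_file_grp` entry to the tool's description makes the sketch return an empty list.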
-
I adapted gt-binarize-page-olena-sauvola-clip-resegment-dewarp-ocr-ocropy-tesseract.mk by renaming the input file group and exchanging the first processor olena-binarize with cis-ocropy-binarize (see …
-
First stage: I am dealing with Gallica OCRs and importing raw text from URLs **(I don't want to work with txt files)**
library(htm2txt) # a useful package to import raw text from an html…
-
Can we use or modify this code for typed text segmentation?
-
In https://github.com/mjenckel/LAYoutERkennung/blob/master/ocrd_anybaseocr/ocrd-tool.json the parameter `parallel` with the description *numbers of CPUs to us* defaults to 0. Is this intended? What do…
wrznr updated
4 years ago
-
Hi, is it possible for me to fine-tune the model with a custom dataset after training it on the synth dataset?