-
The zip includes the model, the code and the resulting HTML.
The problem is that get.text works well for one specific letter size and fails on others. There is very high diversity in font size…
-
The provided model cannot correctly categorize some "vaguely" plotted Figures and Tables. In this case, the words in the Table region are treated as normal Text, thus hindering the normal reading o…
-
YALTAi 2.0.1 hangs forever when segmenting
```
(train-2.0.1-py3.11) incognito@DESKTOP-H1BS9PO:~/YALTAi$ yaltai kraken -I "*.jpg" --suffix ".xml" segment --yolo runs/detect/train2/weights/best.pt
…
```
-
I will try to get a clean example of this, but I came across this package and wanted to give it a test. However, several of the addresses it flagged from the "shared" device group are in fact in use direc…
-
Hi,
At present, I have all documents as DOCX (Microsoft Word files) which I convert to PDF in order to run the GROBID XML conversion. Is there any possibility of using DOCX as input?
In case of …
-
I build my own Mirador with textoverlay-plugin 0.3.8:
```
import Mirador from 'mirador/dist/es/src/index';
import downloadDialogPlugin from 'mirador-downloaddialog/es';
import imageCropperPlugin…
```
-
METS/PAGE/ALTO provided by digitization workflow software or repositories will not always adhere [to the conventions we have in OCR-D](https://ocr-d.de/en/spec). OTOH the workspaces that are the resul…
-
https://github.com/kba/page-to-alto/blob/46a8cc2fb74ce327e9d195f1095699cbae946cce/ocrd_page_to_alto/convert.py#L158
I think it's not enough to just map the lower levels here. There might not be any…
-
Hi,
I have an XML file that is failing with the error below.
Error/s returned during metadata extraction (SaxParseException: java.lang.ClassCastException: class sun.net.www.protocol.file.FileURLCo…
rgalv updated
8 months ago
-
As outlined a while ago,
https://github.com/cisocrgroup/ocrd_cis/blob/c3fad1a8b04dc5a305460e8bb3c54cb79cd75515/ocrd_cis/ocropy/common.py#L111-L118
there are plenty of opportunities to improve `o…