-
I wonder whether it is possible to save the output of `pdfalto` as an ALTO XML file and parse it later with GROBID, since that is what GROBID does internally anyway.
```
pdfalto input.pdf alto.xml
curl…
```
-
- [x] use eScriptorium API for getting doc & parts as xml
- [x] download model, create new model
- [x] write methods to download training data
- [x] correct region block data in output
- [x] run…
-
## Describe the bug
When using the `paloaltonetworks.panos.panos_op` module to create a CSR with XML API commands, the CSR does not show as pending in the GUI and is placed in vsys instead of…
-
I created PAGE XML with `ocrd-tesserocr-recognize -I DEFAULT -O PAGE_GERMAN_PRINT -P segmentation_level region -P textequiv_level word -P find_tables true -P model german_print`. Then I wanted to tran…
-
When manually selecting text in a title that has ocr.txt but no ALTO.xml (which is the case for all titles imported from K3), the client displays:
"Ke stránce není vygenerovaný textový přepis" ("No text transcription has been generated for this page").
It would probably be better to say…
-
Within the NAKI Smart digilinka project, a need was identified to record document language information at the page level. ALTO XML allows the LANG and OTHERLANGS attributes to be used at the PageType level as of version 4.4 …
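A minimal Python sketch of how page-level language information could be read back from an ALTO file, assuming the `LANG` and `OTHERLANGS` attributes sit directly on `Page` elements as described above. The function name `page_languages`, the sample document, and the space-separated `OTHERLANGS` value are illustrative assumptions, not taken from the project; the attribute values are returned unparsed.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def page_languages(alto_xml: str) -> list[dict]:
    """Return ID, LANG and OTHERLANGS for every Page element.

    Matching on the local tag name keeps the sketch independent of the
    ALTO namespace version. Missing attributes are reported as None.
    """
    root = ET.fromstring(alto_xml)
    pages = []
    for elem in root.iter():
        if local_name(elem.tag) == "Page":
            pages.append({
                "id": elem.get("ID"),
                "lang": elem.get("LANG"),
                # Raw attribute value; its exact format is an assumption here.
                "otherlangs": elem.get("OTHERLANGS"),
            })
    return pages


# Hypothetical sample document, for illustration only.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="p1" LANG="cs" OTHERLANGS="de la"/>
    <Page ID="p2"/>
  </Layout>
</alto>"""

for page in page_languages(SAMPLE):
    print(page)
```

Pages without the attributes come back with `None`, which makes it easy to spot files that predate the page-level language fields.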
-
I have cloned the repository, successfully compiled the pdfalto tool as instructed in the README, and processed a PDF file to get a few files as output, including an XML file that appears to be an ALTO XML …
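One way to check whether such an output file really is ALTO, and to see what text it holds, is to look at the root element and read the `CONTENT` attribute of the `String` elements inside each `TextLine`. The following Python sketch does that under the assumption that the file follows the usual ALTO layout; the function name `alto_text_lines` and the sample document are hypothetical and not part of pdfalto.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(alto_xml: str) -> list[str]:
    """Return the text of each TextLine, joining its String words with spaces."""
    root = ET.fromstring(alto_xml)
    if local_name(root.tag) != "alto":
        raise ValueError(f"root element is <{local_name(root.tag)}>, not <alto>")
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            # Only String children carry text; SP (space) elements are skipped.
            words = [
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


# Hypothetical sample document, for illustration only.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page ID="p1"><PrintSpace><TextBlock ID="b1">
    <TextLine><String CONTENT="Hello"/><SP/><String CONTENT="world"/></TextLine>
    <TextLine><String CONTENT="second"/><SP/><String CONTENT="line"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

for line in alto_text_lines(SAMPLE):
    print(line)
```

To run it on a real file, read the file contents into a string first and pass that to `alto_text_lines`.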
-
Hi Rutger and other people of the Loghi-community,
Thank you for your great work on Loghi and the underlying set of tooling.
This post is not really an issue, but more of a question. We're maki…
-
### Terraform Core Version
v1.1.2
### AWS Provider Version
5.6.2
### Affected Resource(s)
aws_s3
### Expected Behavior
All objects older than one day should get deleted which are …
-
At the face-to-face conference in Vienna, the idea came up to create a conversion between PAGE and ALTO as a best-practice mapping between the objects of the two standards.
If feasible, a transformation coul…