-
@funderburkjim @Andhrabharati the work is starting to get noticed! So I have a question: can we get a batch OCR of the scans on our end with https://ocr.sanskritdictionary.com and with a little he…
-
Currently there is no way of distinguishing hard and soft `HYP` elements.
Example of a hard hyphen:
```
I separated the words by a non-
breaking space.
```
Example of a soft hyphen:
```
I …
-
```
(env-py3.10) incognito@DESKTOP-H1BS9PO:~/conformer_ocr$ cocr -d cuda train -f binary --workers 8 dataset.arrow
Usage: cocr train [OPTIONS] [GROUND_TRUTH]...
Try 'cocr train --help' for help.
…
-
* KOReader version: v2023.10-55
* Device: Kindle Scribe (5.16.2)
#### Issue
Enabling Forced OCR results in very wrong coordinates when long tapping to highlight text.
Supporting my observations …
-
While working with your source code, I implemented inference on an entire subset (train, val, or test) [here](https://github.com/tiennvcs/docvqa/blob/main/libs/layoutlmv2/inference.py).
…
-
### Is your feature request related to a problem? Please describe.
The `contentRecog` module is currently only used by the OCR and so some of the code is written only to that end. The OCR presents it…
-
I presume training on HDF5 will be more efficient than on any of the other formats. And at least compared with the line GT file pairs, filesystem performance might be much better, too.
So my question is: ho…
-
**Is your feature request related to a problem? Please describe.**
NAPS2 is great - it's very useful to me and many others, and a big part of the utility it offers is the integrated OCR function, sin…
-
I have just tried to set up a new `ocrd_all` using Release 2023-06-14 (maybe plus some additional changes as of today).
I am doing a native setup in a non-GPU environment (Ubuntu 22.04, using Python 3.8 via "…
-
I'm fired up about a Rust-implemented document parsing / embedding engine for my code and documents. Sadly, I don't see good PDF ingestion in the code.
Ideally, I'd like to import PDFs from acad…