-
The current requirements.txt requires TensorFlow 2.4.x, which is not available on PyPI for Python 3.9+.
(Side note: We have been using TensorFlow >= 2.5.0 with Calamari 1.0.x for this reason.)
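One way to express the workaround is to choose the TensorFlow requirement from the running Python version. The sketch below is only an illustration: the helper name `tensorflow_requirement` and the exact specifier strings are assumptions, not part of the project's packaging.

```python
import sys


def tensorflow_requirement(python_version=None):
    """Return a pip requirement string for TensorFlow (hypothetical helper)."""
    if python_version is None:
        python_version = sys.version_info[:2]
    # Per the report above, TensorFlow 2.4.x has no PyPI release for
    # Python 3.9+, so use the >= 2.5.0 line there (assumed specifier).
    if tuple(python_version) >= (3, 9):
        return "tensorflow>=2.5.0"
    # Older interpreters can keep the 2.4.x pin (assumed specifier).
    return "tensorflow>=2.4.0,<2.5.0"
```

The same split could probably be written directly in requirements.txt with `python_version` environment markers instead of code.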
-
Hi, in the latest version, the Tesseract engine uses the path set in the environment variable instead of the path from the bundle, which causes it to throw an error (or OCR does not work in the release version)…
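A minimal sketch of the lookup order this report seems to expect, with the bundled directory first and the environment only as a fallback. The function name `resolve_tesseract_dir` and the variable name `TESSERACT_DIR` are hypothetical; the report does not say which environment variable is involved or how the application resolves the path.

```python
import os
from pathlib import Path


def resolve_tesseract_dir(bundle_dir, environ=None, env_var="TESSERACT_DIR"):
    """Prefer the bundled Tesseract directory over the environment.

    `env_var` is a placeholder name (assumption), not a known setting.
    Returns a Path, or None if neither location is available.
    """
    environ = os.environ if environ is None else environ
    bundled = Path(bundle_dir)
    # The bundle wins whenever it actually exists on disk.
    if bundled.is_dir():
        return bundled
    # Otherwise fall back to whatever the environment provides.
    env_value = environ.get(env_var)
    return Path(env_value) if env_value else None
```

Passing `environ` explicitly keeps the function easy to check without changing the real process environment.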
-
Hello, can you describe the steps in more detail? How do I run it, and where can I find a sample image?
-
Huggingface Model: https://huggingface.co/microsoft/Phi-3.5-vision-instruct
Fine-tuned Dataset: https://huggingface.co/datasets/linxy/LaTeX_OCR
Usually, fine-tuning a multimodal large model invo…
-
ALTO should support OCR of video efficiently.
This is a future-looking issue, not something we're likely to address immediately, but something to keep in mind as we drive progress of ALTO to be a s…
-
In https://github.com/OCR-D/quiver-benchmarks/issues/22, @stweil mentions 118 GB being used for newspaper pages.
- [ ] Reproduce
- [ ] Can we test for this somehow?
-
Likewise, there are many working examples for Tesseract OCR,
for example at this [link](https://pypi.org/project/pytesseract/).
-
https://github.com/kba/ocrmultieval/blob/5de79f3021b48f83f9cb798a484fd472d21ed94b/ocrmultieval/backends/OcrdSegmentEvaluate.py#L21-L23
This does not cover the case where the binary image is itself …
-
hOCR is easy to implement because it is based on HTML, but it can hardly be called a standard while there are living standards for OCR like ALTO.
hOCR is used by Open Source engines like tesseract, ocr…
-
@bertsky wrote in #1:
> I still think this would make a very good addition to ocrd-segment-repair...