-
Let's say I want to classify domain documents into about 48 categories. Should I create a dataset like RVL-CDIP? What is the proper DPI for the document images? Should I convert them to grayscale?
400,000 gray…
-
Hi, we find that GROBID cannot parse inline formulas without discarding the spacing, superscript, and subscript information. Could you suggest a pathway to improve the accuracy in these scenarios?
…
-
### What were you trying to do?
I have used ocrmypdf to perform OCR on a PDF document, but I'm encountering a specific issue with RTL (right-to-left) languages like Persian. Despite successful OCR …
-
I have tested different architectures to classify company, brand, size* and model of cylindrical batteries like the ones described here:
https://en.wikipedia.org/wiki/Electric_battery
Best resul…
-
hOCR is easy to implement because it's based on HTML but it can hardly be called a standard while there are living standards for OCR like ALTO.
hOCR is used by Open Source engines like tesseract, ocr…
-
### Feature request
I wonder if `PyLaia` model support could be added: [https://gitlab.teklia.com/atr/pylaia](https://gitlab.teklia.com/atr/pylaia)
`PyLaia` works very well with text detection for H…
-
Hello,
I don't know much about PDF, and I am confused about the *box settings (mediabox, cropbox, etc.) and the units used in *box and pdfCropMargins (pt vs. %).
What would be the right way to _permanently_ — …
-
## 2017
![screen shot 2017-07-08 at 9 35 56 am](https://user-images.githubusercontent.com/10191084/27987333-f2abab26-63c0-11e7-90ac-092981619a52.png)
## 2018
![blah](https://user-images.githu…
-
**Is your feature request related to a problem? Please describe.**
The challenge is efficiently extracting text from images, such as scanned documents, receipts, and handwritten notes. Users often st…
-
[forwarded from a user email, reposted with permission]
The company Channel Master has produced a video recorder named DVR+. It records over-the-air television broadcasts in the USA. It has some…