vinayak-mehta opened this issue 5 years ago.
Hi, is there any update on OCR support?
I hope to do an experiment soon with https://github.com/JaidedAI/EasyOCR.
I got good results on some images with EasyOCR (https://vinayak.io/2020/09/20/day-29-easyocr-dabblements/). I might try working on a PR that integrates it with the code I mentioned in the first comment on this issue.
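EasyOCR's `Reader.readtext` returns, with its default `detail=1`, a list of `(bbox, text, confidence)` tuples, where `bbox` holds the four corner points of each detection in image pixels. Below is a minimal sketch of turning that output into axis-aligned word boxes that table-detection code could consume. The helper name `to_word_boxes` and the `min_conf` threshold are hypothetical, and the sample data is made up, not real OCR output.

```python
from typing import List, Sequence, Tuple

# (text, x0, y0, x1, y1) in image pixels, origin at the top-left corner
WordBox = Tuple[str, float, float, float, float]


def to_word_boxes(results: Sequence, min_conf: float = 0.0) -> List[WordBox]:
    """Convert EasyOCR-style (bbox, text, confidence) tuples to axis-aligned boxes.

    Hypothetical helper: `bbox` is assumed to be four (x, y) corner points,
    as returned by easyocr.Reader.readtext with the default detail=1.
    """
    boxes: List[WordBox] = []
    for bbox, text, conf in results:
        if conf < min_conf:
            continue  # drop low-confidence detections
        xs = [float(p[0]) for p in bbox]
        ys = [float(p[1]) for p in bbox]
        # The corners may describe a slightly rotated quadrilateral,
        # so keep the enclosing axis-aligned rectangle.
        boxes.append((text, min(xs), min(ys), max(xs), max(ys)))
    return boxes


if __name__ == "__main__":
    # Made-up sample shaped like easyocr.Reader(["en"]).readtext(...) output
    sample = [
        ([[10, 5], [60, 5], [60, 20], [10, 20]], "Name", 0.98),
        ([[200, 6], [250, 4], [251, 21], [201, 22]], "Total", 0.91),
        ([[10, 30], [40, 30], [40, 44], [10, 44]], "??", 0.12),
    ]
    for box in to_word_boxes(sample, min_conf=0.5):
        print(box)
```

Taking the enclosing rectangle loses the rotation of skewed detections, which is usually acceptable for row and column grouping on scanned tables that have already been deskewed.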
If Camelot could offer an entry function that accepts a list of words with their bounding-box coordinates, it would make it easy to integrate any OCR tool that provides this information, such as Tesseract, EasyOCR, or others.
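To make the proposal concrete, here is a rough sketch of what the first step of such an OCR-agnostic entry function might do: take words with bounding boxes from any OCR engine and group them into text rows. Camelot has no such function; the `Word` type, `words_to_rows`, and the `tol_ratio` parameter are all hypothetical, and grouping by vertical centers is only one simple way to form rows.

```python
from statistics import median
from typing import List, NamedTuple


class Word(NamedTuple):
    """A recognized word and its bounding box (hypothetical input type)."""
    text: str
    x0: float
    y0: float  # top edge, image coordinates (y grows downward)
    x1: float
    y1: float  # bottom edge


def words_to_rows(words: List[Word], tol_ratio: float = 0.5) -> List[List[Word]]:
    """Group words into text rows by the vertical position of their centers.

    A word joins the current row when its vertical center lies within
    `tol_ratio` * median word height of that row's running mean center;
    otherwise it starts a new row. Not part of Camelot.
    """
    if not words:
        return []
    tol = tol_ratio * median(w.y1 - w.y0 for w in words)
    rows: List[List[Word]] = []
    centers: List[float] = []  # running mean center of each row
    for w in sorted(words, key=lambda w: (w.y0 + w.y1) / 2):
        c = (w.y0 + w.y1) / 2
        if rows and abs(c - centers[-1]) <= tol:
            rows[-1].append(w)
            # Update the running mean with the new member
            centers[-1] += (c - centers[-1]) / len(rows[-1])
        else:
            rows.append([w])
            centers.append(c)
    # Left-to-right reading order within each row
    return [sorted(r, key=lambda w: w.x0) for r in rows]
```

Because the input is just text plus coordinates, Tesseract, EasyOCR, or any other engine could feed it once its output is mapped to `Word` tuples.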
When pdfminer parses an OCR'd PDF, such as one produced with OCRmyPDF, it frequently merges columns, even when the column cells are visibly far apart in the PDF.
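With per-word bounding boxes available, columns can be kept apart by looking for vertical gutters: horizontal ranges that no word covers. The sketch below merges the horizontal extents of words and treats any gap of at least `min_gap` units as a column boundary. It is a plain whitespace-gutter heuristic, not pdfminer's or Camelot's algorithm, and `find_columns`, `assign_column`, and `min_gap` are hypothetical names.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def find_columns(spans: List[Interval], min_gap: float = 10.0) -> List[Interval]:
    """Return column extents by merging overlapping horizontal word spans.

    `spans` are (x0, x1) extents of words taken from OCR bounding boxes.
    Spans separated by a gutter of at least `min_gap` stay in separate
    columns; anything closer is merged into the same column.
    """
    columns: List[Interval] = []
    for x0, x1 in sorted(spans):
        if columns and x0 - columns[-1][1] < min_gap:
            # Overlaps or nearly touches the current column: extend it
            columns[-1] = (columns[-1][0], max(columns[-1][1], x1))
        else:
            columns.append((x0, x1))
    return columns


def assign_column(x0: float, x1: float, columns: List[Interval]) -> int:
    """Index of the column whose extent overlaps the word's span the most.

    `columns` must be non-empty.
    """
    overlaps = [min(x1, c1) - max(x0, c0) for c0, c1 in columns]
    return max(range(len(columns)), key=overlaps.__getitem__)
```

One caveat: a header cell spanning several columns bridges the gutters and merges them, so in practice the spans would probably be collected from body rows only.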
> If camelot can offer an entry function that receives a list of words with their bounding boxes coordinates
@javiqm12 You can already specify table areas and regions in Camelot. Are you referring to a different way of providing bounding-box coordinates?
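One way to connect OCR output to the existing option: Camelot's `table_areas` takes strings of the form `"x1,y1,x2,y2"`, where `(x1, y1)` is the left-top and `(x2, y2)` the right-bottom corner in PDF coordinate space (points, origin at the bottom-left). OCR boxes, in contrast, are in pixels of a rendered page image with the origin at the top-left. The sketch below converts between the two, assuming the page was rasterized at a known DPI and that its height in points is known (e.g. 792 for US Letter); `bbox_to_table_area` is a hypothetical helper.

```python
from typing import Tuple


def bbox_to_table_area(
    bbox_px: Tuple[float, float, float, float], dpi: float, page_height_pt: float
) -> str:
    """Convert an image-space box to a Camelot-style `table_areas` string.

    bbox_px is (left, top, right, bottom) in pixels of a page image rendered
    at `dpi`, origin at the top-left. The result is "x1,y1,x2,y2" with
    (x1, y1) the left-top and (x2, y2) the right-bottom corner in PDF
    points (1/72 inch), origin at the bottom-left.
    """
    left, top, right, bottom = bbox_px
    x1 = left * 72.0 / dpi  # pixels -> points
    x2 = right * 72.0 / dpi
    # Flip the y axis: image y grows downward, PDF y grows upward
    y1 = page_height_pt - top * 72.0 / dpi
    y2 = page_height_pt - bottom * 72.0 / dpi
    return f"{x1:g},{y1:g},{x2:g},{y2:g}"


if __name__ == "__main__":
    # A table found at pixels (300, 300)-(900, 600) on a 300 dpi Letter scan
    print(bbox_to_table_area((300, 300, 900, 600), dpi=300, page_height_pt=792))
```

The resulting string could then be passed as, for example, `camelot.read_pdf("file.pdf", flavor="stream", table_areas=[area])`, though this only restricts where Camelot looks; the text inside the area still comes from the PDF's text layer.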
An experimental version exists in the history before commit 9753889. It uses Tesseract (via pyocr). ocropy looked promising the last time I checked. I'm opening this issue for discussion of, and experiments with, OCR.