Bowriverstudio opened this issue 5 years ago.
Thanks for proposing this change. It would require a lot of changes, though, and because we use Tika to do the extraction, I think this change would make more sense there. @tballison might be able to say.
Hello and thanks for the suggestion.
I think you are correct, and it seems I'm not the first person to ask about this.
https://stackoverflow.com/questions/51767916/how-to-configure-google-vision-api-with-tika-parser
It looks like I just need to follow this guide: https://cwiki.apache.org/confluence/display/tika/TikaOCR
If you have any additional suggestions please let me know.
Thanks again.
Is your feature request related to a problem? Please describe.
Tesseract's OCR is not accurate enough for the PDFs I'd like to process.
Describe the solution you'd like
I want to be able to use an external OCR API such as:
- Amazon Textract: https://aws.amazon.com/textract
- Amazon Rekognition: https://aws.amazon.com/rekognition/
- Google Cloud Vision OCR: https://cloud.google.com/vision/docs/ocr
- Azure Computer Vision (text recognition): https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-recognizing-text
Describe alternatives you've considered
If this feature is not already available, I am willing to hire a developer to build it. If I do, I'd like to contribute the result back to this community.