Closed: iprunache closed this issue 3 years ago
Thanks @iprunache
@anjesh what is involved regarding analysing and addressing this issue?
The communication code needs to be revamped to use more modern libraries and the latest Abbyy API.
70% completed.
Why
OCR processing of uploaded contracts often gets stuck or fails because the pdf-processor service sometimes fails to get a response from the Abbyy OCR SDK. All pages for uploaded contracts should be properly processed.
What
Notes
See discussion started here: https://github.com/NRGI/resourcecontracts.org/issues/1340#issuecomment-755229710
Most of the time, PDF processing gets stuck when trying to retrieve the status of an Abbyy task (no answer is ever received):
Sometimes API calls fail and there's no retry:
The processing issues seem to arise from network connectivity problems, which pdf-processor does not handle well. Ideally, it should switch to the more modern `requests` Python library and add timeouts and retries for API calls.
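A rough sketch of what timeouts and retries could look like with `requests`, not the actual pdf-processor code. The function names (`build_session`, `poll_task_status`), the timeout values, the retry counts, and the `is_finished` callback are all hypothetical; the real Abbyy endpoints and response format are not shown here and would need to be filled in.

```python
import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Hypothetical values, to be tuned for the real service.
CONNECT_TIMEOUT = 5   # seconds to wait for the TCP connection
READ_TIMEOUT = 30     # seconds to wait for the server to send a response


def build_session(total_retries=3, backoff_factor=1.0):
    """Return a requests.Session that retries transient failures with backoff."""
    retry = Retry(
        total=total_retries,
        connect=total_retries,
        read=total_retries,
        backoff_factor=backoff_factor,
        # Also retry on these HTTP statuses (idempotent methods such as GET only).
        status_forcelist=(500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def poll_task_status(session, status_url, is_finished, interval=5.0, deadline=600.0):
    """Poll status_url until is_finished(response) is true or the deadline passes.

    Every request carries an explicit timeout, so a lost connection raises an
    exception instead of blocking forever.
    """
    start = time.monotonic()
    while True:
        response = session.get(status_url, timeout=(CONNECT_TIMEOUT, READ_TIMEOUT))
        response.raise_for_status()
        if is_finished(response):
            return response
        # Give up if the next poll would start after the overall deadline.
        if time.monotonic() - start + interval > deadline:
            raise TimeoutError("task did not finish within %s seconds" % deadline)
        time.sleep(interval)
```

The per-request timeout addresses the "no answer is ever received" case, the mounted `Retry` policy covers failed calls, and the overall deadline keeps a task that never completes from holding up processing indefinitely.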