JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0
24.6k stars, 3.17k forks

FileNotFoundError in `download_and_unzip` when running multiple EasyOCR instances concurrently #1335

Open starpit opened 4 days ago

starpit commented 4 days ago

When we run two or more EasyOCR instances concurrently, we get an error in the downloader. I am guessing that the download logic uses a fixed download file path?

EasyOcrModel(
File ".../lib/python3.10/site-packages/docling/models
  self.reader = easyocr.Reader(config["lang"])
File ".../lib/python3.10/site-packages/easyocr/easyocr.py", line 92, in __init__
  detector_path = self.getDetectorPath(detect_network)
File ".../lib/python3.10/site-packages/easyocr/easyocr.py", line 253, in getDetectorPath
  download_and_unzip(self.detection_models[self.detect_network]['url'], self.detection_models[self.detect_network]['filename'], self.model_storage_directory, self.verbose)
File ".../lib/python3.10/site-packages/easyocr/utils.py", line 631, in download_and_unzip
  os.remove(zip_path)
FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/.EasyOCR//model/temp.zip'
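
The traceback above ends in `os.remove(zip_path)` on a shared `temp.zip`, which is consistent with two callers staging their download under the same name and one of them deleting the file before the other gets to it. Below is a minimal sketch of one way to avoid that kind of collision. It assumes nothing about EasyOCR's actual internals: each call stages its archive in a private temporary directory, then moves the extracted file into place with an atomic rename. The helper names `make_demo_zip` and `unzip_via_staging_dir` are hypothetical, and the archive is built in memory rather than downloaded.

```python
import io
import os
import tempfile
import threading
import zipfile


def make_demo_zip(filename: str, content: bytes) -> bytes:
    """Build a small in-memory zip that stands in for a downloaded model archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr(filename, content)
    return buffer.getvalue()


def unzip_via_staging_dir(zip_bytes: bytes, filename: str, model_dir: str) -> str:
    """Hypothetical helper (not EasyOCR code): extract without a shared temp path."""
    os.makedirs(model_dir, exist_ok=True)
    final_path = os.path.join(model_dir, filename)
    # Each call gets its own staging directory, so no other caller can
    # overwrite or delete this call's temp.zip.
    with tempfile.TemporaryDirectory(dir=model_dir) as staging:
        zip_path = os.path.join(staging, "temp.zip")
        with open(zip_path, "wb") as f:
            f.write(zip_bytes)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extract(filename, staging)
        # os.replace is atomic within one filesystem: readers see either the
        # old file or the complete new one, never a partial write.
        os.replace(os.path.join(staging, filename), final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as model_dir:
        payload = make_demo_zip("model.pth", b"fake weights")
        workers = [
            threading.Thread(
                target=unzip_via_staging_dir,
                args=(payload, "model.pth", model_dir),
            )
            for _ in range(8)
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        print(sorted(os.listdir(model_dir)))
```

Concurrent callers still repeat the work, but none of them can remove a file that another caller is relying on.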

starpit commented 3 days ago

Update: wrapping the DocumentConverter constructor in an fcntl file lock works around this race condition, which seems like fair, albeit not definitive, evidence that it is indeed a race condition on the easyocr side.

https://github.com/IBM/lunchpail/pull/553/files#diff-887bd71eba07d3802a0d252334cc69f2ee9e74ac50e28a220dea8d9584ab6f44L130-R141
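
For readers who want to try the same workaround, here is a rough sketch of an fcntl-based lock around a constructor call. It is not the code from the linked PR. The names `interprocess_lock` and `build_serialized` and the default lock file name are assumptions made for illustration, and `fcntl` is only available on POSIX systems.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def interprocess_lock(lock_path: str):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with open(lock_path, "w") as lock_file:
        # Blocks until no other holder has the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def build_serialized(factory, lock_path=None):
    """Call factory() while holding the lock, so constructions run one at a time."""
    if lock_path is None:
        # Hypothetical default location; any path shared by all processes works.
        lock_path = os.path.join(tempfile.gettempdir(), "easyocr-init.lock")
    with interprocess_lock(lock_path):
        return factory()
```

With this in place, something like `build_serialized(lambda: DocumentConverter())` would let only one process at a time run the constructor, and with it the model download. The lock is advisory, so it only helps if every process that might trigger the download goes through the same wrapper.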