JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0
23.76k stars 3.11k forks

Not been able to run easyocr in spark udf #764

Open patilauminfi opened 2 years ago

patilauminfi commented 2 years ago

Hi, I am trying to run EasyOCR inside a Spark UDF. (The UDF also calls a pandas apply function.) When I run the script, I get the following error:


Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/ipykernel_4537/530021952.py", line 107, in get_parsed_output
  File "/home/centos/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 8839, in apply
    return op.apply().__finalize__(self, method="apply")
  File "/home/centos/.local/lib/python3.8/site-packages/pandas/core/apply.py", line 727, in apply
    return self.apply_standard()
  File "/home/centos/.local/lib/python3.8/site-packages/pandas/core/apply.py", line 851, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/home/centos/.local/lib/python3.8/site-packages/pandas/core/apply.py", line 871, in apply_series_generator
    results[i] = results[i].copy(deep=False)
  File "/home/centos/.local/lib/python3.8/site-packages/pandas/core/apply.py", line 138, in f
    return func(x, *args, **kwargs)
  File "/tmp/ipykernel_4537/1144331222.py", line 32, in crop_save
  File "/home/centos/.local/lib/python3.8/site-packages/easyocr/easyocr.py", line 400, in readtext
    result = self.recognize(img_cv_grey, horizontal_list, free_list,\
  File "/home/centos/.local/lib/python3.8/site-packages/easyocr/easyocr.py", line 330, in recognize
    result0 = get_text(self.character, imgH, int(max_width), self.recognizer, self.converter, image_list,\
  File "/home/centos/.local/lib/python3.8/site-packages/easyocr/recognition.py", line 206, in get_text
    result1 = recognizer_predict(recognizer, converter, test_loader,batch_max_length,\
  File "/home/centos/.local/lib/python3.8/site-packages/easyocr/recognition.py", line 101, in recognizer_predict
    model.eval()
  File "/home/centos/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1751, in eval
    return self.train(False)
  File "/home/centos/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1732, in train
    module.train(mode)
  File "/home/centos/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1732, in train
    module.train(mode)
  File "/home/centos/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1732, in train
    module.train(mode)
  [Previous line repeated 1 more time]
  File "/home/centos/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1731, in train
    for module in self.children():
  File "/home/centos/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1618, in children
    for name, module in self.named_children():
  File "/home/centos/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1636, in named_children
    for name, module in self._modules.items():
  File "/home/centos/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LinearPackedParams' object has no attribute '_modules'

I am using CentOS 7, Python 3.8, torch 1.11.0+cu102, and torchvision 0.12.0+cu102.

Can anyone please help?

s39674 commented 2 years ago

Hi @patilauminfi, do you have any reproducing code that I can run?

patilauminfi commented 2 years ago

Hi @s39674, when I run the following function inside a Spark UDF, it throws this error:

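import easyocr
from io import BytesIO

# The reader is created once at module level, outside the UDF.
reader = easyocr.Reader(['en'])

def crop_save(img):
    # Run OCR on a single image and return the raw readtext() result.
    txt = reader.readtext(img)
    return txt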

You can use the following file for reference.

U05A11P10 pdf_12_13

zba18 commented 2 years ago

I get the same error.

kristenfed commented 1 year ago

Hi @patilauminfi, I recently faced the same problem. It only worked for me once I set the parameter quantize=False for Reader.