Calamari-OCR / calamari

Line based ATR Engine based on OCRopy
Apache License 2.0

Issue while using the model and json #334

Open rajban94 opened 1 year ago

rajban94 commented 1 year ago

I get the error below when I try to extract the data with this code:

file = 'out_10.jpg'
cv_img = cv2.imread(file,0)

predictor = Predictor.from_checkpoint(
        params=PredictorParams(),
        checkpoint='./models/cal_model.ckpt')

calamari_output = {}
for sample in predictor.predict_raw(cv_img):
    inputs, prediction, meta = sample.inputs, sample.outputs, sample.meta

    pred_text = prediction.sentence
    avg_char_probability = 0
    for p in prediction.positions:
        if len(p.chars) > 0:
            avg_char_probability += p.chars[0].probability
    avg_char_probability /= len(prediction.positions) if len(prediction.positions) > 0 else 1
    #print(prediction.avg_char_probability)
    pred_confidence = round(avg_char_probability * 100, 1)
    calamari_output['name'] = [pred_text, pred_confidence]

Error:
2023-01-23 18:20:15.833943: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-01-23 18:20:15.834081: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow_addons\utils\ensure_tf_install.py:67: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.9.0 and strictly below 2.12.0 (nightly versions are not supported).
 The versions of TensorFlow you are currently using is 2.6.5 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
  UserWarning,
INFO     2023-01-23 18:20:17,488     tfaip.device.device_config: Setting up device config DeviceConfigParams(gpus=None, gpu_auto_tune=False, gpu_memory=None, soft_device_placement=True, dist_strategy=<DistributionStrategy.DEFAULT: 'default'>)
2023-01-23 18:20:17.494646: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2023-01-23 18:20:17.494742: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-01-23 18:20:17.497412: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: LAPTOP-U30QQNA8
2023-01-23 18:20:17.497586: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: LAPTOP-U30QQNA8
INFO     2023-01-23 18:20:17,495 calamari_ocr.ocr.savedmodel.sa: Checkpoint version 5 is up-to-date.
INFO     2023-01-23 18:20:17,519     tfaip.device.device_config: Setting up device config DeviceConfigParams(gpus=None, gpu_auto_tune=False, gpu_memory=None, soft_device_placement=True, dist_strategy=<DistributionStrategy.DEFAULT: 'default'>)
2023-01-23 18:20:17.530284: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING  2023-01-23 18:20:18,624                     tensorflow: No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING  2023-01-23 18:20:18,624                     tensorflow: No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
Prediction:   0%|                                                                                                                                                                         | 0/262 [00:00<?, ?it/s]2023-01-23 18:20:18.929519: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2023-01-23 18:20:19.885121: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-01-23 18:20:19.885257: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow_addons\utils\ensure_tf_install.py:67: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.9.0 and strictly below 2.12.0 (nightly versions are not supported).
 The versions of TensorFlow you are currently using is 2.6.5 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
  UserWarning,
INFO     2023-01-23 18:20:21,432     tfaip.device.device_config: Setting up device config DeviceConfigParams(gpus=None, gpu_auto_tune=False, gpu_memory=None, soft_device_placement=True, dist_strategy=<DistributionStrategy.DEFAULT: 'default'>)
2023-01-23 18:20:21.435367: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2023-01-23 18:20:21.435484: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-01-23 18:20:21.438304: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: LAPTOP-U30QQNA8
2023-01-23 18:20:21.438477: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: LAPTOP-U30QQNA8
INFO     2023-01-23 18:20:21,436 calamari_ocr.ocr.savedmodel.sa: Checkpoint version 5 is up-to-date.
INFO     2023-01-23 18:20:21,456     tfaip.device.device_config: Setting up device config DeviceConfigParams(gpus=None, gpu_auto_tune=False, gpu_memory=None, soft_device_placement=True, dist_strategy=<DistributionStrategy.DEFAULT: 'default'>)
2023-01-23 18:20:21.516634: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING  2023-01-23 18:20:22,690                     tensorflow: No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING  2023-01-23 18:20:22,690                     tensorflow: No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
Prediction:   0%|                                                                                                                                                                         | 0/262 [00:00<?, ?it/s]2023-01-23 18:20:23.067125: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2023-01-23 18:20:23.460350: W tensorflow/core/framework/op_kernel.cc:1680] Unknown: RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\ops\script_ops.py", line 249, in __call__
    ret = func(*args)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 645, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 892, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 164, in generator
    for s in samples:

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 214, in _generate_input_samples
    for s in generate:

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\processor\sample\processorpipeline.py", line 114, in _apply
    with parallel_pipeline as output_generator:

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pipeline.py", line 66, in __enter__
    maxtasksperchild=self.max_tasks_per_child,

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pool.py", line 62, in __init__
    super().__init__(initializer=Initializer(worker_constructor), **kwargs)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Prediction:   0%|                                                                                                                                                                         | 0/262 [00:00<?, ?it/s]
CRITICAL 2023-01-23 18:20:23,456             tfaip.util.logging: Uncaught exception
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\RISHAV\Documents\ML_Flow\extraction.py", line 86, in <module>
    for sample in predictor.predict_raw(cv_img):
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\predict\predictorbase.py", line 215, in predict_pipeline
    total=n_samples,
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\processor\sample\processorpipeline.py", line 84, in _apply
    for sample in samples:
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\predict\predictorbase.py", line 279, in predict_dataset
    r = predict_function(iterator)  # hack to access inputs
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\def_function.py", line 957, in _call
    filtered_flat_args, self._concrete_stateful_fn.captured_inputs)  # pylint: disable=protected-access
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\function.py", line 1964, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\function.py", line 596, in call
    ctx=ctx)
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError:  RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\ops\script_ops.py", line 249, in __call__
    ret = func(*args)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 645, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 892, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 164, in generator
    for s in samples:

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 214, in _generate_input_samples
    for s in generate:

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\processor\sample\processorpipeline.py", line 114, in _apply
    with parallel_pipeline as output_generator:

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pipeline.py", line 66, in __enter__
    maxtasksperchild=self.max_tasks_per_child,

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pool.py", line 62, in __init__
    super().__init__(initializer=Initializer(worker_constructor), **kwargs)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

         [[{{node PyFunc}}]]
         [[IteratorGetNext]] [Op:__inference_predict_function_2234]

Function call stack:
predict_function

2023-01-23 18:20:23.484051: W tensorflow/core/framework/op_kernel.cc:1680] Unknown: BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\ops\script_ops.py", line 249, in __call__
    ret = func(*args)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 645, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 892, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 164, in generator
    for s in samples:

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 214, in _generate_input_samples
    for s in generate:

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\processor\sample\processorpipeline.py", line 114, in _apply
    with parallel_pipeline as output_generator:

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pipeline.py", line 66, in __enter__
    maxtasksperchild=self.max_tasks_per_child,

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pool.py", line 62, in __init__
    super().__init__(initializer=Initializer(worker_constructor), **kwargs)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)

BrokenPipeError: [Errno 32] Broken pipe

Prediction:   0%|                                                                                                                                                                         | 0/262 [00:04<?, ?it/s]
CRITICAL 2023-01-23 18:20:23,487             tfaip.util.logging: Uncaught exception
Traceback (most recent call last):
  File "extraction.py", line 86, in <module>
    for sample in predictor.predict_raw(cv_img):
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\predict\predictorbase.py", line 215, in predict_pipeline
    total=n_samples,
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\processor\sample\processorpipeline.py", line 84, in _apply
    for sample in samples:
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\predict\predictorbase.py", line 279, in predict_dataset
    r = predict_function(iterator)  # hack to access inputs
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\def_function.py", line 957, in _call
    filtered_flat_args, self._concrete_stateful_fn.captured_inputs)  # pylint: disable=protected-access
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\function.py", line 1964, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\function.py", line 596, in call
    ctx=ctx)
  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError:  BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\ops\script_ops.py", line 249, in __call__
    ret = func(*args)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 645, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 892, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 164, in generator
    for s in samples:

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\runningdatapipeline.py", line 214, in _generate_input_samples
    for s in generate:

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\data\pipeline\processor\sample\processorpipeline.py", line 114, in _apply
    with parallel_pipeline as output_generator:

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pipeline.py", line 66, in __enter__
    maxtasksperchild=self.max_tasks_per_child,

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\site-packages\tfaip\util\multiprocessing\data\pool.py", line 62, in __init__
    super().__init__(initializer=Initializer(worker_constructor), **kwargs)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)

  File "C:\Users\RISHAV\.conda\envs\ocr_env\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)

BrokenPipeError: [Errno 32] Broken pipe

         [[{{node PyFunc}}]]
         [[IteratorGetNext]] [Op:__inference_predict_function_2234]

Function call stack:
predict_function

Please help me on how to resolve this issue.

Edit andbue: code formatting

andbue commented 1 year ago

predictor.predict_raw expects an Iterable[np.ndarray], but you are passing a single numpy.ndarray. Try:

raw_image_generator = [cv_img]
for sample in predictor.predict_raw(raw_image_generator):
    ...

rajban94 commented 1 year ago

I will try it and let you know. But I had already read the image with OpenCV, which gives a numpy.ndarray, so why am I getting this error? Is it because of tfaip?

rajban94 commented 1 year ago

@andbue I am still getting the same error after changing the image to a numpy.ndarray. Can you tell me what error I am running into?

andbue commented 1 year ago

The image was already a numpy.ndarray before; now it needs to be wrapped in a list or any other kind of iterable before you pass it to the predictor. Could you post a full example of your code as it looks right now, ideally with all imports and maybe even the out_10.jpg you're using?

rajban94 commented 1 year ago

@andbue I am sharing the code that I am using for end-to-end prediction. Please let me know where I am going wrong.

import cv2
import numpy as np
import os
import glob
from pdf2image import convert_from_path
import subprocess
import pandas as pd
import re
from calamari_ocr.ocr.predict.predictor import Predictor, PredictorParams

def generateImage(pdfFile, des = './images'):

    if pdfFile.split('.')[-1]=='pdf':
        name = os.path.basename(pdfFile).replace('.pdf','')
        images = convert_from_path(pdfFile,dpi=500,poppler_path = "C:\\Program Files (x86)\\poppler-0.68.0\\bin")
        for i in range(len(images)):
            images[i].save(des+'/'+name+'_page_'+ str(i) +'.jpg', 'JPEG')

def get_calamari_output(cropImg, index):

    predictor = Predictor.from_checkpoint(
        params=PredictorParams(),
        checkpoint='./models/cal_model.ckpt')

    calamari_output = {}
    for sample in predictor.predict_raw([cropImg]):
        inputs, prediction, meta = sample.inputs, sample.outputs, sample.meta

        pred_text = prediction.sentence
        avg_char_probability = 0
        for p in prediction.positions:
            if len(p.chars) > 0:
                avg_char_probability += p.chars[0].probability
        avg_char_probability /= len(prediction.positions) if len(prediction.positions) > 0 else 1
        #print(prediction.avg_char_probability)
        pred_confidence = round(avg_char_probability * 100, 1)
        calamari_output[index] = [pred_text, pred_confidence]
    return calamari_output

def drawBoundBox(imageFile):

    orig_img = cv2.imread(imageFile)
    gray = cv2.cvtColor(orig_img, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray,(11,11),0)
    _, thresh = cv2.threshold(blur,0,255,cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    kernal = cv2.getStructuringElement(cv2.MORPH_RECT,(11,19))
    dilate = cv2.dilate(thresh,kernal, iterations=9)

    cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts)==2 else cnts[1]
    cnts = sorted(cnts, key=lambda x: cv2.boundingRect(x)[0])

    boxes = []
    for c in cnts:
        x,y,w,h = cv2.boundingRect(c)
        boxes.append([x,y,w,h])

    return boxes

def get_crops_dtls(imageFile):
    res = cv2.imread(imageFile)
    boxlist = drawBoundBox(imageFile)
    for idx, box in enumerate(boxlist):
        x,y,w,h = box[0],box[1],box[2],box[3]
        crop = res[y:y+h,x:x+w]
        #cv2.imwrite(dst+"/"+"out_"+str(idx)+'.jpg',crop)
        pred_text_dict = get_calamari_output(crop, idx)
    return pred_text_dict

if not os.path.exists('./images'):
    os.makedirs('./images')

files = glob.glob('./invoice/*')
for file in files:
    generateImage(file)

imgs = glob.glob('./images/*.jpg')
for img in imgs:
    predict_data = get_crops_dtls(img)

As suggested, I now use for sample in predictor.predict_raw([cropImg]), but it still gives the same error as before.

andbue commented 1 year ago

Ah, now I get it: put the lines at the bottom in an if __name__ == "__main__": block. Otherwise the whole subprocess magic of calamari, tfaip and tensorflow is not going to work, which produces the Broken pipe errors.
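
A minimal sketch of that idiom, using only the standard library rather than Calamari itself. On Windows, worker processes are spawned and re-import the main script (the traceback above shows it being run as "__mp_main__"), so any code at module level runs again in every worker. The names recognize and main are hypothetical stand-ins for get_crops_dtls and for the loops at the bottom of the script:

```python
def recognize(path):
    # Hypothetical stand-in for get_crops_dtls(img) in the script above.
    return {path: "recognized text"}


def main(paths):
    # Everything that used to sit at module level goes here, so it runs
    # only in the parent process and not when a spawned worker re-imports
    # this file.
    results = {}
    for path in paths:
        results.update(recognize(path))
    return results


if __name__ == "__main__":
    # As the error message says, multiprocessing.freeze_support() is only
    # needed here if the script is frozen into an executable.
    print(main(["page_0.jpg", "page_1.jpg"]))
```

Function definitions and imports can stay at module level; only the code that actually does the work needs to move under the guard.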

Further suggestions:

rajban94 commented 1 year ago

@andbue thank you so much for your help. It worked for me with if __name__ == "__main__":. But I am facing another issue: if the cropped image contains only one line, the text is extracted correctly, but whenever it contains multiple lines, the output is a blank string. Any suggestions for resolving this without retraining the existing model? Thank you in advance.

andbue commented 1 year ago

Glad to hear that it worked for you!

Calamari is, as stated in the "About"-text, a "Line based ATR Engine", so it does not contain any code for image preprocessing, document analysis, or line segmentation. To segment paragraph blocks into lines, have a look at the ocropy segmenter I linked to earlier. A more complex alternative that also performs document layout analysis can be found at https://github.com/qurator-spk/eynollah.
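
As a rough illustration of what line segmentation involves, here is a naive horizontal projection sketch. It is not the ocropy segmenter or eynollah, and it only copes with clean, unskewed text where lines are separated by fully blank rows. The names find_line_spans and min_height are hypothetical:

```python
def find_line_spans(row_has_ink, min_height=2):
    """Group consecutive rows that contain ink into (top, bottom) spans.

    row_has_ink: one boolean per image row, True if the row has any
    text pixels. bottom is exclusive, so a span maps to img[top:bottom].
    """
    spans = []
    start = None
    for y, ink in enumerate(row_has_ink):
        if ink and start is None:
            start = y  # first inked row of a new line
        elif not ink and start is not None:
            if y - start >= min_height:  # skip specks thinner than min_height
                spans.append((start, y))
            start = None
    # Close a line that runs to the bottom edge of the image.
    if start is not None and len(row_has_ink) - start >= min_height:
        spans.append((start, len(row_has_ink)))
    return spans
```

With an inverted binary image such as thresh from drawBoundBox, cropped to one box, row_has_ink could be computed as (thresh_crop > 0).any(axis=1). Each (top, bottom) span then gives a single-line image crop[top:bottom], and the list of those images can be passed to predict_raw.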