PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Apache License 2.0

PaddleOCR not working in a multiprocessing scenario #12609

Open subhankardori opened 1 month ago

subhankardori commented 1 month ago
# parallel.py
from plugin_inference import demo
import multiprocessing

# Get the value of n from the user
n = int(input("Enter the number of processes to initiate: "))

# Create a list to hold references to the process objects
processes = []

# Start n processes
for i in range(n):
    p = multiprocessing.Process(target=demo, args=())
    processes.append(p)
    p.start()

# Wait for all processes to complete
for p in processes:
    p.join()

print("All processes have completed.")
root@bb0c53def9d9:/code/build/data/inf_models/17# python3 parallel.py 
Enter the number of processes to initiate: 1
[2024/06/03 11:39:36] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=True, use_xpu=False, use_npu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir=None, page_num=0, det_algorithm='DB', det_model_dir='/code/build/data/inf_models/17/1', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='/code/build/data/inf_models/17/2', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='/usr/local/lib/python3.10/dist-packages/paddleocr/ppocr/utils/en_dict.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=True, cls_model_dir='/code/build/data/inf_models/17/3', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=False, output='./output', table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, 
table_char_dict_path=None, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, re_model_dir=None, use_visual_backbone=True, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, ocr=True, recovery=False, use_pdf2docx_api=False, invert=False, binarize=False, alphacolor=(255, 255, 255), lang='en', det=True, rec=True, type='ocr', ocr_version='PP-OCRv4', structure_version='PP-StructureV2')
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/code/build/data/inf_models/17/plugin_inference.py", line 762, in demo
    obj=OCR_Runner(img,args,frame_count)
  File "/code/build/data/inf_models/17/plugin_inference.py", line 203, in __init__
    self.reader=PaddleOCR(lang='en', use_angle_cls=True, use_gpu=True,
  File "/usr/local/lib/python3.10/dist-packages/paddleocr/paddleocr.py", line 616, in __init__
    super().__init__(params)
  File "/usr/local/lib/python3.10/dist-packages/paddleocr/tools/infer/predict_system.py", line 46, in __init__
    self.text_detector = predict_det.TextDetector(args)
  File "/usr/local/lib/python3.10/dist-packages/paddleocr/tools/infer/predict_det.py", line 141, in __init__
    self.predictor, self.input_tensor, self.output_tensors, self.config = utility.create_predictor(
  File "/usr/local/lib/python3.10/dist-packages/paddleocr/tools/infer/utility.py", line 280, in create_predictor
    predictor = inference.create_predictor(config)
OSError: (External) CUDA error(3), initialization error. 
  [Hint: 'cudaErrorInitializationError'. The API call failed because the CUDA driver and runtime could not be initialized. ] (at ../paddle/phi/backends/gpu/cuda/cuda_info.cc:251)

All processes have completed.
root@bb0c53def9d9:/code/build/data/inf_models/17# 

I am using an NVIDIA Docker container that already has CUDA and cuDNN installed. My PaddleOCR script runs end-to-end in a single process, but it fails when launched through multiprocessing. I urgently need insight or a solution!

subhankardori commented 1 month ago

@SWHL @jzhang533 this issue needs an assignee. I am tagging you because paddle-bot didn't assign it to anyone. If any other relevant information is required, please let me know and I'll post it.

GreatV commented 4 weeks ago

use_gpu and use_multiprocess cannot be true at the same time.

https://github.com/PaddlePaddle/PaddleOCR/blob/6954da712e1cb7dec913672a9a7ffee78ea2293b/deploy/hubserving/readme_en.md?plain=1#L148
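The constraint stated above can be checked before any worker is started. The following is only a minimal sketch: `check_flags` is a hypothetical helper and not part of PaddleOCR, and the flag names are taken from the comment above.

```python
def check_flags(use_gpu: bool, use_multiprocess: bool) -> None:
    """Hypothetical pre-flight check: fail early if both flags are enabled."""
    if use_gpu and use_multiprocess:
        raise ValueError(
            "use_gpu and use_multiprocess cannot both be True; disable one of them"
        )


if __name__ == "__main__":
    # GPU with a single process passes the check.
    check_flags(use_gpu=True, use_multiprocess=False)

    # GPU together with multiprocessing is rejected.
    try:
        check_flags(use_gpu=True, use_multiprocess=True)
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Failing at configuration time gives a clearer message than the CUDA initialization error that surfaces later inside a worker.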

subhankardori commented 4 weeks ago

@GreatV OK, so you mean use_multiprocess is set to True by default, since I didn't explicitly set it?

But does use_multiprocess=True allow GPU-based processing, or only CPU?

GreatV commented 4 weeks ago

Currently, only the CPU can use multithreading, and the community has reported this issue many times. You can refer to: https://github.com/search?q=org%3APaddlePaddle+cudaErrorInitializationError&type=issues
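For readers who still want to experiment with GPU workers: a generally known CUDA limitation is that a child created with the default Linux `fork` start method cannot re-initialize CUDA if the parent process has already touched it. Whether that is the cause of the traceback in this issue is an assumption, and nothing here is confirmed by the PaddleOCR maintainers. The sketch below uses the standard-library `spawn` start method and a stand-in worker; `ocr_worker` and `launch` are hypothetical names, and the real `demo` call would go inside the worker.

```python
import multiprocessing as mp


def ocr_worker(worker_id):
    """Stand-in for the real work; runs inside the child process."""
    # Assumption: heavy, CUDA-touching imports go here rather than at module
    # top level, e.g. `from plugin_inference import demo; demo()`, so that
    # the GPU runtime is first initialized inside the child.
    print(f"worker {worker_id} started")


def launch(n, target=ocr_worker):
    """Start n processes with the 'spawn' start method; return exit codes."""
    ctx = mp.get_context("spawn")  # fresh interpreter per child, no fork
    procs = [ctx.Process(target=target, args=(i,)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    # The guard is required with 'spawn', because each child re-imports
    # this module and must not launch workers of its own.
    print(launch(2))
```

Each spawned process loads its own copy of the models, so GPU memory use grows with the number of workers. This does not change the maintainers' statement that the combination is currently unsupported.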

subhankardori commented 4 weeks ago

So @GreatV, can I set use_gpu=True and use_multiprocess=False, since I want to use the CUDA capabilities? My understanding is that use_multiprocess is a parameter that enables threading internally in PaddleOCR.