PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

Processing time for paddleocr with multiprocessing #14234

Open FRAki73 opened 1 week ago

FRAki73 commented 1 week ago

I measured the OCR processing time with the code below. As the results show, the same call takes about 5 times longer when it runs in a child process started with multiprocessing. Why does it take longer, and what can I do about it? Any help would be appreciated.

Processing time of OCR: 1.1000 [sec]
OCR Result: The difficult thing in the life is
Processing time of OCR: 5.5675 [sec]
OCR Result: The difficult thing in the life is

import time
import multiprocessing
from multiprocessing import Process

#paddle OCR
from paddleocr import PaddleOCR

def normalEntry():
    processes = []
    p = Process(target=OCR_runnable, args=())
    processes.append(p)
    p.start()

    for process in processes:
        process.join()

def OCR_runnable():

    ocr = PaddleOCR(use_angle_cls=False, lang='en', show_log=False)

    start_time = time.time()
    result = ocr.ocr("./test.png", cls=False)
    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Processing time of OCR: {elapsed_time:.4f} [sec]")
    print("OCR Result: " + result[0][0][1][0])

if __name__ == '__main__':
    multiprocessing.freeze_support()
    OCR_runnable()
    normalEntry()


Originally posted by @FRAki73 in https://github.com/PaddlePaddle/PaddleOCR/discussions/14221

freemedom commented 4 days ago

I only see this code printing the time once.

FRAki73 commented 2 days ago

I only see this code printing the time once.

The time is printed each time OCR_runnable() is called. OCR_runnable() is called once directly and then once more in a separate process started with multiprocessing, so the time should appear twice. If you only see it once, there may be some other problem.
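The call pattern described above (one direct call, then one call in a child process) can be made easier to inspect by tagging each timing with the process ID. This is a minimal sketch under stated assumptions: timed_work and run_direct_then_child are hypothetical names, and the arithmetic loop is only a placeholder for the ocr.ocr(...) call, so it does not reproduce the slowdown reported in this issue.

```python
import multiprocessing as mp
import os
import time


def timed_work(label, results=None):
    """Run a placeholder workload and return (label, pid, elapsed seconds)."""
    start = time.perf_counter()
    sum(i * i for i in range(100_000))  # hypothetical stand-in for ocr.ocr(...)
    record = (label, os.getpid(), time.perf_counter() - start)
    if results is not None:
        # In the child process, send the record back to the parent.
        results.put(record)
    return record


def run_direct_then_child():
    """Time the workload once in this process and once in a child process."""
    records = [timed_work("direct")]
    results = mp.Queue()
    child = mp.Process(target=timed_work, args=("child", results))
    child.start()
    records.append(results.get())  # read the result before joining the child
    child.join()
    return records


if __name__ == "__main__":
    mp.freeze_support()
    for label, pid, elapsed in run_direct_then_child():
        print(f"{label} (pid {pid}): {elapsed:.4f} [sec]")
```

Two lines with different process IDs confirm that both paths ran, which separates "the child never printed" from "the child ran but was slower".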

freemedom commented 2 days ago

Could you try running OCR on several images? (ocr = PaddleOCR(use_angle_cls=False, lang='en', show_log=False) only needs to run once.) Also, try show_log=True; it shows the time taken by each of the three stages.
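The suggestion above (construct the engine once, then time several images) can be sketched as a small timing harness. This is a minimal sketch, not part of PaddleOCR: time_calls and fake_ocr are hypothetical names, and fake_ocr stands in for ocr.ocr(path, cls=False) so that the example runs without PaddleOCR installed. The image paths are placeholders as well.

```python
import time
from statistics import mean


def time_calls(fn, inputs, warmup=1):
    """Call fn on each input and return the per-call elapsed seconds.

    The first `warmup` inputs are run once beforehand and not timed,
    in case the first call includes one-time setup cost.
    """
    for item in inputs[:warmup]:
        fn(item)
    timings = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        timings.append(time.perf_counter() - start)
    return timings


def fake_ocr(path):
    """Hypothetical stand-in for ocr.ocr(path, cls=False)."""
    time.sleep(0.01)
    return path


if __name__ == "__main__":
    # With PaddleOCR, the engine would be constructed once here and
    # `lambda p: ocr.ocr(p, cls=False)` passed in place of fake_ocr.
    paths = ["./a.png", "./b.png", "./c.png"]  # placeholder paths
    timings = time_calls(fake_ocr, paths)
    print(f"mean: {mean(timings):.4f} [sec], max: {max(timings):.4f} [sec]")
```

Running the same harness both directly and inside the child process would show whether the slowdown persists across several images or only affects the first call.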

FRAki73 commented 2 days ago

Thank you for your advice. I changed show_log from False to True; the results are below. The recognition stage (rec_res) takes about 6 times longer under multiprocessing than in the direct call (5.27 s versus 0.86 s), while the detection stage (dt_boxes) is unchanged. Why does the processing time differ when no data should be shared between the processes?

・Called directly
[2024/11/25 11:05:48] ppocr DEBUG: dt_boxes num : 4, elapsed : 0.2573506832122803
[2024/11/25 11:05:49] ppocr DEBUG: rec_res num : 4, elapsed : 0.863243579864502
Processing time of OCR: 1.1323 [sec]

・Called via multiprocessing
[2024/11/25 11:06:08] ppocr DEBUG: dt_boxes num : 4, elapsed : 0.25519514083862305
[2024/11/25 11:06:13] ppocr DEBUG: rec_res num : 4, elapsed : 5.2654759883880615
Processing time of OCR: 5.5323 [sec]
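To compare the two log excerpts above stage by stage, the elapsed values can be extracted and divided. This is a minimal sketch: the regular expression assumes the "ppocr DEBUG: <stage> num : <n>, elapsed : <seconds>" layout seen in these two excerpts, and parse_stage_times and slowdown are hypothetical helper names.

```python
import re

# Matches lines like: "ppocr DEBUG: rec_res num : 4, elapsed : 0.8632..."
LOG_PATTERN = re.compile(r"ppocr DEBUG: (\w+) num : \d+, elapsed : ([0-9.]+)")

DIRECT_LOG = (
    "[2024/11/25 11:05:48] ppocr DEBUG: dt_boxes num : 4, elapsed : 0.2573506832122803\n"
    "[2024/11/25 11:05:49] ppocr DEBUG: rec_res num : 4, elapsed : 0.863243579864502\n"
)
CHILD_LOG = (
    "[2024/11/25 11:06:08] ppocr DEBUG: dt_boxes num : 4, elapsed : 0.25519514083862305\n"
    "[2024/11/25 11:06:13] ppocr DEBUG: rec_res num : 4, elapsed : 5.2654759883880615\n"
)


def parse_stage_times(log_text):
    """Return {stage name: elapsed seconds} for each matching log line."""
    return {m.group(1): float(m.group(2)) for m in LOG_PATTERN.finditer(log_text)}


def slowdown(direct, child):
    """Return {stage name: child time / direct time} for stages present in both."""
    return {stage: child[stage] / direct[stage] for stage in direct if stage in child}


if __name__ == "__main__":
    ratios = slowdown(parse_stage_times(DIRECT_LOG), parse_stage_times(CHILD_LOG))
    for stage, ratio in ratios.items():
        print(f"{stage}: {ratio:.2f}x")
```

On these numbers the detection stage (dt_boxes) is essentially unchanged, so the extra time is concentrated in the recognition stage (rec_res).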

freemedom commented 1 day ago

Strange. I'm not sure what is going on here.