PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

mobile model performance issues #1922

Closed Justus-Jonas closed 3 years ago

Justus-Jonas commented 3 years ago

Dear all,

When I run the model on a CPU machine (I don't have a GPU cloud instance), the German mobile model uses a very large amount of memory for simple predictions (no training). Memory grows with every new document (image), up to more than 20 GB of RAM for the mobile models, and prediction gets very slow. I am using the Python package and loading the model with the following code:

ocr = PaddleOCR(use_angle_cls=True, lang='german')

Any idea why it requires so much RAM and gets so slow?

Thanks a lot in advance!

Best,

Justus
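To quantify the per-document growth described above, one can log the process's peak resident memory after each prediction. This is a minimal sketch: the `run_with_memory_log` and `peak_rss_mb` helpers are hypothetical names, the image paths are placeholders, and the `ocr.ocr(...)` call is assumed from the PaddleOCR Python API; peak RSS comes from the stdlib `resource` module (Unix only).

```python
import resource


def peak_rss_mb():
    """Peak resident set size of this process in MiB.

    On Linux, ru_maxrss is reported in KiB (on macOS it is bytes,
    so adjust the divisor there).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def run_with_memory_log(predict, image_paths):
    """Call `predict` on each image and record peak RSS after each call."""
    readings = []
    for path in image_paths:
        predict(path)
        readings.append(peak_rss_mb())
        print(f"{path}: peak RSS ~ {readings[-1]:.0f} MiB")
    return readings


# Usage (requires paddleocr; paths are placeholders):
# from paddleocr import PaddleOCR
# ocr = PaddleOCR(use_angle_cls=True, lang='german')
# run_with_memory_log(lambda p: ocr.ocr(p), ["doc1.png", "doc2.png"])
```

If the readings climb steadily across documents, that supports the leak hypothesis rather than a one-time allocation.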

LDOUBLEV commented 3 years ago

Hi, Justus. Thanks for your feedback. There is indeed a problem of high memory usage during CPU prediction. However, memory that grows continuously suggests a memory leak.

Can you provide us with your environment (the CPU used, the model used, and the Paddle version) so that we can troubleshoot the problem further?

BTW: when you predict with ocr = PaddleOCR(use_angle_cls=True, lang='german'), you are actually using three models: the text detection model, the text recognition model, and the text direction classifier model.

Justus-Jonas commented 3 years ago

Hi @LDOUBLEV ,

thanks a lot for your help! My Paddle service runs in a Docker container with 2 cores and 25 GB of RAM (the machine itself has 8 cores and 64 GB of RAM). However, changing the number of cores didn't noticeably change performance. The CPU is an Intel Xeon Gold 6161 at 2.20 GHz.

Our model configuration is:

Namespace(cls_batch_num=30, cls_image_shape='3, 48, 192',
          cls_model_dir='C:\\Users\\justu_4gsyw6g/.paddleocr/2.0/cls',
          cls_thresh=0.9, det=True, det_algorithm='DB',
          det_db_box_thresh=0.5, det_db_thresh=0.3, det_db_unclip_ratio=2.0,
          det_east_cover_thresh=0.1, det_east_nms_thresh=0.2,
          det_east_score_thresh=0.8, det_limit_side_len=960,
          det_limit_type='max',
          det_model_dir='C:\\Users\\justu_4gsyw6g/.paddleocr/2.0/det',
          drop_score=0.5, enable_mkldnn=False, gpu_mem=8000, image_dir='',
          ir_optim=True, label_list=['0', '180'], lang='german',
          max_text_length=25, rec=True, rec_algorithm='CRNN',
          rec_batch_num=30,
          rec_char_dict_path='./ppocr/utils/dict/german_dict.txt',
          rec_char_type='ch', rec_image_shape='3, 32, 320',
          rec_model_dir='C:\\Users\\justu_4gsyw6g/.paddleocr/2.0/rec/german',
          use_angle_cls=True, use_gpu=False, use_pdserving=False,
          use_space_char=True, use_tensorrt=False, use_zero_copy_run=False)

Paddle versions: paddleocr==2.0.2, paddlepaddle==2.0.0rc1

I am aware that the actual process consists of those three models, but aren't all three necessary to get both the positional data of the strings and the strings in the image themselves?

Regarding the memory leak, do you have an idea of what we could try? Are there any parameters we could change to make the model use less RAM and run faster?
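For reference, a few knobs visible in the Namespace dump above plausibly affect CPU memory and speed. This is a hedged sketch, not a verified fix: the parameter names come from the dump, but the alternative values below are illustrative assumptions.

```python
# Candidate settings to experiment with; values are illustrative,
# parameter names are taken from the Namespace dump above.
paddle_cpu_kwargs = dict(
    use_angle_cls=True,
    lang='german',
    use_gpu=False,
    enable_mkldnn=True,      # was False above; MKL-DNN can speed up Intel CPUs
    det_limit_side_len=736,  # shrink the detector's max input side (was 960)
    rec_batch_num=6,         # was 30; smaller batches cap per-step memory
    cls_batch_num=6,         # was 30
)

# With paddleocr installed:
# from paddleocr import PaddleOCR
# ocr = PaddleOCR(**paddle_cpu_kwargs)
```

Smaller batch sizes and a smaller detection input trade some throughput for a lower memory ceiling; they would not, however, fix a genuine leak.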

Thanks a lot for your help @LDOUBLEV! If you want us to test something, feel free to reach out to me.

Best,

Justus

jasonf7 commented 3 years ago

Hi all,

I am experiencing the same problem (on CPU). Here's my information:

Package version
paddlepaddle        2.0.0
PaddleOCR repo (HEAD 895d44bc3941810ca974f103fe27bd383570e241)

Macbook Pro
CPU 2.9 GHz Dual-Core Intel Core i5
8 GB 1867 MHz DDR3

Models
paddle_args = '--det_model_dir=/Users/jason/Downloads/ch_ppocr_server_v2.0_det_infer ' + \
              '--rec_model_dir=/Users/jason/Downloads/ch_ppocr_server_v2.0_rec_infer ' + \
              '--cls_model_dir=/Users/jason/Downloads/ch_ppocr_mobile_v2.0_cls_infer ' + \
              '--rec_char_dict_path=/Users/jason/Development/PaddleOCR/ppocr/utils/ppocr_keys_v1.txt ' + \
              '--use_angle_cls=True --use_space_char=True --use_gpu=False'

For me, RAM usage would increase by 1 GB after 20 images. @Justus-Jonas, as a workaround I tried re-initializing the OCR module for every inference. It adds a little overhead, but now, after 100+ images, RAM stays relatively constant (I can't say for sure that the memory leak is gone, though!).
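The workaround described above can be sketched as follows. `ocr_fresh_instance` is a hypothetical helper name, and the `.ocr(...)` call is assumed from the PaddleOCR Python API; the idea is simply to build a throwaway engine per image so its buffers can be released between predictions.

```python
import gc


def ocr_fresh_instance(make_ocr, image_path):
    """Run one prediction with a throwaway OCR engine.

    `make_ocr` is a zero-argument factory (e.g. a lambda wrapping the
    PaddleOCR constructor). Slower per call, but keeps memory roughly
    flat in the test described above.
    """
    ocr = make_ocr()
    try:
        return ocr.ocr(image_path)
    finally:
        del ocr      # drop the reference so the engine can be collected
        gc.collect() # encourage release of native buffers


# Usage (requires paddleocr):
# from paddleocr import PaddleOCR
# result = ocr_fresh_instance(
#     lambda: PaddleOCR(use_angle_cls=True, lang='german'),
#     'doc1.png',
# )
```

The re-initialization overhead is paid on every call, so this only makes sense while the underlying leak is unfixed.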

Update: I'm also noticing a 3-5x speedup when using PaddleOCR from source rather than from pip; not sure why that is...

paddle-bot-old[bot] commented 3 years ago

Since you haven't replied for more than 3 months, we have closed this issue/PR. If the problem is not solved or there is a follow-up, please reopen it at any time and we will continue to follow up. We recommend pulling and trying the latest code first.