Memory leak in PaddleOCR.

ShubhamZoop commented 4 months ago

I have noticed some weird memory usage when evaluating the performance of PaddleOCR:

When PaddleOCR processes new images of a sequence, there is a constant increase in memory usage of the process. The time profile of the memory usage has sudden steps to higher memory levels. If the system sets an upper bound in the memory consumption, the paddle process is eventually killed. It won't give allocated memory back, ever. This behaviour can be observed both with the C++ and the Python interface. Eventually it is impossible to keep running a PaddleOCR process as a service because the system runs out of memory and the process is killed.

Can you provide insights on this memory usage pattern? Do you have any remedy?

The following sections describe the tests in detail.

C++ Tests Setup for the C++ tests Test hardware:

OS: Ubuntu 20.04 CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, 16 threads, AVX2 RAM: 128GB

PaddleOCR: Source code from v2.7.0.3

Model files:

Detector: en_PP-OCRv3_det_infer.tar Classifier: ch_ppocr_mobile_v2.0_cls_infer.tar Recognizer: en_PP-OCRv3_rec_infer.tar OpenCV has been compiled from the source code tagged at v4.6.0 with the parameters:

Methodology for the Python tests PaddleOCR repo at v2.7.0.3

Slight modifications to report the memory usage (Resident Set Size), the number of bboxes/image and the total time spent after the OCR of each image.

Run on CPU with or without MKL enabled.

cmd arguments:

./${PADDLE_OCR_BUILD_DIR}/ppocr \ --rec_char_dict_path="./dicts/en_dict.txt" \ --det_model_dir=${MODELS_INFERENCE_DIR}/det_db \ --rec_model_dir=${MODELS_INFERENCE_DIR}/rec_rcnn \ --cls_model_dir=${MODELS_INFERENCE_DIR}/cls \ --visualize=true \ --output=${OUT_DIR} \ --image_dir=${IMAGES_DIR} \ --use_angle_cls=true \ --det=true \ --rec=true \ --cls=true \ --use_gpu=false \ --enable_mkldnn=true \ --precision=fp32 \ --cpu-threads=4 Results of the Python tests

Test 1: Base

CPU with MKL enabled
1320 images
OCR pipeline = Detector + Classifier + Recognizer
Num of theads: 4

Test 2: Long run

CPU with MKL enabled
10560 images (1320 images repeated 8 times)
OCR pipeline = Detector + Classifier + Recognizer
Num of theads: 4

Test 3: No MKL

CPU with MKL disabled
1320 images
OCR pipeline = Detector + Classifier + Recognizer
Num of theads: 4

Test 4: Det only

CPU with MKL enabled
1320 images
OCR pipeline = Detector
Num of theads: 4

Test 5: Det + Rec

CPU with MKL enabled
1320 images
OCR pipeline = Detector + Recognizer
Num of theads: 4

Test 6: Det + Cls

CPU with MKL enabled
1320 images
OCR pipeline = Detector + Classifier
Num of theads: 4

Test 7: Loop same image

CPU with MKL enabled
OCR the same image 1320 times
OCR pipeline = Detector + Classifier + Recognizer
Num of theads: 4

Python tests Setup for the Python tests Test hardware:

OS: Ubuntu 20.04 CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, 16 threads, AVX2 RAM: 128GB paddlepaddle: v2. 6.0

paddleocr: v2.7.0.3

Results of the Python tests CPU with MKL enabled 1320 images OCR pipeline = Detector + Classifier + Recognizer Num of theads: 4

slelekospd commented 4 months ago

Same problem...

ShubhamZoop commented 4 months ago

@tink2123 Can you look into this issue?

vivienfanghuagood commented 3 months ago

你好，这个原因经过分析，本质上不属于内存泄漏，是Paddle框架将Tensor做了缓存复用，这部分内存在下次遇到同样shape tensor的时候会复用，从而避免调用系统的allocator。如果对内存比较敏感，可以export FLAGS_allocator_strategy=naive_best_fit，这个会一定程度上缓解CPU的内存占用。后续我们将进一步优化Allocator的逻辑，以提供更合理的内存复用策略。