PaddlePaddle / PaddleOCR


Potential memory leak in PaddleOCR? #7823

Open · nikos-livathinos opened this issue 1 year ago

nikos-livathinos commented 1 year ago

I have noticed some unusual memory usage while evaluating the performance of PaddleOCR: resident memory keeps growing across successive OCR calls and is never released.

Eventually it is impossible to keep running a PaddleOCR process as a service because the system runs out of memory and the process is killed.

Can you provide insight into this memory usage pattern? Is there any remedy?

The following sections describe the tests in detail.

C++ Tests

Setup for the C++ tests

Methodology for the C++ tests

Results of the C++ tests

Test 1: Base - (memory usage chart: ppocr_supplier_bboxes_dt)
Test 2: Long run - (memory usage chart: ppocr_supplier_x8_bboxes_dt)
Test 3: No MKL - (memory usage chart: ppocr_supplier_bboxes_dt_nomkl)
Test 4: Det only - (memory usage chart: ppocr_supplier_det_bboxes_dt)
Test 5: Det + Rec - (memory usage chart: ppocr_supplier_det_rec_bboxes_dt)
Test 6: Det + Cls - (memory usage chart: ppocr_supplier_det_cls_bboxes_dt)
Test 7: Loop same image - (memory usage chart: ppocr_000AVY01_1320_bboxes_dt)

Python tests

Setup for the Python tests

Results of the Python tests - (memory usage chart: performance_supplier)

lucky2046 commented 1 year ago

This problem also exists when using GPU mode.

lucashu1 commented 1 year ago

@nikos-livathinos Did you manage to find any workaround for this? We're encountering the same issue as you.

We're also using paddleocr==2.6.0.1 on CPU; we're wondering whether upgrading would help fix this.

lucashu1 commented 1 year ago

Experiment 1

For debugging, we tried an experiment in Python where we loop over a set of images, and each time, create a new PaddleOCR object and then del it immediately after.

The code is something like this:

from paddleocr import PaddleOCR

lang = 'en'
for image_path in image_paths:
    # Create a fresh engine for each image, run OCR once, then drop it.
    ocr = PaddleOCR(lang=lang, show_log=False, use_angle_cls=False)
    _ = ocr.ocr(image_path, cls=False)
    del ocr

We were hoping that by deleting the PaddleOCR object each time, we could work around the memory leak issue by letting the garbage collector clear out any old memory usage after each call.

However, regardless of the language (en or another language), we get a memory usage chart that looks something like this:

(memory usage chart: memory grows roughly linearly with each iteration)

(Plotted using the mprof tool from https://github.com/pythonprofilers/memory_profiler, with the --include-children flag set.)

As you can see, the leaked memory seems to increase linearly with each new PaddleOCR object that's used. The used memory never gets cleaned up, even though the old PaddleOCR objects have been "deleted" in Python.
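One follow-up check (a minimal sketch, not one of the experiments above) would be to force a full garbage collection after each del, so that any growth that remains can be attributed to native allocations rather than to uncollected Python objects:

import gc

from paddleocr import PaddleOCR

lang = 'en'
for image_path in image_paths:
    ocr = PaddleOCR(lang=lang, show_log=False, use_angle_cls=False)
    _ = ocr.ocr(image_path, cls=False)
    # Drop the engine and force a full collection; if memory still grows
    # after this, the leak is in native (C++) allocations that the Python
    # garbage collector cannot reclaim.
    del ocr
    gc.collect()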

Experiment 2

If we use Python to call the CLI instead, we don't get a memory leak, since a fresh process is spawned for each call and exits immediately after the OCR is done.

import subprocess

lang = 'en'
for image_path in image_paths:
    # Each call runs the paddleocr CLI in a separate process, so any
    # memory it allocates is released when that process exits.
    subprocess.check_output(
        ["paddleocr", "--image_dir", image_path, "--lang", lang]
    )

Sample results:

(memory usage chart: no cumulative growth across iterations)
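Building on that observation, a possible workaround (just a sketch, not something we have productionized) is to keep the Python API but run each call in a short-lived child process, so the leaked memory is returned to the OS when the child exits:

from multiprocessing import get_context

def _ocr_worker(image_path, lang='en'):
    # Import and construct the engine inside the child process so that
    # whatever it leaks lives only as long as the child.
    from paddleocr import PaddleOCR
    ocr = PaddleOCR(lang=lang, show_log=False, use_angle_cls=False)
    return ocr.ocr(image_path, cls=False)

def ocr_isolated(image_path, lang='en'):
    # A 'spawn' context gives a clean interpreter per call; all memory is
    # reclaimed by the OS when the pool worker exits.
    with get_context('spawn').Pool(processes=1) as pool:
        return pool.apply(_ocr_worker, (image_path, lang))

The per-call process startup and model load add latency, so this trades throughput for bounded memory.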

Experiment 3

If we don't del the PaddleOCR objects and simply use a single PaddleOCR object to iterate over the images, we get results similar to those shown by @nikos-livathinos.

from paddleocr import PaddleOCR

lang = 'en'
# Reuse one engine for every image, as a long-running service would.
ocr = PaddleOCR(lang=lang, show_log=False, use_angle_cls=False)
for image_path in image_paths:
    _ = ocr.ocr(image_path, cls=False)

Sample results: (memory usage chart showing the same steady growth)
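For reference, the per-call growth can also be measured from inside Python rather than with mprof; a rough sketch using memory_profiler's Python API (image_paths is the same list of image file paths as above, and the exact return type of max_usage has varied between memory_profiler versions):

from memory_profiler import memory_usage
from paddleocr import PaddleOCR

ocr = PaddleOCR(lang='en', show_log=False, use_angle_cls=False)
for image_path in image_paths:
    # Peak RSS (in MiB) observed while this single call runs;
    # include_children mirrors the flag used for the mprof charts above.
    peak = memory_usage((ocr.ocr, (image_path,), {'cls': False}),
                        max_usage=True, include_children=True)
    print(image_path, 'peak RSS (MiB):', peak)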

Other info

System info:

OS: CentOS Linux 7 (Core)
python: 3.9.0

-----

pip install info:

paddleocr: 2.6.0.1 (same issue occurs with 2.6.1.2)
paddlepaddle: 2.4.1

-----

lscpu output (no GPUs):

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               2934.899
CPU max MHz:           3100.0000
CPU min MHz:           1200.0000
BogoMIPS:              4400.03
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts spec_ctrl intel_stibp flush_l1d

@littletomatodonkey @LDOUBLEV Let me know if there's any information I can provide to the owners/maintainers of this project to help fix this memory leak. We're trying to deploy PaddleOCR as a service, and the leak is really hindering our ability to do so.

Thanks!
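One stopgap for service deployments, in the same spirit as the CLI experiment above (process recycling rather than an actual fix), is to bound how long any worker process lives. A sketch of a gunicorn config, with purely illustrative values:

# gunicorn.conf.py (illustrative values only)
workers = 2
max_requests = 100        # recycle a worker after it has served 100 requests
max_requests_jitter = 20  # stagger recycling so workers don't restart at once
timeout = 120             # allow slow OCR requests to finish before timing out

The leaked memory is then reclaimed each time a worker is recycled, at the cost of reloading the models.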

Siegi96 commented 1 year ago

Any news on this?

Mangoboo commented 1 year ago

Any news on this? I have the same problem running OCR on CPU/GPU in a similar environment. I've run it on AWS EC2 instances (t3.medium, g4dn.xlarge) and on my local machine (CPU), and memory grows without bound.