Memory leaks when doing CPU inference with MKLDNN

cryoco commented 4 years ago

System information
- Paddle version: 1.8.2
- Paddle With CUDA: False
- OS: Ubuntu 16.04
- CPU: 16 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
- Python version: 3.5.2
- CUDA version: 9.0.176
- cuDNN version: None.None.None
- Nvidia driver version: None
- API information: inference configuration

config.disable_gpu()
config.enable_mkldnn()
config.set_cpu_math_library_num_threads(4)

To Reproduce

Download the models and data:
- ocr detection model
- ocr recognition model
- test data: the test pictures are binary files saved with np.save and can be loaded with np.load.
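
A minimal round-trip sketch of that storage format follows. It assumes each test picture is a single NumPy array, for example an H x W x C uint8 image. The file name `sample_img.npy`, the helper `save_and_reload`, and the shape are hypothetical and do not come from the actual test data.

```python
import os
import tempfile

import numpy as np


def save_and_reload(img: np.ndarray, path: str) -> np.ndarray:
    """Save an image array with np.save, then load it back with np.load."""
    np.save(path, img)  # binary .npy file (".npy" is appended if missing)
    return np.load(path)


if __name__ == "__main__":
    # Hypothetical picture: height 32, width 100, 3 channels.
    img = np.random.randint(0, 256, size=(32, 100, 3), dtype=np.uint8)
    path = os.path.join(tempfile.mkdtemp(), "sample_img.npy")
    loaded = save_and_reload(img, path)
    print(loaded.shape, loaded.dtype)  # (32, 100, 3) uint8
```
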
Code to reproduce: test_ocr_mkldnn_mem.txt (rename it to test_ocr_mkldnn_mem.py).

Perform inference with MKLDNN:

detection

python3 test_ocr_mkldnn_mem.py --model_file=./ch_det_mv3_db/model --params_file=./ch_det_mv3_db/params --mode=det --mkldnn

recognition

python3 test_ocr_mkldnn_mem.py --model_file=./ch_rec_mv3_crnn/model --params_file=./ch_rec_mv3_crnn/params --mode=rec --mkldnn

Increasing memory usage can be observed with the top command.
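
As an alternative to watching top, the growth can be logged from inside the test script. The following is a minimal sketch for Linux (the report uses Ubuntu 16.04), kept compatible with Python 3.5. The helper name `current_rss_kb` is hypothetical and is not part of the original script.

```python
import os
import resource


def current_rss_kb() -> int:
    """Return this process's resident set size in kB.

    On Linux this reads VmRSS from /proc/self/status, which is roughly
    the figure top shows in its RES column. If /proc is unavailable,
    it falls back to the *peak* RSS from getrusage, whose unit is
    platform-dependent (kB on Linux, bytes on macOS).
    """
    status = "/proc/self/status"
    if os.path.exists(status):
        with open(status) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # value is reported in kB
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


if __name__ == "__main__":
    # Hypothetical usage: print RSS after each inference iteration.
    for step in range(3):
        # ... run one inference here ...
        print("step {}: RSS = {} kB".format(step, current_rss_kb()))
```

With MKLDNN enabled and dynamic input shapes, a steadily rising series of these values is the same symptom that top shows.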

Perform inference without MKLDNN:

detection

python3 test_ocr_mkldnn_mem.py --model_file=./ch_det_mv3_db/model --params_file=./ch_det_mv3_db/params --mode=det

recognition

python3 test_ocr_mkldnn_mem.py --model_file=./ch_rec_mv3_crnn/model --params_file=./ch_rec_mv3_crnn/params --mode=rec

Memory usage remains stable.

The input shapes we use in OCR are dynamic, which might be relevant to this issue.

luotao1 commented 4 years ago

Please use SetMkldnnCacheCapacity interface. https://github.com/PaddlePaddle/Paddle/blob/126d3d693b0b5ebdbef5c6d315bad86b701ebcea/paddle/fluid/inference/api/paddle_analysis_config.h#L348-L355

It is used in paddle/fluid/inference/tests/api/analyzer_detect_tester.cc, which is dynamic shape as well.
This interface only has a C++/C API; you may need to wrap a Python API for it.
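
The toy model below illustrates what a cache capacity buys when input shapes are dynamic. It is a sketch under stated assumptions: the class `ShapeKeyedCache` is hypothetical, and eviction is least-recently-used. Paddle's real MKLDNN cache may use a different key format and eviction policy.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Optional


class ShapeKeyedCache:
    """Toy cache mapping an input shape to a 'primitive' built for it.

    capacity=None means unbounded, which mimics the leak-like growth;
    a positive capacity evicts the least recently used shape.
    """

    def __init__(self, capacity: Optional[int] = None):
        self.capacity = capacity
        self._store = OrderedDict()
        self.builds = 0  # how many primitives had to be (re)created

    def get(self, shape: Hashable, build: Callable[[], object]) -> object:
        if shape in self._store:
            self._store.move_to_end(shape)  # mark as most recently used
            return self._store[shape]
        self.builds += 1
        value = build()
        self._store[shape] = value
        if self.capacity is not None and len(self._store) > self.capacity:
            self._store.popitem(last=False)  # drop least recently used
        return value

    def __len__(self) -> int:
        return len(self._store)


if __name__ == "__main__":
    unbounded, bounded = ShapeKeyedCache(), ShapeKeyedCache(capacity=10)
    for w in range(100, 150):  # 50 distinct hypothetical input widths
        shape = (1, 3, 32, w)
        unbounded.get(shape, object)
        bounded.get(shape, object)
    print(len(unbounded), len(bounded))  # 50 10
```

The trade-off is that a small capacity bounds memory but forces primitives to be rebuilt whenever an evicted shape reappears, which the `builds` counter makes visible.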

cryoco commented 4 years ago

Tried setting the MKLDNN cache capacity to 10/20/100/1024, but still got the memory leak :(

cryoco commented 4 years ago

Added the Python API set_mkldnn_cache_capacity in PR #25524, which might help in debugging and solving this issue.

wojtuss commented 4 years ago

@cryoco I got the same error as in https://github.com/PaddlePaddle/Paddle/issues/25507#issuecomment-658228096. Is another model required here as well?

luotao1 commented 4 years ago

This issue uses the same model as #25507, and https://github.com/PaddlePaddle/Paddle/issues/25507#issuecomment-658636558 updates the model link.

wojtuss commented 4 years ago

@luotao1, @cryoco I took the model from https://github.com/PaddlePaddle/Paddle/issues/25507#issuecomment-658636558 and got the following error with the script test_ocr_mkldnn_mem.py, with or without the --mkldnn option. The error does not occur when running the script test_ocr_mkldnn_diff.py from issue https://github.com/PaddlePaddle/Paddle/issues/25507 on the same model:

$ python test_ocr_mkldnn_mem.py --model_file=ch_det_mv3_db/model --params_file=ch_det_mv3_db/params
...
I0715 09:04:47.246189 30616 naive_executor.cc:95] ---  skip [feed], feed -> image
I0715 09:04:47.247287 30616 naive_executor.cc:95] ---  skip [concat_1.tmp_0], fetch -> fetch
Traceback (most recent call last):
  File "test_ocr_mkldnn_mem.py", line 88, in <module>
    result = run(pred)
  File "test_ocr_mkldnn_mem.py", line 42, in run
    predictor.zero_copy_run()
paddle.fluid.core_avx.EnforceNotMet:

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2   paddle::operators::GetBroadcastDimsArrays(paddle::framework::DDim const&, paddle::framework::DDim const&, int*, int*, int*, int, int)
3   paddle::operators::ElementwiseOp::InferShape(paddle::framework::InferShapeContext*) const
4   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
5   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
6   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
7   paddle::framework::NaiveExecutor::Run()
8   paddle::AnalysisPredictor::ZeroCopyRun()

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "/root/miniconda3/envs/ocrpy36/lib/python3.6/site-packages/paddle/fluid/framework.py", line 2525, in append_op
    attrs=kwargs.get("attrs", None))
  File "/root/miniconda3/envs/ocrpy36/lib/python3.6/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/root/miniconda3/envs/ocrpy36/lib/python3.6/site-packages/paddle/fluid/layers/nn.py", line 10348, in _elementwise_op
    'use_mkldnn': use_mkldnn})
  File "/root/miniconda3/envs/ocrpy36/lib/python3.6/site-packages/paddle/fluid/layers/nn.py", line 10521, in elementwise_add
    return _elementwise_op(LayerHelper('elementwise_add', **locals()))
  File "/paddle/PaddleOCR/github/PaddleOCR/ppocr/modeling/heads/det_db_head.py", line 159, in __call__
    input=out4, scale=2), y=in3)  # 1/8
  File "/paddle/PaddleOCR/github/PaddleOCR/ppocr/modeling/architectures/det_model.py", line 112, in __call__
    predicts = self.head(conv_feas)
  File "/paddle/PaddleOCR/github/PaddleOCR/tools/program.py", line 193, in build_export
    image, outputs = model(mode='export')
  File "tools/export_model.py", line 67, in main
    config, eval_program, startup_prog)
  File "tools/export_model.py", line 93, in <module>
    main()

----------------------
Error Message Summary:
----------------------
InvalidArgumentError: Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 96, 4, 56] and the shape of Y = [1, 96, 4, 55]. Received [56] in X is not equal to [55] in Y at i:3.
  [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] at (/paddle/paddle/fluid/operators/elementwise/elementwise_op_function.h:157)
  [operator < elementwise_add > error]

Also, there is no --mode option available for the test_ocr_mkldnn_mem.py script.
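
The hint in the error message spells out the per-axis broadcast condition. The following is a minimal Python restatement of that condition. It assumes both shapes have the same rank, as X and Y do here. The function `first_broadcast_mismatch` is a hypothetical helper, not a Paddle API.

```python
from typing import Optional, Sequence


def first_broadcast_mismatch(x: Sequence[int], y: Sequence[int]) -> Optional[int]:
    """Return the first axis where x and y cannot be broadcast, or None.

    Mirrors the hint: x[i] == y[i] or x[i] <= 1 or y[i] <= 1 must hold
    for every axis i. Assumes len(x) == len(y).
    """
    for i, (a, b) in enumerate(zip(x, y)):
        if not (a == b or a <= 1 or b <= 1):
            return i
    return None


if __name__ == "__main__":
    x_dims = [1, 96, 4, 56]  # shape of X from the error message
    y_dims = [1, 96, 4, 55]  # shape of Y from the error message
    print(first_broadcast_mismatch(x_dims, y_dims))  # 3
```

This reproduces the reported "at i:3": the two operands of elementwise_add disagree only on the last (width) axis, 56 vs 55.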

cryoco commented 4 years ago

@wojtuss My apologies. Here are the files: detection model; recognition model; code: test_ocr_mkldnn_mem.txt

wojtuss commented 4 years ago

@cryoco Thank you! I reproduced the result and can confirm that the MKLDNN cache grows during the test execution. This is because the models' inputs have variable sizes, so MKLDNN operators for each new input size are added to the cache over and over.

The cache growth slows down over time; however, limiting the cache size could also help. Can you confirm that PR https://github.com/PaddlePaddle/Paddle/pull/25524 helps here?
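
Slowing growth is what one would expect if the set of input sizes is finite. New entries appear only for shapes not seen before, and those become rarer as the request stream goes on. The self-contained simulation below assumes, hypothetically, that input widths are drawn uniformly from 32 possible multiples of 32. The helper `new_entries_per_window` is illustrative only.

```python
import random
from typing import List


def new_entries_per_window(num_requests: int, window: int,
                           num_shapes: int, seed: int = 0) -> List[int]:
    """Count how many never-seen-before shapes arrive in each window.

    Each unseen shape would add a new entry to an unbounded shape-keyed
    cache, so the returned list is the cache growth per window.
    """
    rng = random.Random(seed)
    seen = set()
    growth = []
    for start in range(0, num_requests, window):
        added = 0
        for _ in range(min(window, num_requests - start)):
            width = 32 * rng.randint(1, num_shapes)  # hypothetical widths
            shape = (1, 3, 32, width)
            if shape not in seen:
                seen.add(shape)
                added += 1
        growth.append(added)
    return growth


if __name__ == "__main__":
    print(new_entries_per_window(num_requests=1000, window=100, num_shapes=32))
```

Under this assumption, growth is front-loaded and then stops. Real OCR inputs may have many more distinct sizes, so the cache can keep growing for a long time; a capacity limit is still useful in that case.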

luotao1 commented 4 years ago

I know the reason:

So far, PR #25524 only takes effect with predictor.run(inputs, outputs, batch_size), because it needs to know the input's size: AnalysisPredictor::MkldnnPreSet() is called only in AnalysisPredictor::Run(inputs, outputs, batch_size).
Thus, it does not take effect with predictor.zero_copy_run().

@cryoco

Could you switch to predictor.run(inputs, outputs, batch_size)?
Or do you need predictor.zero_copy_run() to support limiting the cache size as well? In that case, we may need to wrap a new interface.
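
The control flow described above can be modeled in a few lines. The capacity limit only applies after a pre-run hook activates it, and only one of the two entry points calls that hook. This is a deliberately simplified, hypothetical model: `ToyPredictor`, `_pre_set`, `_post_reset`, and the FIFO eviction are invented for illustration and do not reproduce Paddle's C++ code.

```python
from collections import OrderedDict


class ToyPredictor:
    """Hypothetical predictor whose cache limit is applied only inside run()."""

    def __init__(self, capacity):
        self.capacity = capacity      # value a user would configure
        self._active_capacity = 0     # 0 means "no limit" (the default)
        self.cache = OrderedDict()    # input shape -> cached primitive

    def _pre_set(self):
        # Stands in for the pre-run hook: activates the configured limit.
        self._active_capacity = self.capacity

    def _post_reset(self):
        # Restores the default (unlimited) state after the run.
        self._active_capacity = 0

    def _execute(self, shape):
        if shape not in self.cache:
            self.cache[shape] = object()  # new primitive for an unseen shape
            if 0 < self._active_capacity < len(self.cache):
                self.cache.popitem(last=False)  # evict the oldest entry

    def run(self, shape):
        # Like run(inputs, outputs, batch_size): the hook is called.
        self._pre_set()
        self._execute(shape)
        self._post_reset()

    def zero_copy_run(self, shape):
        # Like zero_copy_run(): no hook, so the limit never applies.
        self._execute(shape)


if __name__ == "__main__":
    a, b = ToyPredictor(capacity=10), ToyPredictor(capacity=10)
    for w in range(100, 150):  # 50 distinct hypothetical input widths
        a.run((1, 3, 32, w))
        b.zero_copy_run((1, 3, 32, w))
    print(len(a.cache), len(b.cache))  # 10 50
```

In this model, supporting the limit for zero_copy_run() amounts to calling the same pre/post hooks around it, which is the kind of new interface suggested above.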

PaddlePaddle / Paddle

Memory leaks when doing CPU inference with MKLDNN #25506
