PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)
http://www.paddlepaddle.org/
Apache License 2.0

large memory used when infer #11185

Closed tensor-tang closed 6 years ago

tensor-tang commented 6 years ago

This is an issue of NLP online service.

When running inference, the memory usage stays at about 6 GB, which is definitely larger than actually needed.


ChinaLiuHao commented 6 years ago

I am hitting this too. In addition, when I run inference with multiple threads and "export OPENBLAS_NUM_THREADS=1", the program may end with an "Aborted" error!

tensor-tang commented 6 years ago

@ChinaLiuHao One more note: the "Aborted" error occurs randomly, not on every run.

luotao1 commented 6 years ago

The OCR CRNN_CTC service also shows unusually large memory usage.

tensor-tang commented 6 years ago

https://github.com/PaddlePaddle/Paddle/blob/666c94e3be10c2290eb143fdff208684e9ee34fe/paddle/fluid/memory/detail/buddy_allocator.cc#L188-L192

This should be the cause: Paddle allocates the maximum chunk size on the very first allocation.

tensor-tang commented 6 years ago

After debugging, we found there is a flag that controls how much memory is reserved up front. By default it uses about 3.2% (1/32) of your total memory.

usage:

your_app --fraction_of_cpu_memory_to_use=0.1 # reserves 3.2% * 0.1 of total memory

The call trace is as follows:

https://github.com/PaddlePaddle/Paddle/blob/666c94e3be10c2290eb143fdff208684e9ee34fe/paddle/fluid/platform/cpu_info.cc#L26-L28

https://github.com/PaddlePaddle/Paddle/blob/666c94e3be10c2290eb143fdff208684e9ee34fe/paddle/fluid/platform/cpu_info.cc#L54-L58

https://github.com/PaddlePaddle/Paddle/blob/666c94e3be10c2290eb143fdff208684e9ee34fe/paddle/fluid/platform/cpu_info.cc#L65-L69

https://github.com/PaddlePaddle/Paddle/blob/666c94e3be10c2290eb143fdff208684e9ee34fe/paddle/fluid/memory/malloc.cc#L32-L36

tensor-tang commented 6 years ago

@ChinaLiuHao As for the "Aborted" error, let's open a separate issue to discuss it. Thanks.