PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the "PaddlePaddle" core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

Control GPU memory usage in unit tests #3437

Closed wangkuiyi closed 7 years ago

wangkuiyi commented 7 years ago

@emailweixu found that tests can fail on memory allocation when running them in parallel with ctest -I 121,123 -j:

I found that tests can fail for memory allocation if running them in parallel: ctest -I 121,123 -j
121: Traceback (most recent call last):
121:   File "/home/wei/code/baidu/idl/dl/robot/external/paddle/python/paddle/v2/framework/tests/op_test_util.py", line 37, in test_all
121:     var.set(arr, place)
121: RuntimeError: Insufficient GPU memory to allocation. at [/home/wei/code/baidu/idl/dl/robot/external/paddle/paddle/framework/tensor.h:131]
121: Call Stacks:
121: /home/wei/code/baidu/idl/dl/robot/build/.env/local/lib/python2.7/site-packages/paddle/v2/framework/core.so(_ZN6paddle8platform13EnforceNotMetC1ENSt15__exception_ptr13exception_ptrEPKci+0x1d2) [0x7fde36115fc2]
121: /home/wei/code/baidu/idl/dl/robot/build/.env/local/lib/python2.7/site-packages/paddle/v2/framework/core.so(_ZN6paddle9framework6Tensor15PlaceholderImplIfNS_8platform8GPUPlaceEEC1ES4_m+0x130) [0x7fde36136824]
121: /home/wei/code/baidu/idl/dl/robot/build/.env/local/lib/python2.7/site-packages/paddle/v2/framework/core.so(_ZN6paddle9framework6Tensor12mutable_dataIfEEPT_N5boost7variantINS_8platform8GPUPlaceEJNS7_8CPUPlaceEEEE+0x1b1) [0x7fde3612bbf9]
....
1/3 Test #123: test_softmax_op ..................   Passed    1.31 sec
2/3 Test #122: test_sigmoid_op ..................***Failed    1.40 sec
3/3 Test #121: test_add_two_op ..................***Failed    1.42 sec
wangkuiyi commented 7 years ago

Each unit test shouldn't consume too much GPU memory.

wangkuiyi commented 7 years ago

I checked test_sigmoid.py

class TestSigmoidOp(unittest.TestCase):
    __metaclass__ = OpTestMeta

    def setUp(self):
        self.type = "sigmoid"
        self.inputs = {'X': np.random.random((32, 100)).astype("float32")}
        self.outputs = {'Y': 1 / (1 + np.exp(-self.inputs['X']))}

It seems that np.random.random((32, 100)) generates a 32x100 matrix. Why do we need such a big matrix? What would be the difference if we used a smaller one, say 3x2?
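If the test only exercises the elementwise forward computation, a reduced-size fixture along these lines might suffice (a sketch; the 3x2 shape is the smaller size suggested above, and the assertions are illustrative, not part of the original test):

```python
import numpy as np

# Hypothetical reduced-size fixture: sigmoid is applied elementwise, so a
# 3x2 input exercises the same code path as 32x100 while allocating far
# less GPU memory per test.
x = np.random.random((3, 2)).astype("float32")
y = 1 / (1 + np.exp(-x))

assert y.shape == x.shape
assert np.all((y > 0) & (y < 1))  # sigmoid output always lies in (0, 1)
```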

wangkuiyi commented 7 years ago

Similarly, in test_add_two_op.py, we have

            'X': numpy.random.random((102, 105)).astype("float32"),
            'Y': numpy.random.random((102, 105)).astype("float32")

Why such big 102x105 matrices? What's the difference if we change them to 2x5?

jacquesqiao commented 7 years ago
QiJune commented 7 years ago

Got it! I will reduce the tensor sizes in the framework op tests. @wangkuiyi

emailweixu commented 7 years ago

Matrix sizes of 102x105 and 32x100 are really not that big. There must be some other cause.

wangkuiyi commented 7 years ago

@emailweixu Created an issue https://github.com/PaddlePaddle/Paddle/issues/3482

gangliao commented 7 years ago

Thanks, I will check it. @wangkuiyi @emailweixu

gangliao commented 7 years ago

Currently, our new memory manager reserves GPU memory exclusively for a single process.

https://github.com/PaddlePaddle/Paddle/blob/9eaef75397926819294edda04dbed34aa069f5f4/paddle/platform/gpu_info.cc#L19

When we execute unit tests in parallel, one of the processes occupies 95% of the GPU memory. When another process then tries to allocate GPU memory before the first one has released it, it may get a nullptr back from paddle::memory::Alloc. That is why parallel unit-test jobs frequently fail.
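A back-of-the-envelope sketch of the failure mode described above (the numbers are hypothetical: a 12 GB card and a 0.95 reservation fraction):

```python
# Each test process pre-reserves a fraction of the whole card.
total_mb = 12 * 1024          # hypothetical 12 GB GPU
fraction = 0.95               # fraction each process tries to reserve
per_process_mb = fraction * total_mb

# `ctest -j` starts several test processes; just two of them together
# already request more memory than the card has, so the later
# allocation fails (nullptr from the allocator).
combined_mb = 2 * per_process_mb
print(combined_mb > total_mb)  # True: the second process cannot be served
```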

There are two possible strategies to suppress this issue:

  1. Option 1: set fraction_of_gpu_memory_to_use in the cc_test and nv_test rules of generic.cmake. The open question is how to make Python tests receive gflags in py_test.

  2. Option 2: read an environment variable with std::getenv, so that we can export FRACTION_GPU_MEMORY_TO_USE=0.2.
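Option 2 would also sidestep the py_test problem from option 1, because environment variables reach every child process, C++ and Python alike. As seen from a Python test process, reading it might look like this (a sketch; the variable name FRACTION_GPU_MEMORY_TO_USE and the 0.95 default are taken from the proposal above, and the C++ side would read the same variable with std::getenv):

```python
import os

def gpu_memory_fraction(default=0.95):
    # Read the fraction of GPU memory this test process may reserve,
    # falling back to the default when the variable is unset or malformed.
    raw = os.environ.get("FRACTION_GPU_MEMORY_TO_USE")
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        return default

# A CI script would export the variable before launching the tests:
os.environ["FRACTION_GPU_MEMORY_TO_USE"] = "0.2"
print(gpu_memory_fraction())  # 0.2
```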

@wangkuiyi @emailweixu