PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the "PaddlePaddle" core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

Control GPU memory usage in unit tests #3437

Closed wangkuiyi closed 7 years ago

wangkuiyi commented 7 years ago

@emailweixu found that tests can fail on memory allocation when running them in parallel with ctest -I 121,123 -j:

I found that tests can fail for memory allocation if running them in parallel: ctest -I 121,123 -j
121: Traceback (most recent call last):
121:   File "/home/wei/code/baidu/idl/dl/robot/external/paddle/python/paddle/v2/framework/tests/op_test_util.py", line 37, in test_all
121:     var.set(arr, place)
121: RuntimeError: Insufficient GPU memory to allocation. at [/home/wei/code/baidu/idl/dl/robot/external/paddle/paddle/framework/tensor.h:131]
121: Call Stacks:
121: /home/wei/code/baidu/idl/dl/robot/build/.env/local/lib/python2.7/site-packages/paddle/v2/framework/core.so(_ZN6paddle8platform13EnforceNotMetC1ENSt15__exception_ptr13exception_ptrEPKci+0x1d2) [0x7fde36115fc2]
121: /home/wei/code/baidu/idl/dl/robot/build/.env/local/lib/python2.7/site-packages/paddle/v2/framework/core.so(_ZN6paddle9framework6Tensor15PlaceholderImplIfNS_8platform8GPUPlaceEEC1ES4_m+0x130) [0x7fde36136824]
121: /home/wei/code/baidu/idl/dl/robot/build/.env/local/lib/python2.7/site-packages/paddle/v2/framework/core.so(_ZN6paddle9framework6Tensor12mutable_dataIfEEPT_N5boost7variantINS_8platform8GPUPlaceEJNS7_8CPUPlaceEEEE+0x1b1) [0x7fde3612bbf9]
....
1/3 Test #123: test_softmax_op ..................   Passed    1.31 sec
2/3 Test #122: test_sigmoid_op ..................***Failed    1.40 sec
3/3 Test #121: test_add_two_op ..................***Failed    1.42 sec
wangkuiyi commented 7 years ago

Each unit test shouldn't consume too much GPU memory.

wangkuiyi commented 7 years ago

I checked test_sigmoid.py

class TestSigmoidOp(unittest.TestCase):
    __metaclass__ = OpTestMeta

    def setUp(self):
        self.type = "sigmoid"
        self.inputs = {'X': np.random.random((32, 100)).astype("float32")}
        self.outputs = {'Y': 1 / (1 + np.exp(-self.inputs['X']))}

It seems that np.random.random((32, 100)) generates a 32x100 matrix. Why do we need such a big matrix? What would be the difference if we used a smaller one, say 3x2?
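If the test only exercises the elementwise forward computation, a reduced-size fixture along these lines might suffice (a sketch; the 3x2 shape is the smaller size suggested above, and the assertions are illustrative, not part of the original test):

```python
import numpy as np

# Hypothetical reduced-size fixture: sigmoid is applied elementwise, so a
# 3x2 input exercises the same code path as 32x100 while allocating far
# less GPU memory per test.
x = np.random.random((3, 2)).astype("float32")
y = 1 / (1 + np.exp(-x))

assert y.shape == x.shape
assert np.all((y > 0) & (y < 1))  # sigmoid output always lies in (0, 1)
```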

wangkuiyi commented 7 years ago

Similarly, in test_add_two_op.py, we have

            'X': numpy.random.random((102, 105)).astype("float32"),
            'Y': numpy.random.random((102, 105)).astype("float32")

Why such big 102x105 matrices? What's the difference if we change them to 2x5?

jacquesqiao commented 7 years ago
QiJune commented 7 years ago

Got it! I will reduce the tensor sizes in the framework op tests. @wangkuiyi

emailweixu commented 7 years ago

Matrix sizes of 102x105 and 32x100 are really not that big. There must be some other cause.

wangkuiyi commented 7 years ago

@emailweixu Created an issue https://github.com/PaddlePaddle/Paddle/issues/3482

gangliao commented 7 years ago

Thanks, I will check it. @wangkuiyi @emailweixu

gangliao commented 7 years ago

Currently, our new memory manager reserves GPU memory exclusively for a single process.

https://github.com/PaddlePaddle/Paddle/blob/9eaef75397926819294edda04dbed34aa069f5f4/paddle/platform/gpu_info.cc#L19

When we execute unit tests in parallel, one of the processes occupies 95% of the GPU memory. When another process then tries to allocate GPU memory before the first one has released it, it may get a nullptr back from paddle::memory::Alloc. That is why parallel unit-test jobs frequently fail.
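A back-of-the-envelope sketch of the failure mode described above (the numbers are hypothetical: a 12 GB card and a 0.95 reservation fraction):

```python
# Each test process pre-reserves a fraction of the whole card.
total_mb = 12 * 1024          # hypothetical 12 GB GPU
fraction = 0.95               # fraction each process tries to reserve
per_process_mb = fraction * total_mb

# `ctest -j` starts several test processes; just two of them together
# already request more memory than the card has, so the later
# allocation fails (nullptr from the allocator).
combined_mb = 2 * per_process_mb
print(combined_mb > total_mb)  # True: the second process cannot be served
```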

There are two possible strategies to suppress this issue:

  1. Option 1: set fraction_of_gpu_memory_to_use in the cc_test and nv_test rules of generic.cmake. The open question is how to make Python tests receive gflags in py_test.

  2. Option 2: read an environment variable with std::getenv, so that we can export FRACTION_GPU_MEMORY_TO_USE=0.2.
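Option 2 would also sidestep the py_test problem from option 1, because environment variables reach every child process, C++ and Python alike. As seen from a Python test process, reading it might look like this (a sketch; the variable name FRACTION_GPU_MEMORY_TO_USE and the 0.95 default are taken from the proposal above, and the C++ side would read the same variable with std::getenv):

```python
import os

def gpu_memory_fraction(default=0.95):
    # Read the fraction of GPU memory this test process may reserve,
    # falling back to the default when the variable is unset or malformed.
    raw = os.environ.get("FRACTION_GPU_MEMORY_TO_USE")
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        return default

# A CI script would export the variable before launching the tests:
os.environ["FRACTION_GPU_MEMORY_TO_USE"] = "0.2"
print(gpu_memory_fraction())  # 0.2
```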

@wangkuiyi @emailweixu