apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

GPU memory allocation fails with an error when using multiprocessing.Process #4659

Open tornadomeet opened 7 years ago

tornadomeet commented 7 years ago

Reproduction code:

import numpy as np
import mxnet as mx
from multiprocessing import Process, current_process

def test():
    print("process id is {:s}".format(current_process().name))
    a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    runs = [Process(target=test) for i in range(1)]  # 1, 2, or N processes all fail the same way
    for p in runs:
        p.start()
    for p in runs:
        p.join()
    print("done!")

OS: Linux (CentOS 7) + CUDA 7.5 + cuDNN 5.1

Log:

[14:32:58] /home/work/wuwei/project/dmlc/mxnet/dmlc-core/include/dmlc/./logging.h:300: [14:32:58] src/storage/storage.cc:38: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: initialization error

Stack trace returned 40 entries:
[bt] (0) /home/work/wuwei/tools/mxnet/lib64/python2.7/site-packages/mxnet-0.9.1-py2.7-linux-x86_64.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f32b9501039]
[bt] (1) /home/work/wuwei/tools/mxnet/lib64/python2.7/site-packages/mxnet-0.9.1-py2.7-linux-x86_64.egg/mxnet/libmxnet.so(_ZN5mxnet11StorageImpl14ActivateDeviceENS_7ContextE+0x2a6) [0x7f32b9fb4de6]
[bt] (2) /home/work/wuwei/tools/mxnet/lib64/python2.7/site-packages/mxnet-0.9.1-py2.7-linux-x86_64.egg/mxnet/libmxnet.so(_ZN5mxnet11StorageImpl5AllocEmNS_7ContextE+0x4a) [0x7f32b9fb263a]
[bt] (3) /home/work/wuwei/tools/mxnet/lib64/python2.7/site-packages/mxnet-0.9.1-py2.7-linux-x86_64.egg/mxnet/libmxnet.so(MXNDArrayCreateEx+0x595) [0x7f32b9fe6685]
[bt] (4) /lib64/libffi.so.6(ffi_call_unix64+0x4c) [0x7f32c24f9dac]
[bt] (5) /lib64/libffi.so.6(ffi_call+0x1f5) [0x7f32c24f96d5]
[bt] (6) /usr/lib64/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x30b) [0x7f32c270cc8b]
[bt] (7) /usr/lib64/python2.7/lib-dynload/_ctypes.so(+0xaa85) [0x7f32c2706a85]
[bt] (8) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (9) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1d4c) [0x7f32cda1025c]
[bt] (10) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (11) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f) [0x7f32cda1276f]
[bt] (12) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (13) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f) [0x7f32cda1276f]
[bt] (14) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (15) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x425f) [0x7f32cda1276f]
[bt] (16) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (17) /lib64/libpython2.7.so.1.0(+0x6f05d) [0x7f32cd9a105d]
[bt] (18) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (19) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0xde7) [0x7f32cda0f2f7]
[bt] (20) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4350) [0x7f32cda12860]
[bt] (21) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4350) [0x7f32cda12860]
[bt] (22) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (23) /lib64/libpython2.7.so.1.0(+0x6ef68) [0x7f32cd9a0f68]
[bt] (24) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (25) /lib64/libpython2.7.so.1.0(+0x590a5) [0x7f32cd98b0a5]
[bt] (26) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (27) /lib64/libpython2.7.so.1.0(+0xa1057) [0x7f32cd9d3057]
[bt] (28) /lib64/libpython2.7.so.1.0(+0x9fd6f) [0x7f32cd9d1d6f]
[bt] (29) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f32cd97c0b3]
[bt] (30) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1d4c) [0x7f32cda1025c]
[bt] (31) /lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4350) [0x7f32cda12860]
[bt] (32) /lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7ed) [0x7f32cda140bd]
[bt] (33) /lib64/libpython2.7.so.1.0(PyEval_EvalCode+0x32) [0x7f32cda141c2]
[bt] (34) /lib64/libpython2.7.so.1.0(+0xfb5ff) [0x7f32cda2d5ff]
[bt] (35) /lib64/libpython2.7.so.1.0(PyRun_FileExFlags+0x7e) [0x7f32cda2e7be]
[bt] (36) /lib64/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xe9) [0x7f32cda2fa49]
[bt] (37) /lib64/libpython2.7.so.1.0(Py_Main+0xc9f) [0x7f32cda40b9f]
[bt] (38) /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f32ccc6cb15]
[bt] (39) python() [0x400721]

[14:32:58] /home/work/wuwei/project/dmlc/mxnet/dmlc-core/include/dmlc/./logging.h:300: [14:32:58] src/storage/storage.cc:38: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: initialization error
......
tornadomeet commented 7 years ago

v0.7.0 and v0.8.0 are OK; master produces this error.

xlvector commented 7 years ago

I may have run into a similar problem:

[11:09:02] src/nnvm/legacy_json_util.cc:153: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[11:09:03] /data00/tiger/.jenkins/workspace/lab_mxnet/dmlc-core/include/dmlc/./logging.h:300: [11:09:03] /data00/tiger/.jenkins/workspace/lab_mxnet/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: initialization error

Stack trace returned 6 entries:
[bt] (0) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f241b51d2b9]
[bt] (1) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7f241bfeb078]
[bt] (2) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7f241bfee840]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb6970) [0x7f24afe93970]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7f24b400e0a4]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f24b342062d]

terminate called after throwing an instance of 'dmlc::Error'
  what():  [11:09:03] /data00/tiger/.jenkins/workspace/lab_mxnet/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: initialization error

Stack trace returned 6 entries:
[bt] (0) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f241b51d2b9]
[bt] (1) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7f241bfeb078]
[bt] (2) /opt/tiger/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7f241bfee840]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb6970) [0x7f24afe93970]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7f24b400e0a4]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f24b342062d]

piiswrong commented 7 years ago

The only way to reliably use CUDA with multiprocessing is to import mxnet only after creating the subprocesses.
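
A minimal sketch of that pattern (the process count and array shape are just illustrative): mxnet is imported inside the child function, after the fork, so each child creates its own CUDA context instead of inheriting state from the parent.

import numpy as np
from multiprocessing import Process, current_process

def test():
    # Importing mxnet here, inside the child, means the CUDA context
    # is created after the fork rather than inherited from the parent.
    import mxnet as mx
    print("process id is {:s}".format(current_process().name))
    a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    runs = [Process(target=test) for i in range(2)]
    for p in runs:
        p.start()
    for p in runs:
        p.join()
    print("done!")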


leezu commented 7 years ago

I don't have any problems executing the above code with a current version of mxnet. @piiswrong do you have any insight into why it works now compared to earlier this year? @tornadomeet do you still experience this issue? Perhaps it is related to a different CUDA version or system configuration. https://github.com/dmlc/mxnet/pull/4695 seems to contain the fix.

In general I believe using Python multiprocessing and specifying the forkserver start method before importing mxnet should be a workaround for any CUDA-related multiprocessing issue. In particular, it should still allow creating new processes after mxnet has been imported, because the processes are forked from the forkserver, which holds no CUDA context. This also seems to be what PyTorch does.
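
A minimal sketch of that workaround (array shape and process count are illustrative): the start method is forced before mxnet is imported, so the forkserver process is started without a CUDA context and children forked from it begin with a clean CUDA state.

import multiprocessing as mp
# Select the start method before importing mxnet; force=True keeps the
# call from raising if a start method was already set elsewhere.
mp.set_start_method('forkserver', force=True)

import numpy as np
import mxnet as mx

def test():
    # CUDA is initialized here, inside the child.
    a = mx.nd.array(np.zeros((10, 10)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    runs = [mp.Process(target=test) for i in range(2)]
    for p in runs:
        p.start()
    for p in runs:
        p.join()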

szha commented 7 years ago

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!

anxingle commented 6 years ago

So, is there no solution to this problem?

anxingle commented 6 years ago

import numpy as np
import mxnet as mx
from multiprocessing import Process, current_process

def test():
    print("process id is {:s}".format(current_process().name))
    a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    # worker_count = multiprocessing.cpu_count() - 2
    worker_count = 8
    runs = [Process(target=test) for i in range(worker_count)]
    for p in runs:
        p.start()
    for p in runs:
        p.join()
    print("done!")

It is magical! I found that it is OK when I set worker_count to less than 8, but it does not work when worker_count is more than 8!

zachgk commented 6 years ago

@mxnet-label-bot add [Python, Bug]

vrakesh commented 6 years ago

@szha Has this issue been resolved? I have not been able to reproduce the exact issue; it starts to fail only when the GPU runs out of memory. I have been able to spawn more than 10 workers with the example script. I see a related PR has been merged in the dmlc/gluon-nlp repo.
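
As a rough sanity check on the out-of-memory explanation (the per-card capacity is an assumption, not from the thread): the reproducer allocates a (100, 100, 100, 100) float32 array per worker, about 400 MB each, so roughly eight concurrent workers already need around 3 GB of GPU memory, which lines up with the worker_count threshold reported above.

# Per-worker allocation in the reproducer: 100**4 float32 elements.
bytes_per_worker = 100 ** 4 * 4                # 4.0e8 bytes
print(bytes_per_worker / 2 ** 20, "MiB")       # ~381 MiB per worker
print(8 * bytes_per_worker / 2 ** 30, "GiB")   # ~2.98 GiB for 8 workers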

szha commented 6 years ago

@leezu might still have some issues with it, so let's wait for his comment too.

leezu commented 6 years ago

Here is an updated test case:

import mxnet as mx
from multiprocessing import Process

def test():
    # No GPU work is requested here, yet this call still touches CUDA
    # internally and crashes in the child.
    mx.random.seed(1)

if __name__ == '__main__':
    # CUDA is initialized in the parent before the children are forked.
    a = mx.nd.random_normal(shape=(10, 10), ctx=mx.gpu(0))
    runs = [Process(target=test) for i in range(1)]
    for p in runs:
        p.start()
    for p in runs:
        p.join()

Here CUDA is initialized in the parent process before the child processes are started. You may argue that GPU operations in the child processes should not be supported, but then the situation must be handled gracefully, i.e., by throwing an error on the Python side rather than crashing on the C++ side. But let's accept the current C++ exception. Even then, if we only want to do CPU work in the child process, the above example will crash, because mx.random.seed calls some CUDA-related code internally. So there is currently no way to execute code deterministically in the child processes, and code may crash at unexpected times (such as when calling mx.random.seed).
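
A hedged workaround consistent with the discussion above (a sketch, not a fix for the underlying problem): start the children before the parent touches the GPU, so that no CUDA context exists at fork time and each child initializes CUDA on its own.

import mxnet as mx
from multiprocessing import Process

def test():
    mx.random.seed(1)  # safe here: the child inherited no CUDA context

if __name__ == '__main__':
    # Fork first, while the parent has no CUDA context yet...
    runs = [Process(target=test) for i in range(1)]
    for p in runs:
        p.start()
    # ...then do the parent's GPU work.
    a = mx.nd.random_normal(shape=(10, 10), ctx=mx.gpu(0))
    for p in runs:
        p.join()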

nkhdiscovery commented 5 years ago

@leezu Here is something even more complex that works; I thought anybody else who comes here might need the solution. It does not work unless you force mp.set_start_method('forkserver', force=True):

import random
import mxnet as mx
import multiprocessing as mp

def test():
    mx.random.seed(random.randint(10, 200))
    a = mx.nd.random_normal(shape=(2, 2), ctx=mx.gpu(0))
    print('child no. ', mp.current_process().name, ':', a)

if __name__ == '__main__':
    # Must run before any CUDA work happens in the parent.
    mp.set_start_method('forkserver', force=True)
    ab = mx.nd.random_normal(shape=(2, 2), ctx=mx.gpu(0))
    print('main proc.: ', ab)
    runs = [mp.Process(target=test) for i in range(3)]
    for p in runs:
        p.start()
    for p in runs:
        p.join()

    print('done')

Hope it helps.

mdv3101 commented 5 years ago

Still facing this issue and unable to make it work; now I have to change the entire architecture of the application because of this.

larroy commented 5 years ago

@mxnet-label-bot add [Backend]

larroy commented 5 years ago

Related: https://github.com/apache/incubator-mxnet/issues/14979

Forking the library is not supported as of now.
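
Given that constraint, one way to sidestep fork entirely is a 'spawn' context (a sketch; shapes and process count are illustrative): each child starts a fresh interpreter and initializes CUDA from scratch.

import multiprocessing as mp

def test():
    # Under 'spawn' the child is a fresh interpreter, so importing mxnet
    # and touching the GPU here shares no CUDA state with the parent.
    import mxnet as mx
    a = mx.nd.zeros((10, 10), ctx=mx.gpu(0))
    a.asnumpy()

if __name__ == '__main__':
    # get_context avoids changing the global start method for the program.
    ctx = mp.get_context('spawn')
    runs = [ctx.Process(target=test) for i in range(2)]
    for p in runs:
        p.start()
    for p in runs:
        p.join()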

larroy commented 5 years ago

I also can't reproduce this with the latest master:

In [2]: import numpy as np
   ...: import mxnet as mx
   ...: from multiprocessing import Process, current_process
   ...: 
   ...: def test():
   ...:     print("process id is {:s}".format(current_process().name))
   ...:     a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
   ...:     a.asnumpy()
   ...: 
   ...: if __name__ == '__main__':
   ...:     runs = [Process(target=test) for i in range(2)]  # 1 or 2 or N process is the same error
   ...:     for p in runs:
   ...:       p.start()
   ...:     for p in runs:
   ...:       p.join()
   ...:     print("done!")
   ...: 
process id is Process-2
process id is Process-3
done!
In [1]: import numpy as np
   ...: import mxnet as mx
   ...: from multiprocessing import Process, current_process
   ...: 
   ...: def test():
   ...:     print("process id is {:s}".format(current_process().name))
   ...:     a = mx.nd.array(np.zeros((100, 100, 100, 100)), mx.gpu(0))
   ...:     a.asnumpy()
   ...: 
   ...: if __name__ == '__main__':
   ...:     runs = [Process(target=test) for i in range(1)]  # 1 or 2 or N process is the same error
   ...:     for p in runs:
   ...:       p.start()
   ...:     for p in runs:
   ...:       p.join()
   ...:     print("done!")
   ...: 

process id is Process-1

done!

leezu commented 4 years ago

@PascalIversen provided a new reproducer: https://github.com/apache/incubator-mxnet/issues/19291