apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

get stuck with subprocess in multithread #10948

Open HorsonLiu opened 6 years ago

HorsonLiu commented 6 years ago

# multiprocessing.dummy exposes the multiprocessing API on top of threads
import multiprocessing.dummy as multiprocess
import subprocess
import mxnet as mx

def handle_task():
    ctx = mx.cpu()

    for _ in xrange(500):
        print('g')
        # gets stuck here
        mx.nd.ones((2, 3), ctx)
        print('i')
        subprocess.check_output(["ls"])
        print('d')

# Start four workers (threads, because multiprocessing.dummy is used) and wait for them
child_processes = [multiprocess.Process(target=handle_task, args=()) for _ in xrange(4)]

for i in xrange(4):
    child_processes[i].start()

for i in xrange(4):
    child_processes[i].join()

ankkhedia commented 6 years ago

@sandeep-krishnamurthy Could you please tag this issue as Python and Bug?

Roshrini commented 6 years ago

@HorsonLiu Can you provide more details on your setup?

A similar issue (which used to crash), https://github.com/apache/incubator-mxnet/issues/9213, was fixed some time ago by this PR: https://github.com/apache/incubator-mxnet/pull/8995

I was able to reproduce the issue on Linux with MXNet built from source, on both Python 2 and Python 3.

Observations:
1) If I comment out "mx.nd.ones((2, 3), ctx)", the example works fine.
2) If I comment out "subprocess.check_output(["ls"])", the example works fine.
3) On Python 2 the issue always occurs. On Python 3 it occurs intermittently.

Roshrini commented 6 years ago

It looks like the issue here is that Python 2.7 always uses os.fork() to implement multiprocessing.Process.start() (on Unix). In Python 3, other start methods are available: 'fork', 'spawn' and 'forkserver'. You can select one with multiprocessing.set_start_method('spawn') to avoid the fork-safety issues.

I ran the following code snippet multiple times in Python 3 successfully.

import multiprocessing as multiprocess
import subprocess
import mxnet as mx

def handle_task():
    ctx = mx.cpu()
    for _ in range(500):
        print('g')
        mx.nd.ones((2, 3), ctx)
        print('i')
        subprocess.check_output(["ls"])
        print('d')

if __name__ == '__main__':
    multiprocess.set_start_method('spawn', force=True)
    child_processes = [multiprocess.Process(target=handle_task, args=()) for _ in range(4)]

    for i in range(4):
        child_processes[i].start()

    for i in range(4):
        child_processes[i].join()

@HorsonLiu Can you try and confirm if this fixes your issue?
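
As a side note on the snippet above: Python 3.4 and later also provide multiprocessing.get_context(), which returns an object bound to one start method without changing the process-wide default. The following is only a minimal sketch, assuming Python 3; `work` and `make_workers` are hypothetical names, and `work` is a stand-in for handle_task that does not import MXNet.

```python
import multiprocessing


def work(n):
    # Hypothetical stand-in for handle_task(); a real worker would call MXNet here.
    return sum(range(n))


def make_workers(count, start_method="spawn"):
    # get_context() returns an object with the same API as the multiprocessing
    # module, bound to the requested start method.
    ctx = multiprocessing.get_context(start_method)
    return [ctx.Process(target=work, args=(1000,)) for _ in range(count)]


if __name__ == "__main__":
    # The guard is required with 'spawn', because child processes re-import
    # the main module.
    workers = make_workers(4)
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```

Like the snippet above, this uses real processes, so it does not cover the multiprocessing.dummy (thread) case.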

HorsonLiu commented 6 years ago

@Roshrini Thank you for your suggestion. I am using python-2.7.5 and mxnet-1.3.0 on centos-7. I tried your code, and it fails with AttributeError: 'module' object has no attribute 'set_start_method'. I also noticed that you are using multiprocessing, not multithreading (multiprocessing.dummy).
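
Both points in this comment can be checked with the standard library alone. The sketch below needs no MXNet: it shows that multiprocessing.dummy.Process is a thread, and that set_start_method (added in Python 3.4) is simply absent on Python 2.7. The helper names `is_thread_backed` and `supports_start_methods` are hypothetical.

```python
import multiprocessing
import multiprocessing.dummy
import threading


def is_thread_backed():
    # multiprocessing.dummy mirrors the multiprocessing API on top of threads,
    # so its Process class is a threading.Thread subclass.
    return issubclass(multiprocessing.dummy.Process, threading.Thread)


def supports_start_methods():
    # set_start_method was added in Python 3.4; on Python 2.7 the attribute
    # does not exist, which is what produces the AttributeError.
    return hasattr(multiprocessing, "set_start_method")


if __name__ == "__main__":
    print("dummy.Process is a thread: %s" % is_thread_backed())
    print("set_start_method available: %s" % supports_start_methods())
```

Because the original reproducer runs threads rather than processes, choosing a different start method would not change how its workers are launched.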

Roshrini commented 6 years ago

@HorsonLiu MXNet is not fully thread-safe, so this usage is not supported yet. I am adding it as a FeatureRequest. @sandeep-krishnamurthy Can you please tag this as [FeatureRequest, Backend]?
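
For readers who still need threads, one generic pattern is to serialize the library call and the subprocess call behind a single lock, so that no thread forks a child while another thread is inside the library. This is only a sketch of that pattern, and it is an assumption rather than a fix confirmed in this thread: `make_array` is a hypothetical stand-in for the MXNet call, and a Python-level lock does not cover MXNet's internal engine threads.

```python
import subprocess
import sys
import threading

# One process-wide lock shared by all worker threads.
_native_lock = threading.Lock()


def make_array():
    # Hypothetical stand-in for an MXNet call such as mx.nd.ones((2, 3), ctx).
    return [[1.0] * 3 for _ in range(2)]


def handle_task(results, index, iterations=5):
    arr, out = None, b""
    for _ in range(iterations):
        # Only one thread at a time may be inside the library call
        # or launching a child process.
        with _native_lock:
            arr = make_array()
        with _native_lock:
            out = subprocess.check_output([sys.executable, "-c", "print('ok')"])
    results[index] = (arr, out.strip())


def run(num_threads=4):
    results = [None] * num_threads
    threads = [threading.Thread(target=handle_task, args=(results, i))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling run() starts the worker threads and returns one (array, output) pair per thread. The lock trades parallelism for safety, so it only makes sense when the guarded sections are short.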

apeforest commented 6 years ago

@sandeep-krishnamurthy Please add the label [Thread Safety]. Thanks!