Open Hughen opened 5 years ago
Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: Bug
@Hughen Thank you for sharing the issue. Could you please provide more details and train.py so that we can look into reproducing it?
@mxnet-label-bot add [Pending requester info]
Description
After interrupting train.py, there are many zombie processes that cannot be killed. It seems that the GPU tasks are not being recycled properly.
Environment info (Required)
Error Message:
The dmesg log also shows a stack error like this: