interrupt multi-GPU training

Tramac / awesome-semantic-segmentation-pytorch

Semantic Segmentation on PyTorch (include FCN, PSPNet, Deeplabv3, Deeplabv3+, DANet, DenseASPP, BiSeNet, EncNet, DUNet, ICNet, ENet, OCNet, CCNet, PSANet, CGNet, ESPNet, LEDNet, DFANet)

Apache License 2.0


interrupt multi-GPU training #80

Open HSMung opened 4 years ago

HSMung commented 4 years ago

If I interrupt multi-GPU training, sometimes there will be several zombie processes. How can I avoid this situation?

pyradd commented 4 years ago

Which model and backbone are you using for multi-gpu training?

HSMung commented 4 years ago

> Which model and backbone are you using for multi-gpu training?

Any of them.

pyradd commented 4 years ago

For the time being, I don't have a workaround. However, the first process is most often the main process. If you kill that one, the other zombie processes seem to die.
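One way to avoid hunting for the main process by hand is to start training through a small wrapper that puts the launcher and its workers in their own process group, then signals the whole group on Ctrl+C. This is only a minimal sketch, not something this repository provides. The names `run_in_own_group` and `stop_group` are hypothetical, the training command is a placeholder, and it assumes a POSIX system and workers that stay in the launcher's process group.

```python
import os
import signal
import subprocess
import sys


def stop_group(proc, grace_seconds=5.0):
    """Send SIGTERM to the child's process group, then SIGKILL if it lingers."""
    try:
        # proc.pid doubles as the process group ID because the child
        # was started with start_new_session=True (see below).
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        proc.wait()  # group already gone; just collect the exit status
        return
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        proc.wait()


def run_in_own_group(cmd, grace_seconds=5.0):
    """Run cmd as the leader of a new process group and clean up on Ctrl+C."""
    # start_new_session=True (POSIX only) calls setsid() in the child, so the
    # child and every worker it spawns share one process group.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait()
    except KeyboardInterrupt:
        stop_group(proc, grace_seconds)
        return proc.returncode


# Hypothetical usage; replace the list with your real launch command:
# run_in_own_group([sys.executable, "train.py"])
```

Because the child runs in a new session, Ctrl+C in the terminal reaches only the wrapper, which then terminates the launcher and its workers together. Workers that move themselves into a different process group would not be covered by this sketch.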