Open HSMung opened 4 years ago
Which model and backbone are you using for multi-gpu training?
Any of them.
For the time being, I don't have a workaround. However, the first process is usually the main process. If you kill that one, the other zombie processes seem to die as well.
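One way to make "kill the main process and the rest follow" less manual is to start training inside its own process group and signal the whole group on interrupt. The following is only a rough sketch using the Python standard library, not something this repo provides: it assumes a POSIX system and that the workers stay in the launcher's process group. The helper names `launch_in_own_group` and `kill_group` are made up for illustration.

```python
import os
import signal
import subprocess
import sys


def launch_in_own_group(cmd):
    """Start cmd as the leader of a new session and process group.

    Child processes it spawns inherit that group unless they change it
    themselves (assumption: the training workers do not).
    """
    return subprocess.Popen(cmd, start_new_session=True)


def kill_group(proc, sig=signal.SIGTERM):
    """Send sig to every process in the group led by proc."""
    try:
        # With start_new_session=True the group id equals the leader's pid.
        os.killpg(proc.pid, sig)
    except ProcessLookupError:
        pass  # the whole group has already exited


if __name__ == "__main__":
    # The training command is taken from the command line.
    proc = launch_in_own_group(sys.argv[1:])
    try:
        proc.wait()
    except KeyboardInterrupt:
        # Ctrl+C reaches only this wrapper, so forward a signal to the group.
        kill_group(proc)
        proc.wait()
```

Saved under a hypothetical name such as `run_group.py`, it would be used as `python run_group.py python train.py ...`, where `train.py` stands in for whatever training command you normally run. If some workers ignore `SIGTERM`, passing `signal.SIGKILL` to `kill_group` is the blunter option.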
If I interrupt multi-GPU training, sometimes there will be several zombie processes. How can I avoid this situation?