Describe the bug
When training with the argument '--distributed-no-spawn', the program gets stuck.
Without '--distributed-no-spawn', the program produces a strange "OUT OF MEMORY" error, even though there is free GPU memory.
To Reproduce
Append the line '--distributed-no-spawn' to train_wineholder.sh.
When running it, the program gets stuck.
Without '--distributed-no-spawn', it logs an "OUT OF MEMORY" error. This is strange, since the log contains no message like "Tried to allocate 2.0 GiB". Moreover, nvidia-smi shows there is free memory on GPUs 4 and 7.
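To double-check which GPUs actually have free memory when the job starts, a small helper can parse the CSV output of `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits` (free memory is reported in MiB). This is a minimal sketch, not part of the training code: the helper names and the 4096 MiB threshold are hypothetical choices for illustration.

```python
import subprocess


def parse_free_memory(csv_text):
    """Parse nvidia-smi CSV output (index, free MiB per line) into {gpu_index: free_mib}."""
    free = {}
    for line in csv_text.strip().splitlines():
        index, mem = (field.strip() for field in line.split(","))
        free[int(index)] = int(mem)
    return free


def gpus_with_free_memory(free, min_free_mib):
    """Return the sorted GPU indices that have at least min_free_mib MiB free."""
    return [idx for idx, mib in sorted(free.items()) if mib >= min_free_mib]


def query_nvidia_smi():
    """Run nvidia-smi (must be on PATH) and return {gpu_index: free_mib}."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_free_memory(out)
```

For example, `gpus_with_free_memory(query_nvidia_smi(), 4096)` lists the GPUs with at least 4 GiB free. If only GPUs 4 and 7 qualify, one untested way to keep the run off the busy devices is to launch the script with `CUDA_VISIBLE_DEVICES=4,7` set in the environment.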
THANKS!