Closed: YoungJoongUNC closed this issue 4 years ago
Is this a memory issue? A smaller batch size or more GPUs would help.
When I run nvidia-smi, it shows plenty of free memory. If I use batch size = GPU 1, it works fine. But if I use batch size = GPU 2, it does not work and shows the above error. Have you encountered this as well? Did you use 8 GPUs with a batch size of 64?
May I ask what your GPU specs are, and how many GPUs and what batch size you used for training?
For the uploaded model, I trained with 8 GPUs and a batch size of 64.
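For reference, here is a rough sketch of the arithmetic behind that setup. It assumes the training script splits the batch evenly across GPUs, which this thread does not confirm. `per_gpu_batch` is a hypothetical helper written for illustration, not something from the repo.

```python
def per_gpu_batch(global_batch: int, num_gpus: int) -> int:
    """Samples each GPU receives if a batch is split evenly across devices.

    Assumption: the batch is divided equally among GPUs; the actual
    behavior of train.py is not stated in this thread.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if global_batch % num_gpus != 0:
        raise ValueError("global_batch must be divisible by num_gpus")
    return global_batch // num_gpus


# Setup reported above: batch size 64 on 8 GPUs.
print(per_gpu_batch(64, 8))
# Same batch size on a single GPU: the whole batch lands on one device.
print(per_gpu_batch(64, 1))
```

Under that assumption, running the default batch size on fewer GPUs puts several times more samples on each device than in the 8-GPU run, so scaling the batch size down with the GPU count may be a reasonable first thing to try.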
Hello.
When I run train.py with batch size 1, it works fine. But when I use a batch size of 64 (the default value), it throws an error like the one below. How can I fix this?