I am training on 8 GPUs with either 40 GB or 64 GB of memory each. The batch size is set to 1, but an out-of-memory (OOM) error still occurs during training.
For most of the run, memory usage appears to stay around 30 GB, but at some point it spikes and exceeds the card's capacity.
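
To narrow down which step triggers the spike, one option would be to record the peak memory of each training step and flag the outliers. The following is only a minimal sketch: it assumes the per-step peak readings (in GB) have already been collected, for example with `torch.cuda.max_memory_allocated()` if the framework is PyTorch, which is an assumption since the framework is not stated here. The helper name `find_memory_spikes`, the `ratio` threshold, and the sample numbers are hypothetical.

```python
from statistics import median


def find_memory_spikes(peaks_gb, ratio=1.2):
    """Return (step, peak_gb) pairs whose peak exceeds ratio * median peak.

    peaks_gb: per-step peak memory readings in GB (how they are collected
    depends on the framework; this function does not query any device).
    ratio: hypothetical threshold; 1.2 means "20% above the typical step".
    """
    if not peaks_gb:
        return []
    baseline = median(peaks_gb)
    return [
        (step, peak)
        for step, peak in enumerate(peaks_gb)
        if peak > ratio * baseline
    ]


# Illustrative readings only, not measurements from the run described above.
example_peaks = [30.1, 29.8, 30.4, 38.9, 30.0]
for step, peak in find_memory_spikes(example_peaks):
    print(f"step {step}: peak {peak} GB")
```

Logging the input shape or sequence length of the flagged steps alongside the peak may show whether a few unusually large samples are responsible.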