quanvuong opened this issue 4 years ago
Adding `del loss` and `torch.cuda.empty_cache()` solves this problem.
Actually, using `empty_cache()` leads to really slow GPU operations (60 hours for the fine-tuning step). Is there another workaround?
If I simply do `del loss` without emptying the cache, the out-of-memory error still happens.
`torch.cuda.empty_cache()`
Hi @quanvuong, would you mind elaborating on where to add these? Much appreciated!
You can add it after `loss.backward()`.
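For reference, here is a minimal sketch of where those calls could sit in a typical training loop. The names `model`, `optimizer`, and `loader` are placeholders, and I am assuming the model's forward pass returns the loss, which is not necessarily how this repo is structured:

```python
import torch

for images, targets in loader:
    images, targets = images.cuda(), targets.cuda()

    optimizer.zero_grad()
    loss = model(images, targets)  # placeholder: forward pass returning the loss
    loss.backward()
    optimizer.step()

    # Drop the last reference to the loss (and thus its graph) so the
    # buffers become eligible for freeing, then return cached blocks
    # to the allocator.
    del loss
    torch.cuda.empty_cache()
```

Keep in mind that calling `empty_cache()` every iteration forces the allocator to re-request memory from the driver, which matches the slowdown reported above.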
Hi @quanvuong,
Had you solved this problem? I ran into a similar one when evaluating the baseline model: it caused `CUDA error: out of memory` because data accumulated from each iteration. I am using torch 0.4.1. I already tried `empty_cache()` as well as `del metax, mask`, but it doesn't help.
In my case, I used torch v0.4.1 instead of v0.3.1 like the author did. I solved my problem by wrapping validation in `with torch.no_grad():`. In 0.4 the `volatile` flag on `Variable` is deprecated and no longer stops gradient tracking, so the computation graph keeps accumulating in GPU memory during validation.
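To make that concrete, here is a minimal sketch of a validation loop under `torch.no_grad()` (the loop and variable names are illustrative, not taken from this repo):

```python
import torch

model.eval()  # placeholder model; also disables dropout/batchnorm updates
total_loss = 0.0

with torch.no_grad():  # no graph is recorded, so activations are freed right away
    for images, targets in val_loader:
        images, targets = images.cuda(), targets.cuda()
        loss = model(images, targets)
        # .item() pulls out a Python float; accumulating the tensor itself
        # would keep every iteration's buffers alive on the GPU
        total_loss += loss.item()

print(total_loss / len(val_loader))
```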
Based on my understanding, there are two reasons for the out-of-memory error during fine-tuning.
The solution could be: 1. decrease the batch size a little bit.
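If a smaller batch hurts convergence, one workaround (my suggestion, not something described in this repo) is gradient accumulation: run several small forward/backward passes per optimizer step, so the effective batch size stays the same while peak memory drops. A minimal sketch, again assuming a placeholder `model` whose forward pass returns the loss:

```python
accum_steps = 4  # effective batch size = loader batch size * accum_steps

optimizer.zero_grad()
for i, (images, targets) in enumerate(loader):
    images, targets = images.cuda(), targets.cuda()
    # Scale the loss so the accumulated gradients average over the
    # small batches instead of summing them.
    loss = model(images, targets) / accum_steps
    loss.backward()  # gradients accumulate in .grad between steps

    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```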
I am reproducing the result using the instructions provided in the README file.
I was able to train the base model and obtain an AP of 0.6862, which matches what the paper reports. However, when I run the fine-tuning command, the process exits with an out-of-memory error during the backward pass.
I am training on four GeForce GTX 1080 Ti cards with roughly 12 GB of memory each. Did you use GPUs with more memory, or is something weird happening?