Closed: Ilbotre closed this issue 4 years ago.
What you observed (including full logs):
Please include full logs.
Please try decreasing your batch_size.
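For reference, a minimal sketch of lowering the batch size in a Detectron2 config before training; the model zoo config file and the target value of 8 here are assumptions for illustration, not taken from this issue:

```python
# Minimal sketch: reduce the training batch size in a Detectron2 config.
# The config file and the value 8 are assumptions, not from this issue.
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
)
cfg.SOLVER.IMS_PER_BATCH = 8  # e.g. down from 30: fewer images per step, less GPU memory
# If you shrink the batch, scale the base LR roughly linearly (0.02 / 16 = 0.00125
# in the default COCO configs) to keep training behavior comparable.
cfg.SOLVER.BASE_LR = 0.00125 * cfg.SOLVER.IMS_PER_BATCH
```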
ERROR [06/09 11:05:25 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/home/lab/nbogliol/miniconda2/envs/detectron2/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 132, in train
self.run_step()
File "/home/lab/nbogliol/miniconda2/envs/detectron2/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 228, in run_step
losses.backward()
File "/home/lab/nbogliol/miniconda2/envs/detectron2/lib/python3.7/site-packages/torch/tensor.py", line 195, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/lab/nbogliol/miniconda2/envs/detectron2/lib/python3.7/site-packages/torch/autograd/init.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.71 GiB (GPU 0; 31.72 GiB total capacity; 23.58 GiB already allocated; 1.58 GiB free; 29.08 GiB reserved in total by PyTorch)
[06/09 11:05:25 d2.engine.hooks]: Overall training speed: 22 iterations in 0:01:02 (2.8448 s / it)
[06/09 11:05:25 d2.engine.hooks]: Total training time: 0:01:02 (0:00:00 on hooks)
Traceback (most recent call last):
File "transfer_learning_kaist.py", line 418, in
Here you can find the full logs. Thank you in advance.
There should be other logs before the error, but it looks like the failure is simply because a batch size of 30 is too large.
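As a side note, a quick way to see how much GPU memory is actually in use while debugging such OOM errors, using plain PyTorch calls (device index 0 is assumed):

```python
# Report allocated vs. reserved GPU memory on device 0 (assumed here).
import torch

if torch.cuda.is_available():
    gib = 1024 ** 3
    print(torch.cuda.get_device_name(0))
    print(f"allocated: {torch.cuda.memory_allocated(0) / gib:.2f} GiB")
    print(f"reserved:  {torch.cuda.memory_reserved(0) / gib:.2f} GiB")
```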
Instructions To Reproduce the Issue:
What code you wrote or what changes you made (git diff):

RuntimeError: CUDA out of memory. Tried to allocate 400.00 MiB (GPU 0; 31.72 GiB total capacity; 29.81 GiB already allocated; 27.94 MiB free; 30.64 GiB reserved in total by PyTorch)
wget -nc -q https://github.com/facebookresearch/detectron2/raw/master/detectron2/utils/collect_env.py && python collect_env.py
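If detectron2 is already importable, the same environment report can also be printed from Python; a minimal sketch using the collect_env_info function that script exposes:

```python
# Print the detectron2 environment report from Python.
# Assumes detectron2 is installed in the current environment.
from detectron2.utils.collect_env import collect_env_info

print(collect_env_info())
```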