Closed AnhPC03 closed 3 years ago
Got the same problem. I have a server with a 2080 Ti and a 2070 Super. Training does not work on multiple GPUs, and using only one GPU (either of the two) gives the same error as stated by @nguyentienanh2303.
I have the same problem.
Traceback (most recent call last):
File "train.py", line 105, in
I found the problem: TensorFlow reserves a large amount of CUDA memory. We can comment out these lines:
logger = Logger("logs")
logger.list_of_scalars_summary(tensorboard_log, batches_done)
logger.list_of_scalars_summary(evaluation_metrics, epoch)
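If you want to keep the TensorFlow-backed logger instead of commenting it out, another option is to stop TensorFlow from pre-allocating nearly the whole GPU at startup. A minimal sketch, assuming TF >= 1.14 (where this environment variable is honored); it must run before TensorFlow is imported anywhere in the process:

```python
import os

# Make TensorFlow allocate GPU memory on demand instead of reserving
# almost the whole card up front. Must be set before `import tensorflow`.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
```

With this set, PyTorch and the TF logger can share the GPU instead of TF starving the training loop.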
I tried this and it still didn't work for me. This came out:
Traceback (most recent call last):
File "train-kaist_all.py", line 161, in <module>
loss, outputs = model(imgs, targets)
File "/home/dlsj/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/dlsj/Documents/PyTorch-YOLOv3/models_mod.py", line 254, in forward
x = module(x)
File "/home/dlsj/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/dlsj/.local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/dlsj/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/dlsj/.local/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 83, in forward
exponential_average_factor, self.eps)
File "/home/dlsj/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 1697, in batch_norm
training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 7.79 GiB total capacity; 6.45 GiB already allocated; 26.75 MiB free; 191.87 MiB cached)
Does requires_grad matter? Because when I set requires_grad to False in the optimizer, it worked.
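To make the "set requires_grad to False" idea concrete: freeze the parameters you don't want trained, then build the optimizer over only the remaining trainable ones, so no gradients (or gradient memory) are kept for the frozen layers. A minimal PyTorch sketch; the small Sequential model is a stand-in for the repo's Darknet, not its actual code:

```python
import torch
import torch.nn as nn

# Stand-in model; in the repo this would be Darknet(opt.model_def)
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.Conv2d(8, 4, 3))

# Freeze the first conv layer: no gradients are computed or stored for it
for p in model[0].parameters():
    p.requires_grad = False

# Hand the optimizer only the parameters that still require gradients
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```

This reduces memory somewhat (fewer gradient buffers and optimizer states), but note that frozen layers stop learning, so it is a trade-off, not a general OOM fix.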
I found the problem: TensorFlow reserves a large amount of CUDA memory. We can comment out
logger = Logger("logs")
# logger.list_of_scalars_summary(tensorboard_log, batches_done)
# logger.list_of_scalars_summary(evaluation_metrics, epoch)
tensorboard_log, batches_done, and evaluation_metrics are not defined.
You can try to change requires_grad.
Have you tried subdivision?

Commenting out the TensorFlow logger calls works. THX
Check dataloader. And set pin_memory to False.
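For reference, pin_memory is a DataLoader flag: when True, each batch is copied into page-locked host memory for faster host-to-GPU transfer, at the cost of extra pinned RAM. A minimal sketch of disabling it; the random TensorDataset is just a placeholder for the repo's real ListDataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; in the repo this would be ListDataset(train_path)
dataset = TensorDataset(torch.randn(16, 3, 32, 32), torch.randint(0, 2, (16,)))

# pin_memory=False avoids allocating page-locked host buffers for each batch
loader = DataLoader(dataset, batch_size=4, shuffle=True,
                    pin_memory=False, num_workers=0)
```

Note pin_memory affects host (CPU) memory, not the GPU allocation the traceback complains about, so this helps mainly when the machine is also short on RAM.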
@LinXiLuo What exactly do you mean by pin_memory? And what is the purpose of require_grad?
Hi @promach, any luck resolving this OOM issue?
How do I set requires_grad to False in the optimizer? The other methods above don't work for me.
I closed this issue due to inactivity. Feel free to reopen for further discussion.
I'm getting CUDA out of memory even though I've tried batch sizes of 1, 2, 4, 8, and 16. Can anyone help me, please?
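If OOM persists even at batch size 1, gradient accumulation often helps: run several small micro-batches and step the optimizer once, trading compute for memory (the same idea as darknet's subdivisions setting). A minimal sketch with a toy linear model standing in for YOLOv3; the shapes and hyperparameters here are illustrative only:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # toy stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

accumulation_steps = 4  # like cfg "subdivisions": 4 micro-batches = 1 effective batch
optimizer.zero_grad()
for step in range(8):
    imgs, targets = torch.randn(2, 10), torch.randn(2, 1)  # micro-batch that fits in memory
    loss = nn.functional.mse_loss(model(imgs), targets)
    (loss / accumulation_steps).backward()  # scale so accumulated grads average out
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()   # one weight update per effective batch
        optimizer.zero_grad()
```

Also make sure any validation/evaluation pass runs under `with torch.no_grad():`, otherwise the autograd graph for the whole eval forward pass is kept on the GPU.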