eriklindernoren / PyTorch-YOLOv3

Minimal PyTorch implementation of YOLOv3
GNU General Public License v3.0
7.31k stars, 2.63k forks

Train error: RuntimeError: CUDA error: out of memory #58

Closed: Zwenbo closed this issue 5 years ago

Zwenbo commented 6 years ago

    qianle_wb@qianle:~$ cd PyTorch-YOLOv3/
    qianle_wb@qianle:~/PyTorch-YOLOv3$ python3 train.py
    /usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
      return f(*args, **kwds)
    /usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
      return f(*args, **kwds)
    Namespace(batch_size=16, checkpoint_dir='checkpoints', checkpoint_interval=1, class_path='data/coco.names', conf_thres=0.8, data_config_path='config/coco.data', epochs=30, image_folder='data/samples', img_size=416, model_config_path='config/yolov3.cfg', n_cpu=0, nms_thres=0.4, use_cuda=True, weights_path='weights/yolov3.weights')
    /usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
      warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
    Traceback (most recent call last):
      File "train.py", line 83, in <module>
        loss = model(imgs, targets)
      File "/home/qianle_wb/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/qianle_wb/PyTorch-YOLOv3/models.py", line 196, in forward
        x = module(x)
      File "/home/qianle_wb/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/qianle_wb/.local/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
        input = module(input)
      File "/home/qianle_wb/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/qianle_wb/.local/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 301, in forward
        self.padding, self.dilation, self.groups)
    RuntimeError: CUDA error: out of memory
    qianle_wb@qianle:~/PyTorch-YOLOv3$

saranshkarira commented 5 years ago

Decrease batch size

wukongeek commented 5 years ago

@saranshkarira @Zwenbo For training, what batch size can a single GPU with 8 GB of memory handle?

saranshkarira commented 5 years ago

Set multi-scale training to its highest scale, then do a binary search on the batch size between 0 and the size you are using now. Use the largest value that doesn't crash.
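The binary search described here could look roughly like the sketch below. It is only an illustration, not code from this repository: `largest_fitting_batch_size` and the `fits` callback are hypothetical names, and `fits` is assumed to run one training step at the largest image scale and report whether it completed without running out of memory. The search also assumes that if a batch size fits, every smaller one fits too.

```python
from typing import Callable


def largest_fitting_batch_size(fits: Callable[[int], bool], upper: int) -> int:
    """Return the largest batch size in [1, upper] for which fits() is True.

    Returns 0 if no batch size fits. Assumes fits() is monotonic:
    whenever a size fits, every smaller size fits as well.
    """
    lo, hi = 0, upper  # lo: largest size known to fit (0 means none yet)
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the range always shrinks
        if fits(mid):
            lo = mid      # mid fits, so the answer is at least mid
        else:
            hi = mid - 1  # mid fails, so the answer is below mid
    return lo


if __name__ == "__main__":
    # Stand-in for a real training step: pretend anything above 6 images
    # per batch runs out of memory (a made-up threshold for illustration).
    print(largest_fitting_batch_size(lambda n: n <= 6, upper=16))
```

With the current batch size of 16 as the upper bound, this needs at most five trial runs.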

buttercutter commented 4 years ago

@saranshkarira What exactly do you mean by "set the multiscale to highest scale"?

saranshkarira commented 4 years ago

@promach Forget that; try decreasing the batch size first. Try these values in order and keep the first one that doesn't crash: 256 -> 128 -> 96 -> 64 -> 48 -> 32.
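A minimal sketch of that trial-and-error loop, in case it helps. The helper names (`is_out_of_memory`, `first_fitting_batch_size`) and the `try_step` callback are made up for illustration and are not part of this repository; `try_step` is assumed to run a single training step with the given batch size and to raise a `RuntimeError` whose message contains "out of memory" when the GPU runs out, as in the traceback at the top of this issue.

```python
from typing import Callable, Iterable, Optional


def is_out_of_memory(error: RuntimeError) -> bool:
    """Heuristic check based on the error message text."""
    return "out of memory" in str(error)


def first_fitting_batch_size(
    try_step: Callable[[int], None],
    candidates: Iterable[int],
) -> Optional[int]:
    """Try candidates in the given order; return the first that does not OOM.

    Returns None if every candidate runs out of memory. Errors that are
    not out-of-memory errors are re-raised unchanged.
    """
    for batch_size in candidates:
        try:
            try_step(batch_size)
        except RuntimeError as error:
            if not is_out_of_memory(error):
                raise
            continue  # too large, move on to the next smaller candidate
        return batch_size
    return None
```

It would be called with the list above, for example `first_fitting_batch_size(try_step, [256, 128, 96, 64, 48, 32])`. On a real GPU it is probably worth freeing cached memory between attempts as well, which this sketch does not do.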

buttercutter commented 4 years ago

@saranshkarira

I have already tried decreasing the value of batch_size as well as increasing the value of subdivisions. Your suggestion above does not work for me.

See https://github.com/promach/PyTorch-YOLOv3/blob/addernet/YOLOv3_%2B_AdderNet.ipynb as well as the backpropagation equation for AdderNet