D-X-Y / AutoDL-Projects

Automated deep learning algorithms implemented in PyTorch.
MIT License

Out-of-memory error when running on a single 1080 Ti GPU with 10G of memory #3

Closed duoduoda closed 5 years ago

duoduoda commented 5 years ago

Output:

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=265 error=2 : out of memory
Traceback (most recent call last):
  File "/media/tcl2/lilei/workspace/GDAS/exps-cnn/train_base.py", line 89, in <module>
    main()
  File "/media/tcl2/lilei/workspace/GDAS/exps-cnn/train_base.py", line 84, in main
    main_procedure(config, args.dataset, args.data_path, args, genotype, args.init_channels, args.layers, None, log)
  File "/media/tcl2/lilei/workspace/GDAS/exps-cnn/train_utils.py", line 96, in main_procedure
    train_acc1, train_acc5, train_los = _train(train_loader, model, criterion, optimizer, 'train', epoch, config, args.print_freq, log)
  File "/media/tcl2/lilei/workspace/GDAS/exps-cnn/train_utils.py", line 129, in _train
    for i, (inputs, targets) in enumerate(xloader):
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 178, in _pin_memory_loop
    batch = pin_memory_batch(batch)
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 245, in pin_memory_batch
    return [pin_memory_batch(sample) for sample in batch]
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 245, in <listcomp>
    return [pin_memory_batch(sample) for sample in batch]
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 239, in pin_memory_batch
    return batch.pin_memory()
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/THCCachingHostAllocator.cpp:265

D-X-Y commented 5 years ago

For the GPU memory issue, you can try reducing the batch size.
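
As a rough illustration of the batch-size suggestion (not code from this repository), the sketch below retries a training step with half the batch size each time it fails with an out-of-memory `RuntimeError`, matching on the "out of memory" text seen in the traceback. The names `run_with_smaller_batches` and `fake_train_step` are hypothetical; `fake_train_step` only simulates a step that fits in memory at a batch size of 32 or less, and a real training function would take its place.

```python
def run_with_smaller_batches(train_step, batch_size, min_batch_size=1):
    """Call train_step(batch_size), halving batch_size after each OOM error.

    Returns (batch_size_that_worked, result_of_train_step).
    Hypothetical helper: train_step is any callable taking a batch size.
    """
    while True:
        try:
            return batch_size, train_step(batch_size)
        except RuntimeError as err:
            # Re-raise errors that are not out-of-memory failures, and give up
            # once the batch size cannot be reduced any further.
            if "out of memory" not in str(err) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


def fake_train_step(batch_size):
    """Stand-in for a real training step; pretends batches above 32 do not fit."""
    if batch_size > 32:
        raise RuntimeError("cuda runtime error (2) : out of memory")
    return "ok"


if __name__ == "__main__":
    working_size, _ = run_with_smaller_batches(fake_train_step, 128)
    print("batch size that fit:", working_size)
```

Halving is only one possible schedule; once a batch size that fits is found, it is usually simpler to set it directly in the training configuration.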