Closed WorstCodeWay closed 3 years ago
Augmentation affects image sizes and therefore memory usage. This is not an unexpected issue.
@ppwwyyxx Yeah, you are right on this point. But I passed an empty list of augmentations, which means no augmentation at all, or at least no more than the default augmentations, am I right?
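One way to check this assumption is to compare image sizes with and without the default resize. The sketch below is plain Python, not detectron2 code: it approximates a shortest-edge resize under the assumed defaults INPUT.MIN_SIZE_TRAIN = (800,) and INPUT.MAX_SIZE_TRAIN = 1333, and the 3000x4000 raw image size is hypothetical. If the default mapper resizes images this way and an empty augmentation list skips the resize, the model sees the raw resolution, and activation memory grows roughly with the pixel count.

```python
def resize_shortest_edge(height, width, short_edge=800, max_size=1333):
    """Approximate output (height, width) of a shortest-edge resize.

    short_edge and max_size are assumed defaults, not values read from
    the poster's config.
    """
    # Scale so that the shorter edge becomes `short_edge`.
    scale = short_edge / min(height, width)
    new_h, new_w = height * scale, width * scale
    # Cap the longer edge at `max_size`, shrinking both edges equally.
    longest = max(new_h, new_w)
    if longest > max_size:
        shrink = max_size / longest
        new_h, new_w = new_h * shrink, new_w * shrink
    # Round to the nearest integer pixel.
    return int(new_h + 0.5), int(new_w + 0.5)


def pixel_ratio(height, width, **kwargs):
    """How many times more pixels the raw image has than the resized one."""
    new_h, new_w = resize_shortest_edge(height, width, **kwargs)
    return (height * width) / (new_h * new_w)


if __name__ == "__main__":
    raw_h, raw_w = 3000, 4000  # hypothetical raw image size
    print("resized shape:", resize_shortest_edge(raw_h, raw_w))
    print("raw / resized pixel ratio:", round(pixel_ratio(raw_h, raw_w), 1))
```

A ratio well above 1 would suggest that skipping the resize, rather than adding augmentations, is what raises memory usage.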
Basically, I'm training on my own dataset with a modified version of plain_train_net.py. To add some augmentations to my dataset, I tried two cases. CASE-1 uses the default build_detection_train_loader with cfg as input; CASE-2 passes it a DatasetMapper that has an empty augmentations list. The former trains successfully, but the latter fails with the error in the title. The main code is below.
Instructions To Reproduce the 🐛 Bug:
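A minimal sketch of the two cases described above, not the poster's actual code. The function names build_default_loader and build_custom_loader and the extra_augmentations parameter are hypothetical. The detectron2 imports sit inside the functions so the snippet can be loaded without detectron2 installed. The second function starts from the augmentations that detectron2 builds from cfg and appends custom ones, instead of replacing them with an empty list as in CASE-2.

```python
def build_default_loader(cfg):
    """CASE-1: default loader; the mapper and its augmentations come from cfg."""
    from detectron2.data import build_detection_train_loader

    return build_detection_train_loader(cfg)


def build_custom_loader(cfg, extra_augmentations=()):
    """Custom mapper that keeps the cfg-derived augmentations.

    Passing augmentations=[] to DatasetMapper instead would reproduce the
    CASE-2 setup described in the post, with no resizing at all.
    """
    from detectron2.data import DatasetMapper, build_detection_train_loader
    from detectron2.data import detection_utils as utils

    # Augmentations that the default training mapper would use for this cfg.
    augmentations = utils.build_augmentation(cfg, is_train=True)
    # Append any custom augmentations after the defaults.
    augmentations.extend(extra_augmentations)
    mapper = DatasetMapper(cfg, is_train=True, augmentations=augmentations)
    return build_detection_train_loader(cfg, mapper=mapper)
```

With this shape, a call such as build_custom_loader(cfg, [T.RandomBrightness(0.8, 1.2)]), where T is detectron2.data.transforms, would add one augmentation while keeping the resize built from cfg.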
[08/17 16:26:23 detectron2]: Command line arguments: Namespace(config_file='', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], predict_only=False, resume=False)
[08/17 16:26:23 detectron2]: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 10
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
[08/17 16:26:23 detectron2]: Full config saved to ./output/config.yaml
[08/17 16:26:23 d2.utils.env]: Using a generated random seed 23383141
[08/17 16:26:29 d2.data.build]: Using training sampler TrainingSampler
[08/17 16:26:29 d2.data.common]: Serializing 106 elements to byte tensors and concatenating them all ...
[08/17 16:26:29 d2.data.common]: Serialized dataset takes 0.24 MiB
[08/17 16:26:29 detectron2]: Starting training from iteration 0
Traceback (most recent call last):
File "tools/pandent_train.py", line 329, in
args=(args,),
File "/home/jason/Documents/vist/deeplearning/detectron2/detectron2/engine/launch.py", line 82, in launch
main_func(args)
File "tools/pandent_train.py", line 314, in main
do_train(cfg, model, resume=args.resume)
File "tools/pandent_train.py", line 233, in do_train
loss_dict = model(data)
File "/home/jason/Documents/vist/deeplearning/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/jason/Documents/vist/deeplearning/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 154, in forward
features = self.backbone(images.tensor)
File "/home/jason/Documents/vist/deeplearning/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/jason/Documents/vist/deeplearning/detectron2/detectron2/modeling/backbone/fpn.py", line 141, in forward
lateral_features = lateral_conv(features)
File "/home/jason/Documents/vist/deeplearning/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/jason/Documents/vist/deeplearning/detectron2/detectron2/layers/wrappers.py", line 85, in forward
x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups
RuntimeError: CUDA out of memory. Tried to allocate 616.00 MiB (GPU 0; 7.80 GiB total capacity; 5.53 GiB already allocated; 253.88 MiB free; 5.68 GiB reserved in total by PyTorch)
wget -nc -q https://github.com/facebookresearch/detectron2/raw/master/detectron2/utils/collect_env.py && python collect_env.py
sys.platform linux
Python 3.6.9 (default, Jan 26 2021, 15:33:00) [GCC 8.4.0]
numpy 1.19.5
detectron2 0.4.1 @/home/jason/Documents/vist/deeplearning/detectron2/detectron2
Compiler GCC 7.5
CUDA compiler CUDA 10.2
detectron2 arch flags 7.0
DETECTRON2_ENV_MODULE
PyTorch 1.8.1+cu102 @/home/jason/Documents/vist/deeplearning/venv/lib/python3.6/site-packages/torch
PyTorch debug build False
GPU available True
GPU 0 GeForce RTX 2070 with Max-Q Design (arch=7.5)
CUDA_HOME /usr/local/cuda-10.2
Pillow 8.2.0
torchvision 0.9.1+cu102 @/home/jason/Documents/vist/deeplearning/venv/lib/python3.6/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5
fvcore 0.1.5.post20210624
iopath 0.1.8
cv2 4.5.2
PyTorch built with:
Complete Source Code
If your issue looks like an installation issue / environment issue, please first try to solve it yourself with the instructions in https://detectron2.readthedocs.io/tutorials/install.html#common-installation-issues