Closed · snsie closed this issue 1 year ago
I think you may want to get a larger GPU. GPU memory usage can change during training because each image comes in a different size.
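A rough back-of-the-envelope sketch (my own illustrative numbers, not from this thread) of why per-image memory varies: the activation maps of a convolutional layer scale with the image's spatial area, so a wider or taller input needs proportionally more GPU memory.

```python
# Back-of-the-envelope: the memory held by one conv feature map scales
# with image height * width, so differently sized inputs consume
# different amounts of GPU memory. Values are illustrative, not measured.

def conv1_activation_bytes(height, width, channels=64, bytes_per_float=4):
    """Bytes for one float32 feature map at the input resolution."""
    return height * width * channels * bytes_per_float

small = conv1_activation_bytes(600, 800)    # a typical resized image
large = conv1_activation_bytes(600, 1000)   # a wider image at the same scale

print(small, large)  # the wider image needs ~25% more memory at this layer
```

This only accounts for a single layer's activations; the full network multiplies the effect across every layer, which is why one unusually large image can push an otherwise stable run out of memory.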
@endernewton I got this error with the same GPU, but some time ago everything worked fine. And I am talking about detection on unlabeled COCO images, not about training. I don't know the cause of this issue...
I have a GPU memory error too, and I have had no idea how to fix it for a long time. My GPU is a 1060. I have also run the test script successfully, and I tried lowering the batch size, but that didn't fix the error either...
@endernewton @ScottSiegel
Change BATCH_SIZE: and RPN_BATCHSIZE: in the file experiments/res101.yml.
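For reference, a minimal sketch of what that edit might look like. The values below are illustrative, not recommendations, and the exact file path and defaults depend on your checkout (in some copies of the repo the file sits under experiments/cfgs/ instead):

```yaml
# Illustrative values only -- check the defaults in your own
# experiments/res101.yml before changing anything.
TRAIN:
  BATCH_SIZE: 64      # ROIs sampled per image; try halving the default
  RPN_BATCHSIZE: 128  # anchors sampled for the RPN loss; likewise halved
```

Smaller sampling sizes reduce the per-step activation memory at the cost of noisier gradients, so it is worth re-checking accuracy after lowering them.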
Sir/madam, does this really work?
I have run the test script successfully, but I am hitting memory errors when training. The log is pasted below. I have tried lowering the batch size, but that didn't fix the error. I am using a GTX 1070 GPU, and I have run smallcorgi's faster-rcnn repository in the past without memory issues. Has anyone else encountered this error?
Loaded dataset voc_2007_trainval for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from /home/scott/chridemo/tf-faster-rcnn/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
10022 roidb entries
Output will be saved to /home/scott/chridemo/tf-faster-rcnn/output/vgg16/voc_2007_trainval/default
TensorFlow summaries will be saved to /home/scott/chridemo/tf-faster-rcnn/tensorboard/vgg16/voc_2007_trainval/default
Loaded dataset voc_2007_test for training
Set proposal method: gt
Preparing training data...
voc_2007_test gt roidb loaded from /home/scott/chridemo/tf-faster-rcnn/data/cache/voc_2007_test_gt_roidb.pkl
done
4952 validation roidb entries
Filtered 0 roidb entries: 10022 -> 10022
Filtered 0 roidb entries: 4952 -> 4952
2018-01-17 13:33:18.699896: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-17 13:33:18.877826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:03:00.0
totalMemory: 7.92GiB freeMemory: 7.52GiB
2018-01-17 13:33:18.877854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:03:00.0, compute capability: 6.1)
Solving...
/home/scott/.local/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py:96: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Loading initial model weights from data/imagenet_weights/vgg16.ckpt
Variables restored: vgg_16/conv1/conv1_1/biases:0
Variables restored: vgg_16/conv1/conv1_2/weights:0
Variables restored: vgg_16/conv1/conv1_2/biases:0
Variables restored: vgg_16/conv2/conv2_1/weights:0
Variables restored: vgg_16/conv2/conv2_1/biases:0
Variables restored: vgg_16/conv2/conv2_2/weights:0
Variables restored: vgg_16/conv2/conv2_2/biases:0
Variables restored: vgg_16/conv3/conv3_1/weights:0
Variables restored: vgg_16/conv3/conv3_1/biases:0
Variables restored: vgg_16/conv3/conv3_2/weights:0
Variables restored: vgg_16/conv3/conv3_2/biases:0
Variables restored: vgg_16/conv3/conv3_3/weights:0
Variables restored: vgg_16/conv3/conv3_3/biases:0
Variables restored: vgg_16/conv4/conv4_1/weights:0
Variables restored: vgg_16/conv4/conv4_1/biases:0
Variables restored: vgg_16/conv4/conv4_2/weights:0
Variables restored: vgg_16/conv4/conv4_2/biases:0
Variables restored: vgg_16/conv4/conv4_3/weights:0
Variables restored: vgg_16/conv4/conv4_3/biases:0
Variables restored: vgg_16/conv5/conv5_1/weights:0
Variables restored: vgg_16/conv5/conv5_1/biases:0
Variables restored: vgg_16/conv5/conv5_2/weights:0
Variables restored: vgg_16/conv5/conv5_2/biases:0
Variables restored: vgg_16/conv5/conv5_3/weights:0
Variables restored: vgg_16/conv5/conv5_3/biases:0
Variables restored: vgg_16/fc6/biases:0
Variables restored: vgg_16/fc7/biases:0
Loaded.
Fix VGG16 layers..
Fixed.
2018-01-17 13:33:24.312766: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.09GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-01-17 13:33:29.614554: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.49GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-01-17 13:33:30.811321: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.52GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-01-17 13:33:34.048083: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.09GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-01-17 13:33:35.281011: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.90GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-01-17 13:33:35.424096: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.75GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-01-17 13:33:36.886421: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.49GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-01-17 13:33:37.259091: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.49GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
iter: 20 / 70000, total loss: 3.258544