coxlab / prednet

Code and models accompanying "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning"
https://arxiv.org/abs/1605.08104
MIT License

ResourceExhaustedError #85

Open sayami888 opened 3 years ago

sayami888 commented 3 years ago

I tried to train on my own dataset, but I couldn't because of a memory error. I confirmed that training runs normally with the KITTI data.

The image size of my data is 1024x576. The GPU is a GeForce GTX 1080 Ti with 9.92 GB of free memory. The training .hkl file is 4.7 GB and the evaluation data is 1.4 GB.

A warning message is displayed as soon as training begins:

Epoch 1/150
2021-01-16 16:11:36.539265: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 486.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

The final error is:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4,96,288,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: pred_net_1/while/convolution_19 = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](pred_net_1/while/concat_5, pred_net_1/while/convolution_19/Enter)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: loss/mul/_577 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7402_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
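For reference, my back-of-the-envelope estimate of the single tensor named in the OOM message (assuming 4-byte float32 elements):

```python
# Rough size of the tensor from the error.
# data_format="NHWC", so shape [4, 96, 288, 512] is [batch, height, width, channels].
n, h, w, c = 4, 96, 288, 512
bytes_needed = n * h * w * c * 4           # 4 bytes per float32 element
print(bytes_needed / 2**20, "MiB")         # -> 216.0 MiB for this single activation
```

Since the model is unrolled over the sequence length for backprop, many activations of this size presumably have to be held at once, so the total working set is much larger than this one tensor.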

The images were originally larger, and I have already shrunk them several times. I also reduced the number of images, but the error persists. Is my GPU memory simply insufficient? Do I have to reduce the image size further?

If there is a solution, please let me know.
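In case it helps, this is roughly how I am downscaling frames when building the .hkl file, based on my understanding of the resize step in process_kitti.py (the folder name, output file name, and target size below are placeholders, not from the repo):

```python
# Sketch: downscale frames before dumping the .hkl file,
# analogous to the resize step in process_kitti.py.
import glob
import numpy as np
import hickle as hkl
from PIL import Image

desired_im_sz = (128, 160)  # (height, width); the pretrained KITTI models use 128x160

frame_paths = sorted(glob.glob('my_frames/*.png'))  # placeholder frame directory
frames = []
for p in frame_paths:
    img = Image.open(p).convert('RGB')
    # PIL's resize takes (width, height)
    img = img.resize((desired_im_sz[1], desired_im_sz[0]), Image.BILINEAR)
    frames.append(np.asarray(img, dtype=np.uint8))

X = np.stack(frames)  # shape: (n_frames, height, width, 3)
hkl.dump(X, 'X_train.hkl')
```

Or would lowering batch_size or the sequence length nt in kitti_train.py be the better fix, since activation memory should scale roughly with both?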