Ning-Ding / Implementation-CVPR2015-CNN-for-ReID

Implementation for CVPR 2015 Paper: "An Improved Deep Learning Architecture for Person Re-Identification".
MIT License

Ran out of memory #21

Open GerrieWell opened 7 years ago

GerrieWell commented 7 years ago

W tensorflow/core/common_runtime/bfc_allocator.cc:274] **_*******************************xxx************************xx********************xxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 25.41MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[150,25,74,24]
Traceback (most recent call last):
  File "main.py", line 100, in <module>
    main(args.dataset_path)
  File "main.py", line 20, in main
    train(model, dataset_path)
  File "main.py", line 43, in train
    model.fit_generator(Data_Generator.flow(f,flag = flag_train),one_epoch,epoch_num,validation_data=Data_Generator.flow(f,train_or_validation=which_val_data,flag=flag_val),nb_val_samples=nb_val_samples)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/keras/engine/training.py", line 1877, in fit_generator
    class_weight=class_weight)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/keras/engine/training.py", line 1621, in train_on_batch
    outputs = self.train_function(ins)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2103, in __call__
    feed_dict=feed_dict)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[150,74,24,25]
     [[Node: conv2d_2/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](max_pooling2d_1/MaxPool, conv2d_2/kernel/read)]]
     [[Node: add_9/_35 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_19807_add_9", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'conv2d_2/convolution', defined at:
  File "main.py", line 100, in <module>
    main(args.dataset_path)
  File "main.py", line 18, in main
    model = generate_model()
  File "/Volumes/more/source/cv/reid/Implementation-CVPR2015-CNN-for-ReID/CUHK03/model.py", line 61, in generate_model
    x1 = share_conv_2(x1)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/keras/engine/topology.py", line 578, in __call__
    output = self.call(inputs, **kwargs)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/keras/layers/convolutional.py", line 164, in call
    dilation_rate=self.dilation_rate)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2893, in conv2d
    data_format='NHWC')
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 639, in convolution
    op=op)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 308, in with_space_to_batch
    return op(input, num_spatial_dims, padding)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 631, in op
    name=name)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 129, in _non_atrous_convolution
    name=name)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 396, in conv2d
    data_format=data_format, name=name)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/gerrie/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[150,74,24,25]
     [[Node: conv2d_2/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](max_pooling2d_1/MaxPool, conv2d_2/kernel/read)]]
     [[Node: add_9/_35 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_19807_add_9", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

It seems the GPU has run out of memory. How can I solve this problem?

My device is:

CUHK03 git:(master) ✗  $ cuda-smi
Device 0 [PCIe 0:1:0.0]: GeForce GT 650M (CC 3.0): 745.94 of 1023.7 MB (i.e. 72.9%) Free

The memory should be enough, since I've run other large projects with TensorFlow on this machine. I'm running macOS 10.12 with the latest TensorFlow version.
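As a sanity check on the log above, the 25.41MiB allocation matches the size of a single float32 tensor of shape [150,74,24,25]. A minimal sketch of that arithmetic follows; `tensor_mib` is a hypothetical helper name, and the 4 bytes per element is an assumption based on the `T=DT_FLOAT` in the node description. It covers only this one tensor, not the total memory the model needs.

```python
from functools import reduce
from operator import mul


def tensor_mib(shape, bytes_per_element=4):
    """Size in MiB of a dense tensor with the given shape.

    bytes_per_element=4 assumes float32 (DT_FLOAT in the log).
    """
    elements = reduce(mul, shape, 1)  # total number of elements
    return elements * bytes_per_element / (1024.0 ** 2)


# Shape reported in the ResourceExhaustedError message.
print(round(tensor_mib([150, 74, 24, 25]), 2))  # 25.41
```

The first dimension (150) is the batch dimension in the NHWC layout, so the size of this tensor scales linearly with the batch size.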

GerrieWell commented 7 years ago

Could you provide your trained model?

LG17 commented 7 years ago

It would be great if you could provide the weights file!

prashanthbasani commented 7 years ago

I am facing the same issue. The error shown is: tensorflow/core/framework/op_kernel.cc:1152] Resource exhausted: OOM when allocating tensor with shape[150,41,16,25]

prashanthbasani commented 7 years ago

Change the batch size argument to 50 in the init and flow functions in data_preparation.py.
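To illustrate why a smaller batch helps, here is a rough sketch comparing the size of the failing tensor at the two batch sizes. It assumes the leading 150 in the reported shapes is the batch size and that elements are float32; `activation_mib` is a hypothetical helper, not part of the repository. It estimates only one activation tensor, so it does not show whether the whole model fits on a given GPU.

```python
def activation_mib(batch_size, height, width, channels, bytes_per_element=4):
    """Size in MiB of one NHWC activation tensor (float32 assumed)."""
    elements = batch_size * height * width * channels
    return elements * bytes_per_element / (1024.0 ** 2)


# 74 x 24 x 25 is taken from the shape in the first error message.
for batch_size in (150, 50):
    size = activation_mib(batch_size, 74, 24, 25)
    print("batch_size=%d: %.2f MiB" % (batch_size, size))
```

Every per-batch tensor shrinks by the same factor of three, which is why lowering the batch size reduces peak GPU memory use during training.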