When I began training your model, I ran into this issue, and I have no idea how to fix it. Could you please help me? Thank you very much.
Train loss: 0.0 Train iou: 0.0
Val. loss: 0.6931054 Val. iou: 0.4069072
Starting epoch: 0
Traceback (most recent call last):
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[4,256,121,161] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node up2/conv2d_transpose}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[iou_metric/confusion_matrix/stack_1/_81]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[4,256,121,161] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node up2/conv2d_transpose}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "model.py", line 232, in <module>
train(image_paths, mask_paths, val_image_paths, val_mask_paths)
File "model.py", line 156, in train
[train, cost, iou_update, seg_image], feed_dict=train_feed_dict)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[4,256,121,161] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node up2/conv2d_transpose (defined at /tmp/tmpwtsgo0a_.py:68) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[iou_metric/confusion_matrix/stack_1/_81]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[4,256,121,161] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node up2/conv2d_transpose (defined at /tmp/tmpwtsgo0a_.py:68) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node up2/conv2d_transpose:
up2/stack (defined at /tmp/tmpwtsgo0a_.py:67)
batch_normalization_9/cond/Merge (defined at /tmp/tmp5xi7m83o.py:14)
up2/kernel/read (defined at model.py:33)
Input Source operations connected to node up2/conv2d_transpose:
up2/stack (defined at /tmp/tmpwtsgo0a_.py:67)
batch_normalization_9/cond/Merge (defined at /tmp/tmp5xi7m83o.py:14)
up2/kernel/read (defined at model.py:33)
Original stack trace for 'up2/conv2d_transpose':
File "model.py", line 232, in <module>
train(image_paths, mask_paths, val_image_paths, val_mask_paths)
File "model.py", line 114, in train
logits = inference(image_placeholder, training_flag)
File "model.py", line 79, in inference
up2 = trans_conv_with_bn(unconv3, 256, [3, 3], is_training, name='up2')
File "model.py", line 33, in trans_conv_with_bn
use_bias=use_bias, kernel_initializer=tf.contrib.layers.xavier_initializer(), name=name)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/layers/convolutional.py", line 1279, in conv2d_transpose
return layer.apply(inputs)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
return self.__call__(inputs, *args, **kwargs)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/layers/base.py", line 537, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "/tmp/tmpwtsgo0a_.py", line 68, in tf__call
outputs = ag__.converted_call('conv2d_transpose', backend, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (inputs, self.kernel, output_shape_tensor), {'strides': self.strides, 'padding': self.padding, 'data_format': self.data_format, 'dilation_rate': self.dilation_rate})
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 253, in _call_unconverted
return f(*args, **kwargs)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 4582, in conv2d_transpose
data_format=tf_data_format)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 2147, in conv2d_transpose
name=name)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 2218, in conv2d_transpose_v2
name=name)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1407, in conv2d_backprop_input
name=name)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/home/user/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
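It seems you don't have enough GPU memory to train the model. This is an OOM (out of memory) error: the GPU allocator failed while allocating a float tensor of shape [4,256,121,161] at node up2/conv2d_transpose. Try using smaller input images or reducing the batch size.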
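To get a feel for how much each change helps, here is a rough sketch in plain Python that estimates the size of the single tensor named in the error message. The helper `tensor_bytes` is hypothetical (it is not part of model.py or TensorFlow), and I am assuming that the first dimension (4) is the batch size and that "float" means 4 bytes per element. Keep in mind this is only one activation; the OOM happens when all allocations together exceed GPU memory, so the numbers are a relative guide, not a full memory budget.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for one dense tensor (hypothetical helper).

    bytes_per_element=4 assumes 32-bit floats.
    """
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shape copied from the OOM message: shape[4,256,121,161]
failed_shape = (4, 256, 121, 161)

# Assumption: dimension 0 is the batch size, so halving it gives batch size 2.
half_batch_shape = (2, 256, 121, 161)

# Assumption: halving the input image size roughly halves both spatial dims.
half_image_shape = (4, 256, 121 // 2, 161 // 2)

for label, shape in [
    ("as reported", failed_shape),
    ("half batch size", half_batch_shape),
    ("half image size", half_image_shape),
]:
    mib = tensor_bytes(shape) / 2**20
    print(f"{label}: {shape} -> {mib:.1f} MiB")
```

The reported tensor alone is about 76 MiB. Halving the batch size halves that, while halving the image width and height cuts it to roughly a quarter, so shrinking the images is usually the more effective of the two if your data allows it.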