MarvinTeichmann / KittiSeg

A Kitti Road Segmentation model implemented in tensorflow.
MIT License

Error in Running train.py #165

Open GyYLiu opened 6 years ago

GyYLiu commented 6 years ago

About three hours after I started train.py, it failed with the following error:

2018-04-14 02:55:47.490099: W tensorflow/core/framework/op_kernel.cc:1158] Resource exhausted: /home/amax/lgy/KittiSeg/RUNS/KittiSeg_2018_04_14_01.07/model.ckpt-2000.data-00000-of-00001.tempstate13881274976169662731
2018-04-14 02:55:47.491516: W tensorflow/core/kernels/queue_base.cc:294] _0_Queues/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "train.py", line 131, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 127, in main
    train.do_training(hypes)
  File "incl/tensorvision/train.py", line 396, in do_training
    run_training(hypes, modules, tv_graph, tv_sess)
  File "incl/tensorvision/train.py", line 324, in run_training
    tv_sess['saver'].save(sess, checkpoint_path, global_step=step)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1472, in save
    {self.saver_def.filename_tensor_name: checkpoint_file})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: /home/amax/lgy/KittiSeg/RUNS/KittiSeg_2018_04_14_01.07/model.ckpt-2000.data-00000-of-00001.tempstate13881274976169662731 [[Node: save/SaveV2 = SaveV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, 
DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, Optimizer/Variable, Optimizer/training/beta1_power/_563, Optimizer/training/beta2_power/_565, conv1_1/biases/_567, conv1_1/biases/Adam/_569, conv1_1/biases/Adam_1/_571, conv1_1/filter/_573, conv1_1/filter/Adam/_575, conv1_1/filter/Adam_1/_577, conv1_2/biases/_579, conv1_2/biases/Adam/_581, conv1_2/biases/Adam_1/_583, conv1_2/filter/_585, conv1_2/filter/Adam/_587, conv1_2/filter/Adam_1/_589, conv2_1/biases/_591, conv2_1/biases/Adam/_593, conv2_1/biases/Adam_1/_595, conv2_1/filter/_597, conv2_1/filter/Adam/_599, conv2_1/filter/Adam_1/_601, conv2_2/biases/_603, conv2_2/biases/Adam/_605, conv2_2/biases/Adam_1/_607, conv2_2/filter/_609, conv2_2/filter/Adam/_611, conv2_2/filter/Adam_1/_613, conv3_1/biases/_615, conv3_1/biases/Adam/_617, conv3_1/biases/Adam_1/_619, conv3_1/filter/_621, conv3_1/filter/Adam/_623, conv3_1/filter/Adam_1/_625, 
conv3_2/biases/_627, conv3_2/biases/Adam/_629, conv3_2/biases/Adam_1/_631, conv3_2/filter/_633, conv3_2/filter/Adam/_635, conv3_2/filter/Adam_1/_637, conv3_3/biases/_639, conv3_3/biases/Adam/_641, conv3_3/biases/Adam_1/_643, conv3_3/filter/_645, conv3_3/filter/Adam/_647, conv3_3/filter/Adam_1/_649, conv4_1/biases/_651, conv4_1/biases/Adam/_653, conv4_1/biases/Adam_1/_655, conv4_1/filter/_657, conv4_1/filter/Adam/_659, conv4_1/filter/Adam_1/_661, conv4_2/biases/_663, conv4_2/biases/Adam/_665, conv4_2/biases/Adam_1/_667, conv4_2/filter/_669, conv4_2/filter/Adam/_671, conv4_2/filter/Adam_1/_673, conv4_3/biases/_675, conv4_3/biases/Adam/_677, conv4_3/biases/Adam_1/_679, conv4_3/filter/_681, conv4_3/filter/Adam/_683, conv4_3/filter/Adam_1/_685, conv5_1/biases/_687, conv5_1/biases/Adam/_689, conv5_1/biases/Adam_1/_691, conv5_1/filter/_693, conv5_1/filter/Adam/_695, conv5_1/filter/Adam_1/_697, conv5_2/biases/_699, conv5_2/biases/Adam/_701, conv5_2/biases/Adam_1/_703, conv5_2/filter/_705, conv5_2/filter/Adam/_707, conv5_2/filter/Adam_1/_709, conv5_3/biases/_711, conv5_3/biases/Adam/_713, conv5_3/biases/Adam_1/_715, conv5_3/filter/_717, conv5_3/filter/Adam/_719, conv5_3/filter/Adam_1/_721, fc6/biases/_723, fc6/biases/Adam/_725, fc6/biases/Adam_1/_727, fc6/weights/_729, fc6/weights/Adam/_731, fc6/weights/Adam_1/_733, fc7/biases/_735, fc7/biases/Adam/_737, fc7/biases/Adam_1/_739, fc7/weights/_741, fc7/weights/Adam/_743, fc7/weights/Adam_1/_745, score_fr/biases/_747, score_fr/biases/Adam/_749, score_fr/biases/Adam_1/_751, score_fr/weights/_753, score_fr/weights/Adam/_755, score_fr/weights/Adam_1/_757, score_pool3/biases/_759, score_pool3/biases/Adam/_761, score_pool3/biases/Adam_1/_763, score_pool3/weights/_765, score_pool3/weights/Adam/_767, score_pool3/weights/Adam_1/_769, score_pool4/biases/_771, score_pool4/biases/Adam/_773, score_pool4/biases/Adam_1/_775, score_pool4/weights/_777, score_pool4/weights/Adam/_779, score_pool4/weights/Adam_1/_781, upscore2/up_filter/_783, 
upscore2/up_filter/Adam/_785, upscore2/up_filter/Adam_1/_787, upscore32/up_filter/_789, upscore32/up_filter/Adam/_791, upscore32/up_filter/Adam_1/_793, upscore4/up_filter/_795, upscore4/up_filter/Adam/_797, upscore4/up_filter/Adam_1/_799)]]

Caused by op u'save/SaveV2', defined at:
  File "train.py", line 131, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 127, in main
    train.do_training(hypes)
  File "incl/tensorvision/train.py", line 380, in do_training
    tv_sess = core.start_tv_session(hypes)
  File "incl/tensorvision/core.py", line 172, in start_tv_session
    saver = tf.train.Saver(max_to_keep=int(utils.cfg.max_to_keep))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1139, in __init__
    self.build()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1170, in build
    restore_sequentially=self._restore_sequentially)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 689, in build
    save_tensor = self._AddSaveOps(filename_tensor, saveables)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 276, in _AddSaveOps
    save = self.save_op(filename_tensor, saveables)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 219, in save_op
    tensors)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 745, in save_v2
    tensors=tensors, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): /home/amax/lgy/KittiSeg/RUNS/KittiSeg_2018_04_14_01.07/model.ckpt-2000.data-00000-of-00001.tempstate13881274976169662731 [[Node: save/SaveV2 = SaveV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, Optimizer/Variable, Optimizer/training/beta1_power/_563, Optimizer/training/beta2_power/_565, conv1_1/biases/_567, conv1_1/biases/Adam/_569, conv1_1/biases/Adam_1/_571, conv1_1/filter/_573, conv1_1/filter/Adam/_575, conv1_1/filter/Adam_1/_577, conv1_2/biases/_579, conv1_2/biases/Adam/_581, conv1_2/biases/Adam_1/_583, conv1_2/filter/_585, conv1_2/filter/Adam/_587, conv1_2/filter/Adam_1/_589, conv2_1/biases/_591, conv2_1/biases/Adam/_593, 
conv2_1/biases/Adam_1/_595, conv2_1/filter/_597, conv2_1/filter/Adam/_599, conv2_1/filter/Adam_1/_601, conv2_2/biases/_603, conv2_2/biases/Adam/_605, conv2_2/biases/Adam_1/_607, conv2_2/filter/_609, conv2_2/filter/Adam/_611, conv2_2/filter/Adam_1/_613, conv3_1/biases/_615, conv3_1/biases/Adam/_617, conv3_1/biases/Adam_1/_619, conv3_1/filter/_621, conv3_1/filter/Adam/_623, conv3_1/filter/Adam_1/_625, conv3_2/biases/_627, conv3_2/biases/Adam/_629, conv3_2/biases/Adam_1/_631, conv3_2/filter/_633, conv3_2/filter/Adam/_635, conv3_2/filter/Adam_1/_637, conv3_3/biases/_639, conv3_3/biases/Adam/_641, conv3_3/biases/Adam_1/_643, conv3_3/filter/_645, conv3_3/filter/Adam/_647, conv3_3/filter/Adam_1/_649, conv4_1/biases/_651, conv4_1/biases/Adam/_653, conv4_1/biases/Adam_1/_655, conv4_1/filter/_657, conv4_1/filter/Adam/_659, conv4_1/filter/Adam_1/_661, conv4_2/biases/_663, conv4_2/biases/Adam/_665, conv4_2/biases/Adam_1/_667, conv4_2/filter/_669, conv4_2/filter/Adam/_671, conv4_2/filter/Adam_1/_673, conv4_3/biases/_675, conv4_3/biases/Adam/_677, conv4_3/biases/Adam_1/_679, conv4_3/filter/_681, conv4_3/filter/Adam/_683, conv4_3/filter/Adam_1/_685, conv5_1/biases/_687, conv5_1/biases/Adam/_689, conv5_1/biases/Adam_1/_691, conv5_1/filter/_693, conv5_1/filter/Adam/_695, conv5_1/filter/Adam_1/_697, conv5_2/biases/_699, conv5_2/biases/Adam/_701, conv5_2/biases/Adam_1/_703, conv5_2/filter/_705, conv5_2/filter/Adam/_707, conv5_2/filter/Adam_1/_709, conv5_3/biases/_711, conv5_3/biases/Adam/_713, conv5_3/biases/Adam_1/_715, conv5_3/filter/_717, conv5_3/filter/Adam/_719, conv5_3/filter/Adam_1/_721, fc6/biases/_723, fc6/biases/Adam/_725, fc6/biases/Adam_1/_727, fc6/weights/_729, fc6/weights/Adam/_731, fc6/weights/Adam_1/_733, fc7/biases/_735, fc7/biases/Adam/_737, fc7/biases/Adam_1/_739, fc7/weights/_741, fc7/weights/Adam/_743, fc7/weights/Adam_1/_745, score_fr/biases/_747, score_fr/biases/Adam/_749, score_fr/biases/Adam_1/_751, score_fr/weights/_753, score_fr/weights/Adam/_755, 
score_fr/weights/Adam_1/_757, score_pool3/biases/_759, score_pool3/biases/Adam/_761, score_pool3/biases/Adam_1/_763, score_pool3/weights/_765, score_pool3/weights/Adam/_767, score_pool3/weights/Adam_1/_769, score_pool4/biases/_771, score_pool4/biases/Adam/_773, score_pool4/biases/Adam_1/_775, score_pool4/weights/_777, score_pool4/weights/Adam/_779, score_pool4/weights/Adam_1/_781, upscore2/up_filter/_783, upscore2/up_filter/Adam/_785, upscore2/up_filter/Adam_1/_787, upscore32/up_filter/_789, upscore32/up_filter/Adam/_791, upscore32/up_filter/Adam_1/_793, upscore4/up_filter/_795, upscore4/up_filter/Adam/_797, upscore4/up_filter/Adam_1/_799)]]

I followed the steps in the README and training worked well at first, so I'm not sure whether this is a memory error.
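For reference, the failing op is `save/SaveV2`, which writes every tensor in the node list of the log to disk. Each trainable variable appears three times there (the variable itself plus its `Adam` and `Adam_1` slots), and `tf.train.Saver(max_to_keep=...)` keeps several checkpoints around, so the run directory can grow quickly. The sketch below gives a rough per-checkpoint size. It assumes standard VGG16 layer shapes for the `conv*`, `fc6` and `fc7` variables (not read from KittiSeg itself) and float32 storage, and it ignores the small `score_*` and `upscore*` variables.

```python
from functools import reduce
import operator

BYTES_PER_FLOAT32 = 4
# Adam keeps two slot variables ("Adam" and "Adam_1") per trainable variable,
# so each weight tensor is stored three times in the checkpoint.
COPIES_WITH_ADAM = 3

# ASSUMED shapes: standard VGG16 filters [h, w, in, out]; fc6/fc7 are the
# convolutionalized fully connected layers. Adjust these to your model.
ASSUMED_SHAPES = {
    "conv1_1": [3, 3, 3, 64], "conv1_2": [3, 3, 64, 64],
    "conv2_1": [3, 3, 64, 128], "conv2_2": [3, 3, 128, 128],
    "conv3_1": [3, 3, 128, 256], "conv3_2": [3, 3, 256, 256],
    "conv3_3": [3, 3, 256, 256],
    "conv4_1": [3, 3, 256, 512], "conv4_2": [3, 3, 512, 512],
    "conv4_3": [3, 3, 512, 512],
    "conv5_1": [3, 3, 512, 512], "conv5_2": [3, 3, 512, 512],
    "conv5_3": [3, 3, 512, 512],
    "fc6": [7, 7, 512, 4096], "fc7": [1, 1, 4096, 4096],
}


def num_params(shapes):
    """Count filter weights plus one bias per output channel for each layer."""
    total = 0
    for shape in shapes.values():
        total += reduce(operator.mul, shape, 1)  # filter / weights
        total += shape[-1]                       # biases
    return total


def checkpoint_bytes(shapes, copies=COPIES_WITH_ADAM):
    """Approximate checkpoint size: parameters x copies x 4 bytes."""
    return num_params(shapes) * copies * BYTES_PER_FLOAT32


if __name__ == "__main__":
    size_gb = checkpoint_bytes(ASSUMED_SHAPES) / 1e9
    print("approx. %.2f GB per checkpoint" % size_gb)  # approx. 1.61 GB per checkpoint
```

Under these assumptions one checkpoint is roughly 1.6 GB, so keeping several of them requires several gigabytes of free space in the run directory.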

Leo551 commented 6 years ago

Have you solved this problem? I also ran into a similar OOM problem...
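In the original log, the `ResourceExhaustedError` is raised on `/cpu:0` by `save/SaveV2` while writing the temporary checkpoint file, and the message contains only that file path. TensorFlow reports a file write that fails because the disk is full (`ENOSPC`) or a quota is exceeded as `ResourceExhaustedError`, so a full disk under `RUNS/` is a likely cause worth ruling out before looking at GPU memory. A minimal sketch for checking free space, assuming a POSIX system (`os.statvfs`); the directory name and the 2 GB threshold are illustrative placeholders, not KittiSeg settings:

```python
import os


def free_bytes(path):
    """Bytes available to a non-root user on the filesystem containing `path`."""
    st = os.statvfs(path)  # POSIX only (Linux/macOS)
    return st.f_bavail * st.f_frsize


def has_room_for(path, needed_bytes):
    """True if at least `needed_bytes` are free on the filesystem of `path`."""
    return free_bytes(path) >= needed_bytes


if __name__ == "__main__":
    # Placeholder: the log shows runs written under RUNS/; fall back to the
    # current directory if it does not exist here.
    runs_dir = "RUNS" if os.path.isdir("RUNS") else "."
    needed = 2 * 10**9  # assumed headroom for one more checkpoint
    print("free space in %s: %.2f GB" % (runs_dir, free_bytes(runs_dir) / 1e9))
    if not has_room_for(runs_dir, needed):
        print("warning: less than %.1f GB free; saving a checkpoint may fail"
              % (needed / 1e9))
```

Running `df -h` on the same directory gives the same information from the shell.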