MaybeShewill-CV / lanenet-lane-detection

Unofficial implementation of the LaneNet model for real-time lane detection
Apache License 2.0
2.35k stars, 885 forks

It keeps reporting using uninitialized variables before training started #375

Closed by FCInter 4 years ago

FCInter commented 4 years ago

I launched training with the following command:

python3 tools/train_lanenet.py --net vgg --dataset_dir data_path/tusimple/from_zips/training/ -m 0

Then I got the following error (full log below).

It says that I have uninitialized variables, but I didn't make any changes to the training code. I double-checked the code, and it does call sess.run(init), which should initialize all variables.

What's wrong with it?


2020-04-20 12:18:36.205370: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: Tesla V100-PCIE-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:35:00.0
totalMemory: 15.77GiB freeMemory: 15.36GiB
2020-04-20 12:18:36.981516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 1 with properties:
name: Tesla V100-PCIE-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:39:00.0
totalMemory: 15.77GiB freeMemory: 15.36GiB
2020-04-20 12:18:36.983067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0, 1
2020-04-20 12:18:37.690751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-20 12:18:37.690795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 1
2020-04-20 12:18:37.690805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N Y
2020-04-20 12:18:37.690812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1:   Y N
2020-04-20 12:18:37.691712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15344 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:35:00.0, compute capability: 7.0)
2020-04-20 12:18:37.693032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15344 MB memory) -> physical GPU (device: 1, name: Tesla V100-PCIE-16GB, pci bus id: 0000:39:00.0, compute capability: 7.0)
I0420 12:18:49.695656 102048 train_lanenet.py:668] Training from scratch
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value lanenet_model/vgg_frontend/vgg16_encode_module/conv5_2_instance/conv/W
         [[Node: lanenet_model/vgg_frontend/vgg16_encode_module/conv5_2_instance/conv/W/read = Identity[T=DT_FLOAT, _class=["loc:@lanen...age/Assign"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](lanenet_model/vgg_frontend/vgg16_encode_module/conv5_2_instance/conv/W)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train_lanenet.py", line 750, in <module>
    train_lanenet_multi_gpu(args.dataset_dir, args.weights_path, net_flag=args.net_flag)
  File "tools/train_lanenet.py", line 670, in train_lanenet_multi_gpu
    sess.run(init)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value lanenet_model/vgg_frontend/vgg16_encode_module/conv5_2_instance/conv/W
         [[Node: lanenet_model/vgg_frontend/vgg16_encode_module/conv5_2_instance/conv/W/read = Identity[T=DT_FLOAT, _class=["loc:@lanen...age/Assign"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](lanenet_model/vgg_frontend/vgg16_encode_module/conv5_2_instance/conv/W)]]

Caused by op 'lanenet_model/vgg_frontend/vgg16_encode_module/conv5_2_instance/conv/W/read', defined at:
  File "tools/train_lanenet.py", line 750, in <module>
    train_lanenet_multi_gpu(args.dataset_dir, args.weights_path, net_flag=args.net_flag)
  File "tools/train_lanenet.py", line 585, in train_lanenet_multi_gpu
    train_images, train_binary_labels, train_instance_labels, train_net, optimizer
  File "tools/train_lanenet.py", line 233, in compute_net_gradients
    instance_label=gt_instance_labels, name='lanenet_model'
  File "mypath/lanenet-lane-detection-master/lanenet_model/lanenet.py", line 82, in compute_loss
    reuse=self._reuse
  File "mypath/lanenet-lane-detection-master/lanenet_model/lanenet_front_end.py", line 43, in build_model
    reuse=reuse
  File "mypath/lanenet-lane-detection-master/semantic_segmentation_zoo/vgg16_based_fcn.py", line 357, in build_model
    self._vgg16_fcn_encode(input_tensor=input_tensor, name='vgg16_encode_module')
  File "mypath/lanenet-lane-detection-master/semantic_segmentation_zoo/vgg16_based_fcn.py", line 253, in _vgg16_fcn_encode
    need_layer_norm=True
  File "mypath/lanenet-lane-detection-master/semantic_segmentation_zoo/vgg16_based_fcn.py", line 63, in _vgg16_conv_stage
    use_bias=False, padding=pad, name='conv'
  File "mypath/lanenet-lane-detection-master/semantic_segmentation_zoo/cnn_basenet.py", line 70, in conv2d
    w = tf.get_variable('W', filter_shape, initializer=w_init)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1328, in get_variable
    constraint=constraint)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1090, in get_variable
    constraint=constraint)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 435, in get_variable
    constraint=constraint)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 404, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 796, in _get_single_variable
    use_resource=use_resource)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 2234, in variable
    use_resource=use_resource)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 2224, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 2207, in default_variable_creator
    constraint=constraint)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variables.py", line 259, in __init__
    constraint=constraint)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variables.py", line 422, in _init_from_args
    self._snapshot = array_ops.identity(self._variable, name="read")
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 79, in identity
    return gen_array_ops.identity(input, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3263, in identity
    "Identity", input=input, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value lanenet_model/vgg_frontend/vgg16_encode_module/conv5_2_instance/conv/W
         [[Node: lanenet_model/vgg_frontend/vgg16_encode_module/conv5_2_instance/conv/W/read = Identity[T=DT_FLOAT, _class=["loc:@lanen...age/Assign"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](lanenet_model/vgg_frontend/vgg16_encode_module/conv5_2_instance/conv/W)]]

MaybeShewill-CV commented 4 years ago

@FCInter It's weird to see this. Maybe something differs from my local TensorFlow environment. I think the problem should not occur if you use TensorFlow 1.12 :)

MaybeShewill-CV commented 4 years ago

@FCInter Besides, there is a small bug in the multi-GPU trainer that I haven't had enough spare time to fix. The single-GPU trainer is preferred :)

FCInter commented 4 years ago

@MaybeShewill-CV Thanks. Yes, the reason is that I'm using a very old TF, 1.9, which is all the server I'm on supports... Do you have any idea how to fix this? I think it should just be a matter of replacing some code with equivalents supported by older versions of TF. I searched for a whole day but found nothing...

MaybeShewill-CV commented 4 years ago

@FCInter I think the problem should not exist even in TF 1.9. You could debug to check whether sess.run(tf.global_variables_initializer()) is actually executed. Make sure you are using the single-GPU trainer :)
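
One way to carry out the check suggested above is to ask TensorFlow which variables are still uninitialized before and after the init op runs. The following is only a minimal sketch, not code from this repository: the helper `uninitialized_names` and the toy variable `demo/W` are hypothetical names made up for illustration, and it assumes the TF 1.x graph API (via `tensorflow.compat.v1` on newer installs, or plain `tensorflow` on old 1.x releases such as 1.9).

```python
# Assumption: the TF 1.x graph/session API is available. Newer installs expose
# it as tensorflow.compat.v1; old 1.x releases expose it directly.
try:
    import tensorflow.compat.v1 as tf
except ImportError:
    import tensorflow as tf


def uninitialized_names(sess):
    """Return the names of variables in sess.graph that are not initialized.

    Hypothetical debugging helper. It adds a small reporting op to the graph
    on every call, which is fine for a one-off check.
    """
    with sess.graph.as_default():
        report_op = tf.report_uninitialized_variables()
    return [name.decode("utf-8") for name in sess.run(report_op)]


# Toy graph standing in for the real model: one conv-like kernel variable.
graph = tf.Graph()
with graph.as_default():
    with tf.variable_scope("demo"):
        w = tf.get_variable(
            "W", shape=[3, 3, 1, 1], initializer=tf.zeros_initializer()
        )
    init = tf.global_variables_initializer()

with tf.Session(graph=graph) as sess:
    before = uninitialized_names(sess)  # checked before running init
    sess.run(init)
    after = uninitialized_names(sess)   # checked after running init

print("uninitialized before init:", before)
print("uninitialized after init:", after)
```

In the training script, the same helper could be called right after sess.run(init); any names it returns are variables the init op did not cover. Note that in the traceback above the error is raised inside sess.run(init) itself, so calling the helper just before that line may also be informative.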