MaybeShewill-CV / lanenet-lane-detection

Unofficial implementation of the LaneNet model for real-time lane detection
Apache License 2.0

Loading pretrained Weights fails. #471

Closed David1234500 closed 3 years ago

David1234500 commented 3 years ago

Hello,

I was trying to load the provided weights in the tools/test_lanenet.py script using the following command:

python3 tools/test_lanenet.py --weights_path /home/david/training-tensorflow-lane-net/provided_trained_model/tusimple_lanenet.ckpt.meta --image_path /home/david/training-tensorflow-lane-net/On-Site_Recordings_dataset_01_b1283.png

I tried to load each provided file, but it failed for every file with the same error message:

`/home/david/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/david/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/david/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:521: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/david/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:522: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/david/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/david/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2020-11-24 15:56:25.642 | INFO | __main__:test_lanenet:79 - Start reading image and preprocessing
2020-11-24 15:56:25.651 | INFO | __main__:test_lanenet:85 - Image load complete, cost time: 0.00898s
2020-11-24 15:56:27.761474: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:Restoring parameters from /home/david/training-tensorflow-lane-net/provided_trained_model/tusimple_lanenet.ckpt.meta
I1124 15:56:27.926377 2342 tf_logging.py:116] Restoring parameters from /home/david/training-tensorflow-lane-net/provided_trained_model/tusimple_lanenet.ckpt.meta
2020-11-24 15:56:27.966747: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open /home/david/training-tensorflow-lane-net/provided_trained_model/tusimple_lanenet.ckpt.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
2020-11-24 15:56:27.967125: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open /home/david/training-tensorflow-lane-net/provided_trained_model/tusimple_lanenet.ckpt.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
2020-11-24 15:56:27.967160: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_tensor.cc:170 : Data loss: Unable to open table file /home/david/training-tensorflow-lane-net/provided_trained_model/tusimple_lanenet.ckpt.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /home/david/training-tensorflow-lane-net/provided_trained_model/tusimple_lanenet.ckpt.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
  [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/test_lanenet.py", line 158, in <module>
    test_lanenet(args.image_path, args.weights_path)
  File "tools/test_lanenet.py", line 112, in test_lanenet
    saver.restore(sess=sess, save_path=weights_path)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1802, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /home/david/training-tensorflow-lane-net/provided_trained_model/tusimple_lanenet.ckpt.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
  [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "tools/test_lanenet.py", line 158, in <module>
    test_lanenet(args.image_path, args.weights_path)
  File "tools/test_lanenet.py", line 109, in test_lanenet
    saver = tf.train.Saver(variables_to_restore)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1338, in __init__
    self.build()
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1347, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1384, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
    restore_sequentially, reshape)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 472, in _AddRestoreOps
    restore_sequentially)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 886, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/david/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

DataLossError (see above for traceback): Unable to open table file /home/david/training-tensorflow-lane-net/provided_trained_model/tusimple_lanenet.ckpt.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
  [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]] `

Unfortunately, my experience with TensorFlow is limited. As far as I understand, either the weight file is malformed or the way I'm trying to load it is incorrect.

I am using TensorFlow 1.8 on Ubuntu 18.04 LTS and have installed all requirements listed in requirements.txt.

So I wanted to ask how to properly load the provided weight files and what mistake I am making here.

Thank you very much for your help, David

Tomju2 commented 3 years ago

Hello! Were you able to solve this? I have the same problem. Thanks!

David1234500 commented 3 years ago

Yes, I was able to solve this. As far as I remember, the solution turned out to be pretty simple: specify /home/david/training-tensorflow-lane-net/provided_trained_model/tusimple_lanenet.ckpt as the path to load instead of the .meta file. Even though no file with exactly that name exists, it apparently instructs TensorFlow to load all required files with that prefix (?). If I recall correctly, I found this by looking at other repos that load similar checkpoints and noticed that they do not specify an actual file but just the prefix. Quite unintuitive if you ask me, coming from PyTorch, but hey...
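
For anyone landing here later, the fix above can be sketched as a small helper that turns any of the checkpoint's file names into the prefix. This is only an illustration: `checkpoint_prefix` is a hypothetical name, not something from this repo, and it assumes the provided model uses the usual TensorFlow 1.x file suffixes (`.meta`, `.index`, `.data-XXXXX-of-XXXXX`).

```python
import re

# Suffixes TensorFlow 1.x typically appends to a checkpoint prefix
# (assumption: the provided model follows this common layout).
_CKPT_SUFFIX = re.compile(r"\.(?:meta|index|data-\d+-of-\d+)$")


def checkpoint_prefix(path):
    """Strip a trailing checkpoint-file suffix and return the bare prefix.

    A path that is already a prefix is returned unchanged.
    """
    return _CKPT_SUFFIX.sub("", path)


# The path from the failing command, reduced to the prefix.
weights_path = checkpoint_prefix(
    "/home/david/training-tensorflow-lane-net/provided_trained_model/"
    "tusimple_lanenet.ckpt.meta"
)
print(weights_path)
```

The resulting string is what would go into `--weights_path`, and from there into the `saver.restore(sess=sess, save_path=weights_path)` call that shows up in the traceback.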

Tomju2 commented 3 years ago

Thank you! It worked! It's really weird to have to use a file that doesn't exist.
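
On the "file that doesn't exist" point: a rough way to see what the prefix actually refers to is to list the files on disk that start with it. The sketch below uses only the standard library; `checkpoint_files` is a hypothetical helper, and the suffix filter assumes the common TensorFlow 1.x naming (`.meta`, `.index`, `.data-...`), which may not match every checkpoint.

```python
import glob
import os


def checkpoint_files(prefix):
    """Return the sorted files on disk that appear to belong to `prefix`."""
    candidates = glob.glob(glob.escape(prefix) + ".*")
    return sorted(
        p for p in candidates
        if p.endswith((".index", ".meta"))
        or os.path.basename(p)[len(os.path.basename(prefix)):].startswith(".data-")
    )


def describe_checkpoint(prefix):
    """Print whether the prefix itself is a file and which files share it."""
    print("prefix is a real file:", os.path.isfile(prefix))
    for path in checkpoint_files(prefix):
        print("  found:", os.path.basename(path))
```

Calling `describe_checkpoint(".../tusimple_lanenet.ckpt")` on a directory laid out this way should report that the prefix is not a file while still listing the pieces that share it.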

TrueWodzu commented 2 years ago

@David1234500 Many thanks for your input. This is indeed super unintuitive, even though the documentation mentions it.