MaybeShewill-CV / bisenetv2-tensorflow

Unofficial tensorflow implementation of real-time scene image segmentation model "BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation"
https://maybeshewill-cv.github.io/bisenetv2-tensorflow/
MIT License

About the checkpoint file for the Cityscapes pretrained model #45

Closed: ghost closed this issue 3 years ago

ghost commented 3 years ago

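Hello, first of all, thank you very much for your work. The code structure is very clear and approachable; it is a really excellent implementation. Would it be possible to release the checkpoint file for the Cityscapes pretrained model?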

MaybeShewill-CV commented 3 years ago

@LotOfLances There is a link in the README; take a closer look and you'll find it :)

ghost commented 3 years ago

Hello, when loading the provided pretrained model I get the following error:

Restoring weights from: C:/t_model/cityscapes.ckpt ...
2021-04-09 10:43:02.856293: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key train_step/BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint

Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint Original error: Key train_step/BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint [[node loader_and_saver/save/RestoreV2 (defined at C:/Aneowell/ML/biseNet_seg/bisenetv2-tensorflow-master\trainner\cityscapes\cityscapes_bisenetv2_single_gpu_trainner.py:175) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_loader_and_saver/save/Const_0_0, loader_and_saver/save/RestoreV2/tensor_names, loader_and_saver/save/RestoreV2/shape_and_slices)]] [[{{node loader_and_saver/save/RestoreV2/_609}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_614_loader_and_saver/save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'loader_and_saver/save/RestoreV2', defined at:
  File "tools/cityscapes/train_bisenetv2_cityscapes.py", line 42, in <module>
    train_model()
  File "tools/cityscapes/train_bisenetv2_cityscapes.py", line 32, in train_model
    worker = single_gpu_trainner.BiseNetV2CityScapesTrainer()
  File "C:/Aneowell/ML/biseNet_seg/bisenetv2-tensorflow-master\trainner\cityscapes\cityscapes_bisenetv2_single_gpu_trainner.py", line 175, in __init__
    self._loader = tf.train.Saver(self._net_var)
  File "C:\Users\neowe\.conda\envs\biseNet_py36\lib\site-packages\tensorflow\python\training\saver.py", line 1102, in __init__
    self.build()
  File "C:\Users\neowe\.conda\envs\biseNet_py36\lib\site-packages\tensorflow\python\training\saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "C:\Users\neowe\.conda\envs\biseNet_py36\lib\site-packages\tensorflow\python\training\saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "C:\Users\neowe\.conda\envs\biseNet_py36\lib\site-packages\tensorflow\python\training\saver.py", line 795, in _build_internal
    restore_sequentially, reshape)
  File "C:\Users\neowe\.conda\envs\biseNet_py36\lib\site-packages\tensorflow\python\training\saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "C:\Users\neowe\.conda\envs\biseNet_py36\lib\site-packages\tensorflow\python\training\saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "C:\Users\neowe\.conda\envs\biseNet_py36\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "C:\Users\neowe\.conda\envs\biseNet_py36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\neowe\.conda\envs\biseNet_py36\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "C:\Users\neowe\.conda\envs\biseNet_py36\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)
  File "C:\Users\neowe\.conda\envs\biseNet_py36\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key train_step/BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint [[node loader_and_saver/save/RestoreV2 (defined at C:/Aneowell/ML/biseNet_seg/bisenetv2-tensorflow-master\trainner\cityscapes\cityscapes_bisenetv2_single_gpu_trainner.py:175) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_loader_and_saver/save/Const_0_0, loader_and_saver/save/RestoreV2/tensor_names, loader_and_saver/save/RestoreV2/shape_and_slices)]] [[{{node loader_and_saver/save/RestoreV2/_609}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_614_loader_and_saver/save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

How can I fix this? Models I trained from scratch myself can be loaded and trained further without any problem, but this one throws an error.
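A quick way to see what a "Key ... not found in checkpoint" error is complaining about is to diff the variable names the graph wants to restore against the keys actually stored in the checkpoint. The sketch below does only that comparison, in plain Python. The helper names `missing_checkpoint_keys` and `summarize_by_suffix` are hypothetical, and the example lists are made up apart from the key quoted in the log above. With TensorFlow installed, the checkpoint keys should be obtainable with `tf.train.list_variables(ckpt_path)` and the graph names from `v.op.name` for each variable passed to the `Saver`.

```python
from collections import Counter


def missing_checkpoint_keys(graph_var_names, ckpt_var_names):
    """Return graph variable names that have no matching key in the checkpoint."""
    ckpt_keys = set(ckpt_var_names)
    return sorted(name for name in graph_var_names if name not in ckpt_keys)


def summarize_by_suffix(names):
    """Count names by their last path component, e.g. 'Momentum' or 'beta'."""
    return Counter(name.rsplit("/", 1)[-1] for name in names)


# Prefix taken from the key in the error log above.
PREFIX = ("BiseNetV2/aggregation_branch/guided_aggregation_block/"
          "aggregation_features/aggregation_feature_output/bn")

# Made-up example lists. With TensorFlow installed they would typically come from:
#   graph_vars = [v.op.name for v in var_list]
#   ckpt_vars = [name for name, _ in tf.train.list_variables(ckpt_path)]
graph_vars = [f"{PREFIX}/beta", f"train_step/{PREFIX}/beta/Momentum"]
ckpt_vars = [f"{PREFIX}/beta"]

missing = missing_checkpoint_keys(graph_vars, ckpt_vars)
for name in missing:
    print("missing:", name)
print(summarize_by_suffix(missing))  # Counter({'Momentum': 1})
```

If nearly every missing name ends in an optimizer slot suffix such as `Momentum`, the network weights themselves probably match and only the optimizer state differs between the graph and the checkpoint.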

MaybeShewill-CV commented 3 years ago

@LotOfLances Which script were you running when this error occurred? Please paste the command you used :)

ghost commented 3 years ago

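I ran the training command directly: python tools/cityscapes/train_bisenetv2_cityscapes.py. Since I'm trying it on my own computer, I set MULTI_GPU to FALSE in cityscapes_bisenetv2.yaml. I also set RESTORE_FROM_SNAPSHOT to True and pointed SNAPSHOT_PATH at the folder containing the pretrained model.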


MaybeShewill-CV commented 3 years ago

@LotOfLances This is a bug in the restore logic that hasn't been fixed yet. For now, don't enable restore; train from scratch instead :)

ghost commented 3 years ago

Haha, OK, thank you very much for your patient answers :-)

MaybeShewill-CV commented 3 years ago

@LotOfLances No problem :)

ghost commented 3 years ago

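A follow-up: the problem is solved. It appears to be caused by the different training scripts used for single-GPU and multi-GPU training. A model trained with the single-GPU trainer can only be resumed by the single-GPU trainer (and the multi-GPU trainer cannot restore a single-GPU model either). After switching to multi-GPU training, I could load the author's provided model and continue training normally.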
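If switching trainers is not an option and the mismatch really is limited to optimizer slot variables such as the `.../Momentum` key in the log, one possible workaround (an assumption, not something confirmed in this thread) is to restore only the network weights and let the optimizer state start fresh. The minimal sketch below shows just the name filtering; `split_weights_and_slots` and `restorable_weights` are hypothetical helpers, and the variable names are illustrative. In a TF1 graph the kept variables would then be handed to `tf.train.Saver(var_list=...)`.

```python
# Assumption: tf.train.MomentumOptimizer names its slot variables "<var>/Momentum".
SLOT_SUFFIXES = ("/Momentum",)


def split_weights_and_slots(var_names, slot_suffixes=SLOT_SUFFIXES):
    """Separate network weights from optimizer slot variables by name suffix."""
    weights = [n for n in var_names if not n.endswith(slot_suffixes)]
    slots = [n for n in var_names if n.endswith(slot_suffixes)]
    return weights, slots


def restorable_weights(var_names, ckpt_var_names, slot_suffixes=SLOT_SUFFIXES):
    """Network weights (no optimizer slots) that also exist in the checkpoint."""
    ckpt_keys = set(ckpt_var_names)
    weights, _ = split_weights_and_slots(var_names, slot_suffixes)
    return [n for n in weights if n in ckpt_keys]


# Hypothetical names for illustration only.
graph_vars = ["BiseNetV2/conv/w", "train_step/BiseNetV2/conv/w/Momentum", "BiseNetV2/conv/b"]
ckpt_vars = ["BiseNetV2/conv/w", "BiseNetV2/conv/b"]

keep = set(restorable_weights(graph_vars, ckpt_vars))
print(sorted(keep))  # ['BiseNetV2/conv/b', 'BiseNetV2/conv/w']
# In a TF1 graph one could then build (hypothetical, not tested here):
#   loader = tf.train.Saver(var_list=[v for v in net_vars if v.op.name in keep])
```

Two caveats: the momentum buffers would restart from zero instead of their saved values, so this is closer to fine-tuning than a true resume; and weights whose names are absent from the checkpoint are silently dropped, so it is worth comparing `len(keep)` with the number of weights to catch genuine naming mismatches.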

MaybeShewill-CV commented 3 years ago

@LotOfLances Thanks for sharing! I'll look into where exactly the problem is :)