@LotOfLances There is a link in the README. Take a careful look and you will find it :)
Hello, when loading the provided pretrained model, the following error occurred:
Restoring weights from: C:/t_model/cityscapes.ckpt ...
2021-04-09 10:43:02.856293: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key train_step/BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint
Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint Original error: Key train_step/BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint [[node loader_and_saver/save/RestoreV2 (defined at C:/Aneowell/ML/biseNet_seg/bisenetv2-tensorflow-master\trainner\cityscapes\cityscapes_bisenetv2_single_gpu_trainner.py:175) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_loader_and_saver/save/Const_0_0, loader_and_saver/save/RestoreV2/tensor_names, loader_and_saver/save/RestoreV2/shape_and_slices)]] [[{{node loader_and_saver/save/RestoreV2/_609}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_614_loader_and_saver/save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Caused by op 'loader_and_saver/save/RestoreV2', defined at:
File "tools/cityscapes/train_bisenetv2_cityscapes.py", line 42, in
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key train_step/BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint [[node loader_and_saver/save/RestoreV2 (defined at C:/Aneowell/ML/biseNet_seg/bisenetv2-tensorflow-master\trainner\cityscapes\cityscapes_bisenetv2_single_gpu_trainner.py:175) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_loader_and_saver/save/Const_0_0, loader_and_saver/save/RestoreV2/tensor_names, loader_and_saver/save/RestoreV2/shape_and_slices)]] [[{{node loader_and_saver/save/RestoreV2/_609}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_614_loader_and_saver/save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
How can I resolve this? A model that I trained from scratch myself loads fine and resumes training, but this one raises the error.
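For readers hitting the same "Key ... not found in checkpoint" error, one way to narrow it down is to compare the variable names the graph expects with the names stored in the checkpoint. The following is a minimal sketch, not the repository's code: the helper `split_restorable` and the shortened variable names are hypothetical, and the TensorFlow calls in the comments are only a suggestion for how the two name lists might be gathered.

```python
from typing import Iterable, List, Tuple


def split_restorable(graph_var_names: Iterable[str],
                     ckpt_var_names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split graph variable names into (found in checkpoint, missing from checkpoint)."""
    ckpt = set(ckpt_var_names)
    found: List[str] = []
    missing: List[str] = []
    for name in graph_var_names:
        (found if name in ckpt else missing).append(name)
    return found, missing


# With TensorFlow, the two inputs could be gathered roughly like this:
#   ckpt_var_names  = [name for name, _shape in tf.train.list_variables(ckpt_path)]
#   graph_var_names = [v.op.name for v in tf.global_variables()]  # TF 1.x graph mode
# The names below are shortened, hypothetical stand-ins for illustration only.
graph_names = [
    "BiseNetV2/some_layer/bn/beta",
    "train_step/BiseNetV2/some_layer/bn/beta/Momentum",
]
ckpt_names = ["BiseNetV2/some_layer/bn/beta"]

found, missing = split_restorable(graph_names, ckpt_names)
print("restorable:", found)
print("missing:", missing)
```

If only optimizer slot variables (names ending in `/Momentum`, as in the log above) turn out to be missing, a `tf.train.Saver` built with `var_list` restricted to the variables in `found` might restore the model weights alone. Whether that is appropriate for this repository is an assumption, not something confirmed in the thread.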
@LotOfLances Which script were you running when this error occurred? Please paste the command you used :)
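I ran the training command directly: python tools/cityscapes/train_bisenetv2_cityscapes.py. Since I am trying it out on my own computer, I set MULTI_GPU to FALSE in cityscapes_bisenetv2.yaml, set the RESTORE_FROM_SNAPSHOT section to True, and pointed SNAPSHOT_PATH to the folder containing the pretrained model.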
@LotOfLances This is a bug in the restore logic that has not been fixed yet. For now, do not enable restore; train from scratch instead :)
Haha, okay. Thank you very much for your patient answer :-)
@LotOfLances No problem :)
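A follow-up: the problem is solved. It appears to be caused by single-GPU and multi-GPU training calling different trainer files, so a model trained on a single GPU can only be restored by single-GPU training (and multi-GPU training cannot restore a single-GPU model either). After switching to multi-GPU training, I was able to load the author's provided model and continue training normally.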
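The thread does not say how the variable names differ between the single-GPU and multi-GPU trainers. As a diagnostic sketch under that uncertainty, the helper below (hypothetical, not part of the repository) looks for checkpoint names that share trailing path components with a missing key, which can reveal whether only a leading scope prefix differs. All names in the example, including `hypothetical_scope`, are made up for illustration.

```python
from typing import Dict, Iterable, List


def suffix_candidates(missing_names: Iterable[str],
                      ckpt_var_names: Iterable[str],
                      min_parts: int = 2) -> Dict[str, List[str]]:
    """Map each missing name to checkpoint names that share at least
    `min_parts` trailing '/'-separated components with it."""
    ckpt_names = list(ckpt_var_names)
    result: Dict[str, List[str]] = {}
    for name in missing_names:
        parts = name.split("/")
        matches: List[str] = []
        for cand in ckpt_names:
            shared = 0
            # Walk both names from the end and count equal components.
            for a, b in zip(reversed(parts), reversed(cand.split("/"))):
                if a != b:
                    break
                shared += 1
            if shared >= min_parts:
                matches.append(cand)
        result[name] = matches
    return result


# Hypothetical names: only the leading scope differs for the first candidate.
missing = ["train_step/net/bn/beta/Momentum"]
in_ckpt = [
    "hypothetical_scope/net/bn/beta/Momentum",
    "net/bn/beta",
    "net/conv/kernel",
]
candidates = suffix_candidates(missing, in_ckpt)
print(candidates)
```

If every missing key has exactly one candidate that differs only in its leading scope, a name-to-variable mapping passed to `tf.train.Saver(var_list=...)` could in principle bridge the two layouts. That is a possible workaround to verify, not a confirmed fix for this project.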
@LotOfLances Thanks for sharing. I will look into exactly where the problem is :)
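Hello, first of all, thank you very much for your work. The structure is very clear and approachable, and it is a really excellent implementation. Would it be possible to release the checkpoint files of the Cityscapes pretrained model?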