balancap / SSD-Tensorflow

Single Shot MultiBox Detector in TensorFlow

Finetune using other dataset with NotFoundError: Restoring from checkpoint failed #323

Open xiyuanzh opened 5 years ago

xiyuanzh commented 5 years ago

I ran into the following problem when trying to use SSD to detect only the dog class in the VOC2007 dataset. The number of classes is therefore two (dog plus background), and I have modified the "num_classes" parameter in ssd_vgg_300.py and train_ssd_network.py accordingly.

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key beta1_power not found in checkpoint
     [[node save/RestoreV2 (defined at train_ssd_network.py:372)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_INT64, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

My training command is as follows:

CUDA_VISIBLE_DEVICES=0 python3 train_ssd_network.py \
    --train_dir=./log_files/log_dog \
    --dataset_dir=./datasets/tfrecords \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --checkpoint_path=./checkpoints/vgg_16.ckpt \
    --checkpoint_model_scope=vgg_16 \
    --checkpoint_exclude_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box \
    --trainable_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box \
    --save_summaries_secs=60 \
    --save_interval_secs=600 \
    --weight_decay=0.0005 \
    --optimizer=adam \
    --learning_rate=0.001 \
    --batch_size=32

I have also tried ssd_vgg_300.ckpt, which failed in the same way. Has anyone run into this problem before? Any help would be appreciated. Thanks a lot!

xiyuanzh commented 5 years ago

I have solved it by deleting the log directory.
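
A likely reason this works (an assumption; nothing in this thread confirms it): TF-slim-style training loops usually resume from the newest checkpoint found in --train_dir and only fall back to --checkpoint_path when that directory is empty. If ./log_files/log_dog still holds a checkpoint written by an earlier run with a different optimizer, the Adam variable beta1_power would be missing from it, which matches the error above. The sketch below uses only the standard library to look for leftover checkpoint files. The helper name `find_stale_checkpoints` is hypothetical, and the `checkpoint` state file and `model.ckpt-*` prefix are the usual TensorFlow 1.x defaults, assumed here rather than taken from the repository.

```python
import glob
import os


def find_stale_checkpoints(train_dir):
    """Return checkpoint-related files left in train_dir by an earlier run.

    Assumes the usual TensorFlow 1.x Saver layout: a text file named
    'checkpoint' plus data files whose names start with 'model.ckpt-'.
    """
    found = []
    state_file = os.path.join(train_dir, "checkpoint")
    if os.path.isfile(state_file):
        found.append(state_file)
    # glob returns an empty list when the directory does not exist.
    found.extend(sorted(glob.glob(os.path.join(train_dir, "model.ckpt-*"))))
    return found


if __name__ == "__main__":
    leftovers = find_stale_checkpoints("./log_files/log_dog")
    if leftovers:
        print("Training may resume from these files instead of --checkpoint_path:")
        for path in leftovers:
            print("  " + path)
    else:
        print("No earlier checkpoint found in the log directory.")
```

If the list is non-empty, either delete the directory (as above) or point --train_dir at a fresh, empty directory.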

Unhapppy commented 5 years ago

I have solved it by deleting the log directory.

Could you explain in more detail? I am stuck on this error too...

15727652201 commented 5 years ago

I have solved it by deleting the log directory

Could you go into a bit more detail? I have been stuck on this problem for several days. Many thanks.

guoxinzhan commented 4 years ago

Deleting --checkpoint_model_scope=vgg_16 works.
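
To illustrate why this flag can matter, here is a minimal sketch of the kind of name mapping such a flag typically performs. Graph variables live under the model scope (ssd_300_vgg), while vgg_16.ckpt stores them under vgg_16, so the leading scope is rewritten before the lookup. If the checkpoint was itself saved under ssd_300_vgg, rewriting to vgg_16 would produce keys that are not in the file. The function `checkpoint_key` and the example variable name are hypothetical, and the repository's actual logic may differ.

```python
def checkpoint_key(var_name, model_scope, checkpoint_scope=None):
    """Map a graph variable name to the key looked up in the checkpoint."""
    if checkpoint_scope is None:
        # No remapping requested: the checkpoint must use the model's scope.
        return var_name
    # Rewrite only the leading scope, not later occurrences in the name.
    if var_name.startswith(model_scope + "/"):
        return checkpoint_scope + var_name[len(model_scope):]
    return var_name


name = "ssd_300_vgg/conv1/conv1_1/weights"  # illustrative variable name
print(checkpoint_key(name, "ssd_300_vgg", "vgg_16"))  # vgg_16/conv1/conv1_1/weights
print(checkpoint_key(name, "ssd_300_vgg"))            # ssd_300_vgg/conv1/conv1_1/weights
```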

JulioZhao1997 commented 4 years ago

You can exclude the box layers as follows: checkpoint_exclude_scopes=ssd_300_vgg/block11_box,ssd_300_vgg/block11_box/conv_cls/weights,ssd_300_vgg/block10_box,ssd_300_vgg/block9_box,ssd_300_vgg/block8_box,ssd_300_vgg/block7_box,ssd_300_vgg/block4_box. Use this option when loading the pretrained model, but I don't know how it affects training performance.
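
As a rough sketch of how an exclude list like this usually behaves, the snippet below assumes prefix matching on variable names, which is the common pattern in TF-slim-style training scripts; whether this repository matches it exactly is an assumption. The function names and the sample variable names are made up for illustration. Variables that are filtered out are not read from the checkpoint and start from their fresh initialization.

```python
def split_scopes(flag_value):
    """Split a comma-separated scope flag into a clean list of prefixes."""
    return [s.strip() for s in flag_value.split(",") if s.strip()]


def variables_to_restore(var_names, exclude_flag):
    """Keep only the names that do not start with any excluded scope."""
    exclusions = split_scopes(exclude_flag)
    return [n for n in var_names
            if not any(n.startswith(e) for e in exclusions)]


# Illustrative variable names, not read from a real graph.
names = [
    "ssd_300_vgg/conv1/conv1_1/weights",
    "ssd_300_vgg/block11_box/conv_cls/weights",
    "ssd_300_vgg/block11_box/conv_loc/weights",
    "ssd_300_vgg/block4_box/conv_cls/biases",
]
flag = ("ssd_300_vgg/block11_box,"
        "ssd_300_vgg/block11_box/conv_cls/weights,"
        "ssd_300_vgg/block4_box")
print(variables_to_restore(names, flag))
# ['ssd_300_vgg/conv1/conv1_1/weights']
```

Under prefix matching, the entry ssd_300_vgg/block11_box/conv_cls/weights is redundant, because ssd_300_vgg/block11_box already covers it.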