balancap / SSD-Tensorflow

Single Shot MultiBox Detector in TensorFlow

Fine-tuning from ./checkpoints/ssd_300_vgg.ckpt. Ignoring missing vars: False #165

Open ghost opened 6 years ago

ghost commented 6 years ago

I want to train this code with num_classes=2. I ran this command in the command prompt:

python train_ssd_network.py \
    --train_dir=./logs/ \
    --dataset_dir=./tfrecords \
    --dataset_name=pascalvoc_2012 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --checkpoint_path=./checkpoints/ssd_300_vgg.ckpt \
    --save_summaries_secs=60 \
    --save_interval_secs=600 \
    --weight_decay=0.0005 \
    --optimizer=adam \
    --learning_rate=0.001 \
    --batch_size=32

But I got this error:

    start_standard_services=start_standard_services)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\training\supervisor.py", line 706, in prepare_or_wait_for_session
    init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\training\session_manager.py", line 264, in prepare_session
    init_fn(sess)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\contrib\framework\python\ops\variables.py", line 655, in callback
    saver.restore(session, model_path)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\training\saver.py", line 1457, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\client\session.py", line 778, in run
    run_metadata_ptr)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\client\session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\client\session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\client\session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [12] rhs shape= [126]
         [[Node: save_1/Assign_29 = Assign[T=DT_FLOAT, _class=["loc:@ssd_300_vgg/block8_box/conv_cls/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](ssd_300_vgg/block8_box/conv_cls/biases, save_1/RestoreV2_29)]]

Caused by op 'save_1/Assign_29', defined at:
  File "train_ssd_network.py", line 390, in <module>
    tf.app.run()
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train_ssd_network.py", line 378, in main
    init_fn=tf_utils.get_init_fn(FLAGS),
  File "F:\Downloads\SSD-Tensorflow-master-\tf_utils.py", line 235, in get_init_fn
    ignore_missing_vars=flags.ignore_missing_vars)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\contrib\framework\python\ops\variables.py", line 653, in assign_from_checkpoint_fn
    saver = tf_saver.Saver(var_list, reshape=reshape_variables)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\training\saver.py", line 1056, in __init__
    self.build()
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\training\saver.py", line 1086, in build
    restore_sequentially=self._restore_sequentially)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\training\saver.py", line 691, in build
    restore_sequentially, reshape)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\training\saver.py", line 419, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\training\saver.py", line 155, in restore
    self.op.get_shape().is_fully_defined())
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\ops\state_ops.py", line 270, in assign
    validate_shape=validate_shape)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\ops\gen_state_ops.py", line 47, in assign
    use_locking=use_locking, name=name)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "C:\Users\User\Anaconda3\envs\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [12] rhs shape= [126]
         [[Node: save_1/Assign_29 = Assign[T=DT_FLOAT, _class=["loc:@ssd_300_vgg/block8_box/conv_cls/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](ssd_300_vgg/block8_box/conv_cls/biases, save_1/RestoreV2_29)]]

rangqiongliu commented 6 years ago

I have the same problem. If you find a solution, could you please email it to me? rangqiongliu@gmail.com

hongym7 commented 6 years ago

Delete the checkpoint file, and then re-run.

Cuky88 commented 6 years ago

If you want to train with 2 classes with weights initialized from VGG16, then you need to exclude the SSD layers from the checkpoint restore and mark them as trainable, since they were trained for 21 classes.
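
To see where the two shapes in the error come from, here is a small sketch of the arithmetic. It assumes the class-prediction layer (conv_cls) has one bias per (anchor, class) pair and that block8 uses 6 anchors per location; the 6 is only inferred from 126 / 21 and 12 / 2, and the function name is made up for illustration.

```python
# Sketch: why the restore fails with "lhs shape= [12] rhs shape= [126]".
# Assumption: conv_cls holds one bias per (anchor, class) pair, so its
# size is num_anchors * num_classes.

def conv_cls_bias_size(num_anchors, num_classes):
    """Number of biases in a class-prediction layer (hypothetical helper)."""
    return num_anchors * num_classes

# Assumption: 6 anchors per location on block8, inferred from 126 / 21.
NUM_ANCHORS_BLOCK8 = 6

# rhs: what the 21-class checkpoint stores for block8_box/conv_cls/biases.
checkpoint_size = conv_cls_bias_size(NUM_ANCHORS_BLOCK8, 21)
# lhs: what the freshly built 2-class graph expects.
new_model_size = conv_cls_bias_size(NUM_ANCHORS_BLOCK8, 2)

print("checkpoint:", checkpoint_size, "new model:", new_model_size)
# prints "checkpoint: 126 new model: 12"
```

Under that assumption the sizes can never match once num_classes changes, so these layers have to be left out of the restore.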

Try adding these lines to your command:

--checkpoint_exclude_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box \
--trainable_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box \
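
For anyone wondering what the exclude list does, here is a rough pure-Python sketch. It assumes scopes are matched as plain name prefixes, which is how the TF-Slim style training scripts usually handle it; I have not checked that this repo's tf_utils.get_init_fn does exactly this. The function and most of the variable names are hypothetical (only block8_box/conv_cls/biases is taken from the traceback).

```python
def variables_to_restore(var_names, exclude_scopes):
    """Keep only variables whose name does not start with an excluded scope.

    Assumption: prefix matching on the variable name.
    """
    exclusions = [s.strip() for s in exclude_scopes.split(",") if s.strip()]
    return [
        name for name in var_names
        if not any(name.startswith(ex) for ex in exclusions)
    ]

# Mostly hypothetical variable names, for illustration only.
names = [
    "ssd_300_vgg/conv1/conv1_1/weights",
    "ssd_300_vgg/conv6/weights",
    "ssd_300_vgg/block8/conv1x1/weights",
    "ssd_300_vgg/block8_box/conv_cls/biases",
]
exclude = "ssd_300_vgg/conv6,ssd_300_vgg/block8,ssd_300_vgg/block8_box"

print(variables_to_restore(names, exclude))
# prints "['ssd_300_vgg/conv1/conv1_1/weights']"
```

Only the early VGG layers are restored; everything under the excluded scopes starts from fresh initialization, so no 21-class tensor is assigned to a 2-class variable.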

CJJ-717 commented 5 years ago

@hongym7 Delete which one?

hongym7 commented 5 years ago

@CJJ-717 The checkpoint file that was created by the previous training run.
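
If it helps, here is a small sketch for locating those files before deleting anything. It assumes the earlier run wrote into the --train_dir from the command above (./logs/) and used the usual TensorFlow names: an index file called checkpoint plus model.ckpt-<step> files. The helper name is made up, and the script only lists files.

```python
import glob
import os

def find_checkpoint_files(train_dir):
    """Return files in train_dir that look like TensorFlow checkpoint output.

    Assumption: default naming, i.e. 'checkpoint' and 'model.ckpt-*'.
    """
    patterns = ["checkpoint", "model.ckpt-*"]
    found = []
    for pattern in patterns:
        found.extend(glob.glob(os.path.join(train_dir, pattern)))
    return sorted(found)

if __name__ == "__main__":
    # Review this list first, then remove the files by hand.
    for path in find_checkpoint_files("./logs/"):
        print(path)
```

Event files and anything else in the directory are left out of the list, so TensorBoard summaries from the old run are not touched.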