Infer fail due to model shape not match

shouyinz commented 5 years ago

I'm trying to run train and infer. However, I encounter an error during infer:

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1,48,2] rhs shape= [1,1,80,2]
     [[Node: save/Assign_87 = Assign[T=DT_FLOAT, _class=["loc:@prediction/last_conv1x1/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](prediction/last_conv1x1/kernel, save/RestoreV2_87)]]

Could you provide some advice? Thanks.

HasnainRaz commented 5 years ago

Can you share the complete error?

shouyinz commented 5 years ago

('First Convolution Out: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(48)]))
('Downsample Out:', TensorShape([Dimension(None), Dimension(128), Dimension(128), Dimension(80)]))
('Downsample Out:', TensorShape([Dimension(None), Dimension(64), Dimension(64), Dimension(128)]))
('Bottleneck Block: ', TensorShape([Dimension(None), Dimension(64), Dimension(64), Dimension(48)]))
('Upsample after concat: ', TensorShape([Dimension(None), Dimension(128), Dimension(128), Dimension(176)]))
('Upsample after concat: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(128)]))
('Mask Prediction: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(2)]))
2018-11-16 10:14:58.673542: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
  File "main.py", line 44, in <module>
    main()
  File "main.py", line 40, in main
    tiramisu.infer(FLAGS.infer_data, FLAGS.batch_size, FLAGS.ckpt, FLAGS.output_folder)
  File "/Users/justin.tsai/Projects/FC-DenseNet-TensorFlow/model.py", line 378, in infer
    saver.restore(sess, ckpt.model_checkpoint_path)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1582, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [1,1,48,2] rhs shape= [1,1,80,2]
     [[node save/Assign_87 (defined at /Users/justin.tsai/Projects/FC-DenseNet-TensorFlow/model.py:375)  = Assign[T=DT_FLOAT, _class=["loc:@prediction/last_conv1x1/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](prediction/last_conv1x1/kernel, save/RestoreV2:87)]]

Caused by op u'save/Assign_87', defined at:
  File "main.py", line 44, in <module>
    main()
  File "main.py", line 40, in main
    tiramisu.infer(FLAGS.infer_data, FLAGS.batch_size, FLAGS.ckpt, FLAGS.output_folder)
  File "/Users/justin.tsai/Projects/FC-DenseNet-TensorFlow/model.py", line 375, in infer
    saver = tf.train.Saver()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 795, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 119, in restore
    self.op.get_shape().is_fully_defined())
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 221, in assign
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 61, in assign
    use_locking=use_locking, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [1,1,48,2] rhs shape= [1,1,80,2]
     [[node save/Assign_87 (defined at /Users/justin.tsai/Projects/FC-DenseNet-TensorFlow/model.py:375)  = Assign[T=DT_FLOAT, _class=["loc:@prediction/last_conv1x1/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](prediction/last_conv1x1/kernel, save/RestoreV2:87)]]

Python 2.7.15, TensorFlow 1.12.0

HasnainRaz commented 5 years ago

It seems that the graph definition has changed since training. Did you use the same parameters (growth_k, layers per block) for infer as you did during training? I cannot reproduce this. Please try retraining and then running infer with the same parameters (growth, layers_per_block, etc.); your graph definition should be the same during train and infer.

Closing until confirmed, since I can't reproduce.
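
For anyone hitting the same error: in the Assign message, the lhs shape is the variable in the graph built at infer time ([1,1,48,2]) and the rhs shape is the value stored in the checkpoint ([1,1,80,2]). A rough way to find every variable that differs is to compare the two sets of shapes before calling saver.restore. The sketch below is only an illustration and is not part of this repository: the helper names (find_shape_mismatches, checkpoint_shapes, graph_shapes) are hypothetical, and the two TensorFlow helpers assume the TF 1.x API used in this thread (tf.train.list_variables and tf.global_variables).

```python
def find_shape_mismatches(graph_shapes, ckpt_shapes):
    """Return {name: (graph_shape, ckpt_shape)} for variables that exist in
    both the current graph and the checkpoint but have different shapes."""
    mismatches = {}
    for name, graph_shape in graph_shapes.items():
        ckpt_shape = ckpt_shapes.get(name)
        if ckpt_shape is not None and list(ckpt_shape) != list(graph_shape):
            mismatches[name] = (list(graph_shape), list(ckpt_shape))
    return mismatches


def checkpoint_shapes(ckpt_dir_or_file):
    """Hypothetical helper: read variable shapes from a checkpoint (TF 1.x)."""
    import tensorflow as tf  # imported lazily so the comparison works without TF
    return {name: list(shape)
            for name, shape in tf.train.list_variables(ckpt_dir_or_file)}


def graph_shapes():
    """Hypothetical helper: read variable shapes from the default graph (TF 1.x).
    Call this after the model has been built."""
    import tensorflow as tf
    return {v.op.name: v.get_shape().as_list() for v in tf.global_variables()}


# Demo using the shapes reported in the error message above.
graph = {"prediction/last_conv1x1/kernel": [1, 1, 48, 2]}
ckpt = {"prediction/last_conv1x1/kernel": [1, 1, 80, 2]}
for name, (g, c) in sorted(find_shape_mismatches(graph, ckpt).items()):
    print("%s: graph %s vs checkpoint %s" % (name, g, c))
```

With a real checkpoint, find_shape_mismatches(graph_shapes(), checkpoint_shapes(FLAGS.ckpt)) run right after building the model should list the layers whose shapes depend on the parameters that changed between train and infer.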

HasnainRaz / FC-DenseNet-TensorFlow

Infer fail due to model shape not match #13