LossNAN / I3D-Tensorflow

Train an I3D model on UCF101 or HMDB51 with TensorFlow
Apache License 2.0
112 stars, 28 forks

checkpoint cannot be loaded #19

Open: rahman-mdatiqur opened this issue 5 years ago

rahman-mdatiqur commented 5 years ago

Hello @LossNAN

Thanks for the wonderful repo. I tried to use multi_gpu_test_kinetics_rgb.py, but it could not load the checkpoint and failed with "Key inception_i3d/Conv3d_1a_7x7/batch_norm/beta not found in checkpoint". The full error dump is below.

Could you please help me pin down what exactly is causing this error?


Traceback (most recent call last):
  File "multi_gpu_test_thumos14.py", line 336, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "multi_gpu_test_thumos14.py", line 332, in main
    run_training()
  File "multi_gpu_test_thumos14.py", line 270, in run_training
    saver.restore(sess, _MODEL_CKPT)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1562, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key inception_i3d/Conv3d_1a_7x7/batch_norm/beta not found in checkpoint
    [[node save/RestoreV2 (defined at multi_gpu_test_thumos14.py:269) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "multi_gpu_test_thumos14.py", line 336, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "multi_gpu_test_thumos14.py", line 332, in main
    run_training()
  File "multi_gpu_test_thumos14.py", line 269, in run_training
    saver = tf.train.Saver()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 795, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key inception_i3d/Conv3d_1a_7x7/batch_norm/beta not found in checkpoint
    [[node save/RestoreV2 (defined at multi_gpu_test_thumos14.py:269) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
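For readers hitting the same error: it means the tf.train.Saver() built at line 269 asks for a variable name that the checkpoint file does not contain. One frequent cause (an assumption here, not confirmed for this repo) is a scope-prefix mismatch, where the checkpoint stores the same weights under an extra leading scope. The sketch below is a minimal, pure-Python way to diagnose that: diff_keys lists graph names missing from the checkpoint, and remap_by_prefix tries to match each graph name to a checkpoint key by prepending candidate prefixes. Both helper names and the "RGB/" prefix in the example are hypothetical, chosen for illustration only.

```python
# Hypothetical helpers for diagnosing "Key ... not found in checkpoint".
# Pure Python (3.5-compatible), so the logic can be checked without TensorFlow.
from typing import Dict, Iterable, List, Tuple


def diff_keys(graph_names: Iterable[str],
              ckpt_names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unused): graph variable names absent from the
    checkpoint, and checkpoint keys that no graph variable asks for."""
    graph_set, ckpt_set = set(graph_names), set(ckpt_names)
    missing = sorted(graph_set - ckpt_set)
    unused = sorted(ckpt_set - graph_set)
    return missing, unused


def remap_by_prefix(graph_names: Iterable[str],
                    ckpt_names: Iterable[str],
                    candidate_prefixes: Iterable[str]) -> Dict[str, str]:
    """Map each graph variable name to a checkpoint key, trying the bare name
    first and then each candidate scope prefix in order. Raises KeyError if a
    name still has no match, so no weights are silently left unrestored."""
    ckpt_set = set(ckpt_names)
    prefixes = [""] + list(candidate_prefixes)
    mapping = {}
    for name in graph_names:
        for prefix in prefixes:
            if prefix + name in ckpt_set:
                mapping[name] = prefix + name
                break
        else:  # no prefix produced a key that exists in the checkpoint
            raise KeyError("no checkpoint key found for {!r}".format(name))
    return mapping


if __name__ == "__main__":
    # Hypothetical names: graph variables without a scope, checkpoint keys
    # stored under an extra "RGB/" scope (illustrative assumption only).
    graph = ["inception_i3d/Conv3d_1a_7x7/batch_norm/beta",
             "inception_i3d/Conv3d_1a_7x7/conv_3d/w"]
    ckpt = ["RGB/inception_i3d/Conv3d_1a_7x7/batch_norm/beta",
            "RGB/inception_i3d/Conv3d_1a_7x7/conv_3d/w"]
    missing, unused = diff_keys(graph, ckpt)
    print("missing from checkpoint:", missing)  # both graph names are listed
    print(remap_by_prefix(graph, ckpt, ["RGB/"]))
```

In TensorFlow 1.x the real inputs would come from tf.train.list_variables(checkpoint_path) for the checkpoint keys and v.op.name for v in tf.global_variables() for the graph names. A successful mapping can then be inverted into a dict of {checkpoint_key: variable} and passed as tf.train.Saver(var_list=...), which accepts such a dict. Raising on unmatched names, rather than skipping them, is a deliberate choice: a partial restore would leave some layers at their random initialization without any error.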

LossNAN commented 5 years ago

Please do not use the multi_gpu version; it still has some bugs that have not been fixed.

rahman-mdatiqur commented 5 years ago

Thanks @LossNAN for your reply!