GoogleCloudPlatform / tensorflow-without-a-phd

A crash course in six episodes for software developers who want to become machine learning practitioners.
Apache License 2.0
2.79k stars 911 forks

NotFoundError while running the game. #24

Open mahmutkocak opened 6 years ago

mahmutkocak commented 6 years ago

Hi, I am a relatively new TF user. I use Anaconda 3.5 on Windows 10, with TensorFlow installed in a virtual environment. I have run other RL projects with this TensorFlow installation, but this project now gives the error below:

`(tensorflow) C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\Lib\site-packages\tensorflow\models\tensorflow-rl-pong>python -m trainer.task --render --restore --output-dir ./demo-checkpoint
args: {'laziness': 0.01, 'learning_rate': 0.005, 'save_checkpoint_steps': 1, 'n_epoch': 6000, 'max_to_keep': 6000, 'restore': True, 'job_dir': '/tmp/pong_output', 'hidden_dim': 200, 'output_dir': './demo-checkpoint', 'gamma': 0.99, 'decay': 0.99, 'render': True, 'batch_size': 10000}
2018-09-18 09:54:26.123793: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Restoring from ./demo-checkpoint\model.ckpt-6000
2018-09-18 09:54:26.168774: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1275] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key rollout/model/dense/kernel not found in checkpoint

Traceback (most recent call last):
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1278, in _do_call
    return fn(*args)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key rollout/model/dense/kernel not found in checkpoint
         [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT64, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1725, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 877, in run
    run_metadata_ptr)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1272, in _do_run
    run_metadata)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key rollout/model/dense/kernel not found in checkpoint
         [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT64, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\Lib\site-packages\tensorflow\models\tensorflow-rl-pong\trainer\task.py", line 293, in <module>
    main(args)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\Lib\site-packages\tensorflow\models\tensorflow-rl-pong\trainer\task.py", line 110, in main
    saver = tf.train.Saver(max_to_keep=args.max_to_keep)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1281, in __init__
    self.build()
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1293, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1330, in _build
    build_save=build_save, build_restore=build_restore)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 778, in _build_internal
    restore_sequentially, reshape)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 397, in _AddRestoreOps
    restore_sequentially)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 829, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1546, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3155, in create_op
    op_def=op_def)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Key rollout/model/dense/kernel not found in checkpoint
         [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT64, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1737, in restore
    checkpointable.OBJECT_GRAPH_PROTO_KEY)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 348, in get_tensor
    status)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\Lib\site-packages\tensorflow\models\tensorflow-rl-pong\trainer\task.py", line 293, in <module>
    main(args)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\Lib\site-packages\tensorflow\models\tensorflow-rl-pong\trainer\task.py", line 145, in main
    saver.restore(sess, restore_path)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1743, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key rollout/model/dense/kernel not found in checkpoint
         [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT64, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\Lib\site-packages\tensorflow\models\tensorflow-rl-pong\trainer\task.py", line 293, in <module>
    main(args)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\Lib\site-packages\tensorflow\models\tensorflow-rl-pong\trainer\task.py", line 110, in main
    saver = tf.train.Saver(max_to_keep=args.max_to_keep)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1281, in __init__
    self.build()
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1293, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1330, in _build
    build_save=build_save, build_restore=build_restore)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 778, in _build_internal
    restore_sequentially, reshape)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 397, in _AddRestoreOps
    restore_sequentially)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 829, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1546, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3155, in create_op
    op_def=op_def)
  File "C:\Users\KOCAKMA\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key rollout/model/dense/kernel not found in checkpoint
         [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT64, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
`

- I uninstalled and reinstalled TensorFlow. The error persists.
- I used an absolute path for --output-dir. The error persists.

Note: The code was running successfully before I changed my Windows 10 PATH to make Anaconda's Python the default. After this change, TensorFlow stopped working in the root environment, so I created a conda virtual environment and reinstalled it there. Other projects work fine in the new environment, but the pong project produces the error above.

martin-gorner commented 6 years ago

It looks like TensorFlow could not restore from the checkpoint. This usually happens when you have changed your model and then try to restore from a checkpoint that no longer matches it.
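One way to confirm such a mismatch is to compare the variable names stored in the checkpoint with the names the graph expects. The sketch below is only an illustration: it assumes `tf.train.list_variables` is available in the installed TensorFlow, and the helper names `checkpoint_variable_names` and `missing_from_checkpoint` are hypothetical, not part of this repository.

```python
def checkpoint_variable_names(checkpoint_prefix):
    """Return the set of variable names stored in a checkpoint.

    Assumes tf.train.list_variables exists in the installed TensorFlow.
    """
    # Imported lazily so the comparison helper below works without TensorFlow.
    import tensorflow as tf

    return {name for name, _shape in tf.train.list_variables(checkpoint_prefix)}


def missing_from_checkpoint(graph_names, checkpoint_names):
    """Return the graph variable names the checkpoint does not contain, sorted."""
    return sorted(set(graph_names) - set(checkpoint_names))


# Illustrative names only: the first key is taken from the error message,
# the others are made up for the example.
graph_names = ["rollout/model/dense/kernel", "global_step"]
checkpoint_names = ["model/dense/kernel", "global_step"]
print(missing_from_checkpoint(graph_names, checkpoint_names))
# prints ['rollout/model/dense/kernel']
```

With TensorFlow 1.x, the graph-side names could be collected after building the model with something like `[v.op.name for v in tf.global_variables()]`, and the checkpoint-side names with `checkpoint_variable_names("./demo-checkpoint/model.ckpt-6000")`.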

martin-gorner commented 6 years ago

I wonder whether this is the same issue as this one: https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd/issues/20

We have a bug in the code because something changed between TensorFlow 1.8 and 1.9, and we need to update it. Until we do, this "pong" example will only work with TensorFlow 1.8. Since you are already in a virtual env, can you downgrade TensorFlow to 1.8?

pip install tensorflow==1.8
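To fail early with a clear message rather than hit the restore error, a script could check the installed version before building the graph. This is only a sketch: the helper `is_tf_1_8` is hypothetical and does not come from the repository.

```python
def is_tf_1_8(version):
    """Return True if a version string such as '1.8.0' belongs to the 1.8 series."""
    parts = version.split(".")
    return len(parts) >= 2 and parts[0] == "1" and parts[1] == "8"


# In a real script the string would come from tf.__version__.
for v in ("1.8.0", "1.9.0", "1.10.1"):
    print(v, is_tf_1_8(v))
```

Comparing the split components rather than testing for a "1.8" prefix keeps a malformed string such as "1.80" from being accepted.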

mahmutkocak commented 6 years ago

I installed TF 1.8 with pip install tensorflow==1.8. Now I get an AttributeError on Windows 10, in an Anaconda3 virtual environment.

Traceback (most recent call last):
  File "C:\...anaconda3\envs\tensorflow\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\...anaconda3\envs\tensorflow\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\...Documents\RL\tensorflow-rl-pong\trainer\task.py", line 18, in <module>
    import tensorflow as tf
  File "C:\...anaconda3\envs\tensorflow\lib\site-packages\tensorflow\__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "C:\...anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\__init__.py", line 63, in <module>
    from tensorflow.python.framework.framework_lib import * # pylint: disable=redefined-builtin
  File "C:\...anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\framework_lib.py", line 104, in <module>
    from tensorflow.python.framework.importer import import_graph_def
  File "C:\...anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\importer.py", line 32, in <module>
    from tensorflow.python.framework import function
  File "C:\...anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\function.py", line 36, in <module>
    from tensorflow.python.ops import resource_variable_ops
  File "C:\...anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 35, in <module>
    from tensorflow.python.ops import variables
  File "C:\...anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\variables.py", line 40, in <module>
    class Variable(checkpointable.CheckpointableBase):
AttributeError: module 'tensorflow.python.training.checkpointable' has no attribute 'CheckpointableBase'

mahmutkocak commented 6 years ago

I got the same NotFoundError on Ubuntu, in an Anaconda3 virtual environment. After running pip install tensorflow==1.8 it works. However, I cannot see the game: there is no visual demonstration, only code.