google-research / robotics_transformer

Apache License 2.0

Failing at loading checkpoints #12

Open AliBuildsAI opened 1 year ago

AliBuildsAI commented 1 year ago

Hi,

I am trying to load the checkpoints. I followed https://github.com/google-research/robotics_transformer/issues/11 and ran this code:

saved_path = './trained_checkpoints/rt1main'
from tf_agents.policies import py_tf_eager_policy

py_tf_eager_policy.SavedModelPyTFEagerPolicy(
    model_path=saved_path,
    load_specs_from_pbtxt=True,
    use_tf_function=True,
)

But I am getting this error:

Traceback (most recent call last):
  File "/home/ali/workspace/repos/google-research/robotics_transformer/load_checkpoints.py", line 7, in <module>
    py_tf_eager_policy.SavedModelPyTFEagerPolicy(
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tf_agents/policies/py_tf_eager_policy.py", line 179, in __init__
    policy = tf.compat.v2.saved_model.load(model_path)
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 936, in load
    result = load_internal(export_dir, tags, options)["root"]
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 974, in load_internal
    loader = loader_cls(object_graph_proto, saved_model_proto, export_dir,
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 187, in __init__
    self._restore_checkpoint()
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 560, in _restore_checkpoint
    load_status = saver.restore(variables_path, self._checkpoint_options)
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 1351, in restore
    object_graph_string = reader.get_tensor(base.OBJECT_GRAPH_PROTO_KEY)
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 66, in get_tensor
    return CheckpointReader.CheckpointReader_GetTensor(
IndexError: Read less bytes than requested
  In call to configurable 'SavedModelPyTFEagerPolicy' (<class 'tf_agents.policies.py_tf_eager_policy.SavedModelPyTFEagerPolicy'>)

Process finished with exit code 1

I am using Python 3.8.0 with the following packages:

(rt9) λ › pip list
Package                       Version
----------------------------- ---------
absl-py                       1.4.0
astunparse                    1.6.3
cachetools                    5.3.0
certifi                       2022.12.7
charset-normalizer            3.1.0
cloudpickle                   2.2.1
decorator                     5.1.1
dill                          0.3.6
dm-tree                       0.1.8
etils                         1.1.1
flatbuffers                   23.3.3
gast                          0.5.3
gin-config                    0.5.0
google-auth                   2.16.2
google-auth-oauthlib          0.4.6
google-pasta                  0.2.0
googleapis-common-protos      1.59.0
grpcio                        1.51.3
gym                           0.26.2
gym-notices                   0.0.8
h5py                          3.8.0
idna                          3.4
importlib-metadata            6.1.0
importlib-resources           5.12.0
keras                         2.8.0
Keras-Preprocessing           1.1.2
libclang                      15.0.6.1
Markdown                      3.4.1
MarkupSafe                    2.1.2
numpy                         1.24.2
oauthlib                      3.2.2
opt-einsum                    3.3.0
packaging                     23.0
Pillow                        9.4.0
pip                           23.0.1
promise                       2.3
protobuf                      3.19.6
pyasn1                        0.4.8
pyasn1-modules                0.2.8
requests                      2.28.2
requests-oauthlib             1.3.1
rsa                           4.9
setuptools                    65.6.3
six                           1.16.0
tensorboard                   2.8.0
tensorboard-data-server       0.6.1
tensorboard-plugin-wit        1.8.1
tensorflow                    2.8.2
tensorflow-addons             0.17.1
tensorflow-datasets           4.6.0
tensorflow-estimator          2.8.0
tensorflow-hub                0.12.0
tensorflow-io-gcs-filesystem  0.26.0
tensorflow-metadata           1.9.0
tensorflow-model-optimization 0.7.2
tensorflow-probability        0.16.0
tensorflow-text               2.8.2
termcolor                     2.2.0
tf-agents                     0.12.0
toml                          0.10.2
tqdm                          4.65.0
typeguard                     3.0.1
typing_extensions             4.5.0
urllib3                       1.26.15
Werkzeug                      2.2.3
wheel                         0.38.4
wrapt                         1.15.0
zipp                          3.15.0

oym1994 commented 1 year ago

Hi, have you solved this problem? I get the same error. It would be great if you could share a solution or any advice.

AliBuildsAI commented 1 year ago

Hi, no, I could not solve it.

oym1994 commented 1 year ago

The problem is solved! You need to download the repo with "git lfs" rather than plain "git" or the zip file.
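
For anyone else hitting "Read less bytes than requested": one likely cause, consistent with this fix, is that a plain clone or zip download leaves small Git LFS pointer files where the checkpoint data should be. Below is a minimal sketch for checking that, assuming the checkpoint lives under ./trained_checkpoints/rt1main as in the original post. The helper name find_lfs_pointers is made up for illustration and is not part of the repo.

```python
import os

# Every Git LFS pointer file begins with this line (per the Git LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(root):
    """Return paths under `root` that look like Git LFS pointer stubs."""
    pointers = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only the first few bytes are needed to recognise a pointer file.
            with open(path, "rb") as f:
                head = f.read(len(LFS_POINTER_PREFIX))
            if head == LFS_POINTER_PREFIX:
                pointers.append(path)
    return sorted(pointers)


if __name__ == "__main__":
    saved_path = "./trained_checkpoints/rt1main"  # path from the original post
    stubs = find_lfs_pointers(saved_path)
    if stubs:
        print("These files are still Git LFS pointers; run `git lfs pull`:")
        for p in stubs:
            print("  " + p)
    else:
        print("No Git LFS pointer files found under", saved_path)
```

If this lists any files, the real checkpoint data has not been downloaded yet, which would explain why the reader runs out of bytes.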

JoAnn0812 commented 11 months ago

Hi, could you please provide the full code for loading checkpoints? Many thanks!

jaiber commented 9 months ago

This is what I did:

$ sudo apt install git-lfs
$ git lfs pull
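
On the request for the full loading code: once `git lfs pull` has fetched the real checkpoint files, the snippet from the original post should work unchanged. Here is a minimal sketch that reuses the same path and arguments. `load_rt1_policy` is a hypothetical wrapper for illustration, not something the repo provides.

```python
import os


def load_rt1_policy(saved_path):
    """Load the exported policy from a SavedModel directory (sketch)."""
    if not os.path.isdir(saved_path):
        raise FileNotFoundError(f"No checkpoint directory at {saved_path!r}")
    # Imported lazily so the path check above runs even without TensorFlow.
    from tf_agents.policies import py_tf_eager_policy

    # Same call and arguments as in the original post.
    return py_tf_eager_policy.SavedModelPyTFEagerPolicy(
        model_path=saved_path,
        load_specs_from_pbtxt=True,
        use_tf_function=True,
    )


# Usage, after `git lfs pull` has fetched the real files:
# policy = load_rt1_policy("./trained_checkpoints/rt1main")
```

The directory check only catches a wrong path early. It does not detect files that are still LFS pointers.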