google-deepmind / acme

A library of reinforcement learning components and agents
Apache License 2.0

RuntimeError: 'replay' nodes were not serializable #234

Closed neardws closed 2 years ago

neardws commented 2 years ago

Hi,

I am trying to run a distributed D4PG agent with Launchpad. The launch call is lp.launch(program, launch_type='local_mt', serialize_py_nodes=True).

However, it raises the following runtime error: "The nodes associated to the label 'replay' (<class 'launchpad.nodes.reverb.node.ReverbNode'>) were not serializable using cloudpickle. Make them pickable, or pass `serialize_py_nodes=False` to `lp.launch` if you want to disable this check, for example when you want to use FLAGS, mocks, threading.Event etc, in your node definition."

Setting serialize_py_nodes=False makes the error go away, but I still wonder whether that hurts performance.
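
For context, here is a minimal sketch of the kind of check the error message describes. It is not Launchpad's code: it uses the standard-library `pickle` as a stand-in for cloudpickle, and `is_picklable` and the two dictionaries are hypothetical names made up for illustration. It only shows why state that holds a `threading.Event` (one of the cases the message lists) cannot be serialized.

```python
import pickle
import threading


def is_picklable(obj) -> bool:
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# Hypothetical node state made only of plain data.
plain_state = {"label": "replay", "port": 8000}

# Hypothetical node state holding a threading.Event, which wraps a lock
# that pickle refuses to serialize.
state_with_event = {"label": "replay", "stop_event": threading.Event()}

print("plain state picklable:", is_picklable(plain_state))
print("state with Event picklable:", is_picklable(state_with_event))
```

If a check like this fails for your own node arguments, the usual options are the two the message names: make the arguments picklable, or disable the check.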

qstanczyk commented 2 years ago

Can you give this example a try?

neardws commented 2 years ago

> Can you give this example a try?

Thank you for your reply. I tried running the example and got the following output:


Traceback (most recent call last):
  File "/home/neardws/Documents/AoV-Journal-Algorithm/Test/run_d4pg.py", line 5, in <module>
    from acme.agents.jax import d4pg
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/acme/agents/jax/d4pg/__init__.py", line 17, in <module>
    from acme.agents.jax.d4pg.agents import D4PG
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/acme/agents/jax/d4pg/agents.py", line 22, in <module>
    from acme.agents.jax.d4pg import builder as d4pg_builder
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/acme/agents/jax/d4pg/builder.py", line 23, in <module>
    from acme.adders import reverb as adders_reverb
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/acme/adders/reverb/__init__.py", line 20, in <module>
    from acme.adders.reverb.base import DEFAULT_PRIORITY_TABLE
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/acme/adders/reverb/base.py", line 28, in <module>
    import reverb
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/reverb/__init__.py", line 21, in <module>
    ensure_tf_install.ensure_tf_version()
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/reverb/platform/default/ensure_tf_install.py", line 37, in ensure_tf_version
    import tensorflow as tf
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/tensorflow/python/__init__.py", line 37, in <module>
    from tensorflow.python.eager import context
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 29, in <module>
    from tensorflow.core.framework import function_pb2
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/tensorflow/core/framework/function_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/tensorflow/core/framework/resource_handle_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/tensorflow/core/framework/tensor_shape_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 560, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
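
To see whether workaround 1 in the message above is relevant to a given environment, a small sketch like the following may help. It assumes the package is installed under the distribution name `protobuf`. The helper names `exceeds_3_20` and `installed_protobuf_version` are made up for this example and are not part of protobuf or Acme.

```python
from importlib import metadata


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '3.20.1'."""
    parts = (version.split(".") + ["0"])[:2]
    return int(parts[0]), int(parts[1])


def exceeds_3_20(version: str) -> bool:
    """Return True if the version is newer than the 3.20.x series."""
    return parse_major_minor(version) > (3, 20)


def installed_protobuf_version():
    """Return the installed protobuf version string, or None if absent."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


found = installed_protobuf_version()
if found is None:
    print("protobuf is not installed in this environment")
elif exceeds_3_20(found):
    print(f"protobuf {found} is newer than 3.20.x; workaround 1 may apply")
else:
    print(f"protobuf {found} is 3.20.x or lower")
```

Workaround 2 is an environment variable, so it has to be set before Python imports protobuf.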

qstanczyk commented 2 years ago

This looks like your TF installation is corrupted. Note the import tensorflow as tf in the stack trace. Doesn't the same error happen when you just do import tensorflow as tf?
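
One way to run that isolated check is sketched below. `try_import` is a hypothetical helper, not part of any library mentioned here. It catches `Exception` broadly because the trace above ends in a `TypeError` rather than an `ImportError`.

```python
import importlib


def try_import(module_name: str):
    """Import a module by name and return (succeeded, error_description)."""
    try:
        importlib.import_module(module_name)
        return True, ""
    except Exception as exc:  # Import-time failures are not always ImportError.
        return False, f"{type(exc).__name__}: {exc}"


ok, error = try_import("tensorflow")
if ok:
    print("tensorflow imports cleanly")
else:
    print(f"tensorflow import failed: {error}")
```

If this fails with the same message outside of Acme, the problem is in the TensorFlow/protobuf installation rather than in the agent code.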

neardws commented 2 years ago

Reinstalling a lower version of protobuf (3.20) seems to fix that. However, it now raises an ImportError:

cannot import name 'experiments' from 'acme.jax' (/home/neardws/anaconda3/envs/aov/lib/python3.9/site-packages/acme/jax/__init__.py)
  File "/home/neardws/Documents/AoV-Journal-Algorithm/Test/run_d4pg.py", line 10, in <module>
    from acme.jax import experiments

The local file acme/jax/__init__.py is empty, and so is the latest version of that file on GitHub.

qstanczyk commented 2 years ago

I think you need to install Acme from source.
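
A quick way to tell whether an installed copy ships the module from the traceback is sketched below. `module_available` is a hypothetical helper. The only thing it assumes about Acme is the module path `acme.jax.experiments`, taken from the import statement above.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if the module can be located, False otherwise.

    find_spec imports parent packages, which may themselves fail at
    import time, so any exception is treated as "not available".
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except Exception:
        return False


print("acme.jax.experiments available:", module_available("acme.jax.experiments"))
```

If this prints False after installing from source, the interpreter is probably still picking up an older copy from site-packages.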

neardws commented 2 years ago

Your email has been received, thank you!