google-deepmind / acme

A library of reinforcement learning components and agents
Apache License 2.0

Simple Example with CartPole, MountainCar #11

Closed: kjacek closed this issue 4 years ago

kjacek commented 4 years ago

Can you give an example with the simple gym CartPole and MountainCar problems? Thank you

tnfru commented 4 years ago

See /examples/tutorial.ipynb, which does exactly what you want and can be run interactively in Google Colab.

andy121090 commented 4 years ago

Is there any example where the task is actually solved after training? Running /examples/tutorial.ipynb on the MountainCar problem does not solve it. I also tried removing the behaviour noise and increasing the training to 100 episodes.

mwhoffman commented 4 years ago

Hi! Yes. We gave an example of our continuous-control agent D4PG running on MuJoCo environments and on gym's MountainCar environment. Unfortunately we ran most of our results (which you can see in the paper) on MuJoCo environments and didn't tweak anything for gym. If you're able to run on those (via the dm_control suite) there are a number of examples there. But we will be adding some more examples and fixing this example soon!
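
For what it's worth, getting a dm_control suite task hooked up looks roughly like the following (a quick sketch; the particular task is just an illustration):

# Quick sketch (the particular task is just an illustration): load a
# dm_control suite task and wrap it for use with the agents here.
from acme import specs
from acme import wrappers
from dm_control import suite

environment = suite.load(domain_name='cartpole', task_name='swingup')
environment = wrappers.SinglePrecisionWrapper(environment)
environment_spec = specs.make_environment_spec(environment)

print(environment_spec.actions)
print(environment_spec.observations)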

megamuzzy commented 4 years ago

Please add an example of running the R2D2 agent. I tried to run it, but I'm still getting errors. Thanks!

Traceback (most recent call last):
  File "acme01/examples/atari/run_r2d2.py", line 69, in <module>
    app.run(main)
  File "/home/muzzy/acme/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/muzzy/acme/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "acme01/examples/atari/run_r2d2.py", line 62, in main
    agent = r2d2.R2D2(env_spec, network, burn_in_length=40, trace_length=80, replay_period=40)
  File "/home/muzzy/acme/lib/python3.6/site-packages/acme/agents/r2d2/agent.py", line 104, in __init__
    tf2_utils.create_variables(network, [environment_spec.observations])
  File "/home/muzzy/acme/lib/python3.6/site-packages/acme/utils/tf2_utils.py", line 105, in create_variables
    dummy_output = network(*add_batch_dim(dummy_input))
  File "/home/muzzy/acme/lib/python3.6/site-packages/sonnet/src/utils.py", line 89, in _decorate_unbound_method
    return decorator_fn(bound_method, self, args, kwargs)
  File "/home/muzzy/acme/lib/python3.6/site-packages/sonnet/src/base.py", line 272, in wrap_with_name_scope
    return method(*args, **kwargs)
  File "/home/muzzy/acme/lib/python3.6/site-packages/acme/networks/atari.py", line 91, in __call__
    embeddings = self._embed(inputs)
  File "/home/muzzy/acme/lib/python3.6/site-packages/sonnet/src/utils.py", line 89, in _decorate_unbound_method
    return decorator_fn(bound_method, self, args, kwargs)
  File "/home/muzzy/acme/lib/python3.6/site-packages/sonnet/src/base.py", line 272, in wrap_with_name_scope
    return method(*args, **kwargs)
  File "/home/muzzy/acme/lib/python3.6/site-packages/acme/networks/embedding.py", line 37, in __call__
    if len(inputs.reward.shape.dims) == 1:
AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'reward'
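
From the trace it looks like the failure comes from the OAR embedding in acme/networks/embedding.py, which expects the observation to be a structure with observation, action and reward fields rather than a raw tensor. My guess only (assuming acme.wrappers exposes an ObservationActionRewardWrapper) is that the environment needs an extra wrapper so each observation also carries the previous action and reward, roughly:

# My guess only (assuming acme.wrappers exposes ObservationActionRewardWrapper):
# the Atari network embeds an observation-action-reward (OAR) structure, so the
# environment has to be wrapped so each observation also carries the previous
# action and reward. Plain CartPole stands in here as a placeholder environment.
import gym
from acme import specs
from acme import wrappers
from acme.wrappers import gym_wrapper

environment = gym_wrapper.GymWrapper(gym.make('CartPole-v1'))
environment = wrappers.ObservationActionRewardWrapper(environment)
environment = wrappers.SinglePrecisionWrapper(environment)
env_spec = specs.make_environment_spec(environment)
print(env_spec.observations)  # should now show observation, action and reward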

kjacek commented 4 years ago

I have the simplest code on Colab:


#@title Install necessary dependencies.

!pip install dm-acme
!pip install dm-acme[reverb]
!pip install dm-acme[tf]
!pip install dm-acme[envs]

#@title Import modules.
#python3

from absl import app
from absl import flags

import acme
from acme import specs
from acme import wrappers
from acme.wrappers import gym_wrapper
from acme.agents import dqn

#import bsuite
import sonnet as snt
import gym

environment = gym_wrapper.GymWrapper(gym.make('CartPole-v1'))
environment = wrappers.SinglePrecisionWrapper(environment)
environment_spec = specs.make_environment_spec(environment)

print('actions:\n', environment_spec.actions, '\n')
print('observations:\n', environment_spec.observations, '\n')
print('rewards:\n', environment_spec.rewards, '\n')
print('discounts:\n', environment_spec.discounts, '\n')  

network = snt.Sequential([
    snt.Flatten(),
    snt.nets.MLP([50, 50, environment_spec.actions.num_values])
])

# Construct the agent.
agent = dqn.DQN(
    environment_spec=environment_spec, network=network)

# Run the environment loop.
loop = acme.EnvironmentLoop(environment, agent)
loop.run(num_episodes=100)  # pytype: disable=attribute-error

and got errors:

UnparsedFlagAccessError                   Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/acme/utils/paths.py in get_unique_id()
     68   try:
---> 69     FLAGS.acme_id
     70   except flags.UnparsedFlagAccessError:

7 frames

UnparsedFlagAccessError: Trying to access flag --acme_id before flags were parsed.

During handling of the above exception, another exception occurred:

UnrecognizedFlagError                     Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py in __call__(self, argv, known_only)
    631       suggestions = _helpers.get_flag_suggestions(name, list(self))
    632       raise _exceptions.UnrecognizedFlagError(
--> 633           name, value, suggestions=suggestions)
    634
    635     self.mark_as_parsed()

UnrecognizedFlagError: Unknown command line flag 'f'

How do I fix it? Thank you

kjacek commented 4 years ago

I get the same error with this example: https://github.com/deepmind/acme/blob/master/acme/agents/tf/dqn/agent_test.py

mwhoffman commented 4 years ago

The problem with the flags is due to the logging and checkpointing objects, which generate paths to save data to (in the case of checkpointing this is the checkpoint itself, and for logging it is a csv file of reward data).

These files are generally saved under ~/acme/ACME_ID/..., but we also allow you to pass a flag (--acme_id in this case) which redefines the id for an "experiment" you might run, so that you can reuse checkpoints from a previous run, etc. The problem is that colab, unfortunately, doesn't parse flags in the same way as everything else (and it has flags that can collide if, as we do right now, we just pass flags straight through to absl.flags).

A quick solution is to turn off checkpointing and logging-to-csv when running in colab. For the DQN agent shown above this can be done by passing:

logger=loggers.TerminalLogger(time_delta=10.),  # or any other delta
checkpoint=False
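
Dropping those into the snippet from above gives something like this (a quick sketch; keyword names may shift slightly between versions):

# Quick sketch: the CartPole snippet from above with checkpointing and csv
# logging turned off so it runs cleanly in colab.
import acme
import gym
import sonnet as snt
from acme import specs
from acme import wrappers
from acme.agents import dqn
from acme.utils import loggers
from acme.wrappers import gym_wrapper

environment = gym_wrapper.GymWrapper(gym.make('CartPole-v1'))
environment = wrappers.SinglePrecisionWrapper(environment)
environment_spec = specs.make_environment_spec(environment)

network = snt.Sequential([
    snt.Flatten(),
    snt.nets.MLP([50, 50, environment_spec.actions.num_values]),
])

agent = dqn.DQN(
    environment_spec=environment_spec,
    network=network,
    logger=loggers.TerminalLogger(time_delta=10.),  # or any other delta
    checkpoint=False,
)

loop = acme.EnvironmentLoop(environment, agent)
loop.run(num_episodes=100)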

You can see an example of this in the tutorial (https://github.com/deepmind/acme/blob/master/examples/tutorial.ipynb) when we create the D4PGLearner. (In the example of the DQN agent these are passed on into the DQNLearner).

Unfortunately this is one of the prices we pay for having things that run in a few different settings: sometimes one of those settings falls through the cracks (and right now that setting is colab). Bear with us though and we will roll out a better solution shortly!

mwhoffman commented 4 years ago

We've now updated the quickstart guide (https://github.com/deepmind/acme/blob/master/examples/quickstart.ipynb) so that it properly solves the MountainCar example. The only change necessary was an increase in the exploration noise (sigma). The example only runs a few episodes, but to learn you'll probably need somewhere around 80. You can always run that cell repeatedly to gather more episodes. We've also included a simple script (https://github.com/deepmind/acme/blob/master/examples/gym/run_d4pg.py) that runs this example as well.
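
If you'd rather adapt your own script than use the notebook, the change boils down to passing a larger sigma when constructing the agent. A rough sketch (network sizes, vmin/vmax and the exact import paths here are illustrative and may not match the quickstart exactly):

# Rough sketch only: D4PG on gym MountainCarContinuous with a larger
# exploration noise. Network sizes, vmin/vmax and import paths are
# illustrative; check the quickstart notebook for the exact values.
import acme
import gym
import numpy as np
import sonnet as snt
from acme import specs
from acme import wrappers
from acme.agents import d4pg
from acme.networks import CriticMultiplexer, DiscreteValuedHead
from acme.networks import LayerNormMLP, NearZeroInitializedLinear, TanhToSpec
from acme.wrappers import gym_wrapper

# MountainCarContinuous is the continuous-control variant that D4PG needs.
environment = gym_wrapper.GymWrapper(gym.make('MountainCarContinuous-v0'))
environment = wrappers.SinglePrecisionWrapper(environment)
environment_spec = specs.make_environment_spec(environment)
num_dimensions = np.prod(environment_spec.actions.shape, dtype=int)

policy_network = snt.Sequential([
    LayerNormMLP((256, 256), activate_final=True),
    NearZeroInitializedLinear(num_dimensions),
    TanhToSpec(environment_spec.actions),
])
critic_network = snt.Sequential([
    CriticMultiplexer(
        critic_network=LayerNormMLP((256, 256), activate_final=True)),
    DiscreteValuedHead(vmin=-100., vmax=100., num_atoms=51),
])

agent = d4pg.D4PG(
    environment_spec=environment_spec,
    policy_network=policy_network,
    critic_network=critic_network,
    sigma=1.0,         # the larger exploration noise is the only real change
    checkpoint=False,  # keep colab happy, as discussed above
)

loop = acme.EnvironmentLoop(environment, agent)
loop.run(num_episodes=80)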

Finally, the changes in https://github.com/deepmind/acme/commit/cebfc3568f9b9359ddb23fae9248e0fa0d61ad64 should handle the flag issues that also cropped up when running in colab. We've made sure to upload the package to pypi as well, so when you run the colab you should pick up the updated package, which includes these changes.
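
(For anyone curious, the shape of that fix is roughly the following fallback pattern; this is an illustration rather than the exact code in the commit:)

# Illustration of the fallback pattern, not the exact code in the commit:
# if absl flags were never parsed (e.g. inside colab), fall back to a
# timestamp instead of raising UnparsedFlagAccessError.
import datetime
from absl import flags

flags.DEFINE_string('acme_id', None, 'Experiment id used in save paths.')
FLAGS = flags.FLAGS


def get_unique_id():
  try:
    acme_id = FLAGS.acme_id
  except flags.UnparsedFlagAccessError:
    acme_id = None  # Flags not parsed; ignore any --acme_id override.
  return acme_id or datetime.datetime.now().strftime('%Y%m%d-%H%M%S')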