MushroomRL / mushroom-rl

Python library for Reinforcement Learning.

Tutorial for REINFORCE #87

Closed · RylanSchaeffer closed this 2 years ago

RylanSchaeffer commented 2 years ago

I'm trying to implement a simple REINFORCE agent on Gridworld. However, I keep hitting the following error:

  File "/home/rylan/Documents/GanguliGang-Metacognitive-Actor-Critic/mac_venv/lib/python3.6/site-packages/mushroom_rl/core/core.py", line 141, in _run_impl
    sample = self._step(render)
  File "/home/rylan/Documents/GanguliGang-Metacognitive-Actor-Critic/mac_venv/lib/python3.6/site-packages/mushroom_rl/core/core.py", line 188, in _step
    action = self.agent.draw_action(self._state)
  File "/home/rylan/Documents/GanguliGang-Metacognitive-Actor-Critic/mac_venv/lib/python3.6/site-packages/mushroom_rl/core/agent.py", line 65, in draw_action
    return self.policy.draw_action(state)
  File "/home/rylan/Documents/GanguliGang-Metacognitive-Actor-Critic/mac_venv/lib/python3.6/site-packages/mushroom_rl/policy/td_policy.py", line 149, in draw_action
    return np.array([np.random.choice(self._approximator.n_actions,
AttributeError: 'NoneType' object has no attribute 'n_actions'

It appears that the policy needs to be initialized with an approximator. I would really appreciate a simple tutorial showing how to create an approximator and a policy on a simple environment.

Thanks in advance!

RylanSchaeffer commented 2 years ago

I just found https://github.com/MushroomRL/mushroom-rl/blob/dev/examples/lqr_pg.py

If these examples could be more clearly linked in the documentation, that would be fantastic!
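For reference, here is a condensed sketch of what that example wires together: a linear approximator for the policy mean, a second one for the standard deviation, a Gaussian policy built on top of both, and a REINFORCE agent that receives the policy. This is based on the dev-branch `lqr_pg.py`; import paths and constructor signatures may differ between MushroomRL versions.

```python
import numpy as np

from mushroom_rl.algorithms.policy_search import REINFORCE
from mushroom_rl.approximators import Regressor
from mushroom_rl.approximators.parametric import LinearApproximator
from mushroom_rl.core import Core
from mushroom_rl.environments import LQR
from mushroom_rl.policy import StateStdGaussianPolicy
from mushroom_rl.utils.optimizers import AdaptiveOptimizer

# Continuous-action environment: MushroomRL's policy gradient methods
# assume continuous actions (see the maintainer's reply below).
mdp = LQR.generate(dimensions=1)

# Linear approximators for the mean and standard deviation of the policy
approximator = Regressor(LinearApproximator,
                         input_shape=mdp.info.observation_space.shape,
                         output_shape=mdp.info.action_space.shape)
sigma = Regressor(LinearApproximator,
                  input_shape=mdp.info.observation_space.shape,
                  output_shape=mdp.info.action_space.shape)
sigma.set_weights(2 * np.ones(sigma.weights_size))

# The policy is built around the approximators, then handed to the agent,
# so draw_action never sees an uninitialized approximator
policy = StateStdGaussianPolicy(approximator, sigma)
agent = REINFORCE(mdp.info, policy, optimizer=AdaptiveOptimizer(eps=.01))

core = Core(agent, mdp)
core.learn(n_episodes=100, n_episodes_per_fit=25)
```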

boris-il-forte commented 2 years ago

This is a good suggestion! I will also put this on the ToDo list. Note that the implementation of policy gradient methods is done only in the context of function approximation/continuous action spaces. Supporting finite actions in these approaches would add major overhead and complications to the code, for no practical benefit: there is not much to gain from using a policy gradient method in a setting where you can use a value-based approach directly and easily.
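A minimal value-based sketch on GridWorld, in the spirit of this suggestion (grid size, epsilon, and learning rate are illustrative; API details may vary between versions). It also shows where the original traceback comes from: `EpsGreedy` is a TD policy created without an approximator, and it only becomes usable once a TD agent links its Q table to it.

```python
from mushroom_rl.algorithms.value import QLearning
from mushroom_rl.core import Core
from mushroom_rl.environments import GridWorld
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.utils.parameters import Parameter

# A small finite grid world with discrete states and actions
mdp = GridWorld(width=5, height=5, goal=(4, 4), start=(0, 0))

# No approximator is passed here: the TD agent sets the policy's Q
# function internally. Calling policy.draw_action(...) before an agent
# owns the policy raises the 'NoneType' ... 'n_actions' error above.
policy = EpsGreedy(epsilon=Parameter(value=.1))
agent = QLearning(mdp.info, policy, learning_rate=Parameter(value=.5))

core = Core(agent, mdp)
core.learn(n_steps=10000, n_steps_per_fit=1)
```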