araffin / rl-tutorial-jnrr19

Stable-Baselines tutorial for Journées Nationales de la Recherche en Robotique 2019
MIT License

Plotting timestep vs action function #10

Closed NC25 closed 4 years ago

NC25 commented 4 years ago

Hello,

I implemented and trained my PPO model on a discrete action space:

import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

env = gym.make('fishing-v0')
model = PPO2(MlpPolicy, env, verbose=2)
model.learn(total_timesteps=100)

Now I am trying to plot a graph that shows timesteps vs. actions, so I can see how my model performs:

def step():
  obs = env.reset()
  for i in range(100):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
    y = []
    y.extend(action)
    return y

step()

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 100, 100)
fig, ax = plt.subplots()  # Create a figure and an axes.
ax.plot(x, y, label='linear') 

But I get an error that the object is not iterable. I was trying to get y to be a list of 100 reward values so that it would be iterable, but the error shows that this is not the case.

I was wondering if there is another way to plot timestep vs. action.

edbeeching commented 4 years ago

Hi,

The list y is being recreated inside the for loop; you should create it once, just after obs = env.reset().

However, I would suggest evaluating your model on several instances of the problem and taking the average, using: from stable_baselines.common.evaluation import evaluate_policy
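
(For reference, a minimal sketch of how evaluate_policy could be used here, reusing the model and env from above; n_eval_episodes=10 is an arbitrary choice:)

from stable_baselines.common.evaluation import evaluate_policy

# Mean and std of the episode return over 10 evaluation episodes
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print("mean reward: {:.2f} +/- {:.2f}".format(mean_reward, std_reward))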

NC25 commented 4 years ago

@edbeeching

Thank you, I was able to implement it:


def step():
  obs = env.reset()
  y = []  # one action collected per timestep
  for i in range(100):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
    y.append(action)
  return y

y = step()  # capture the returned list instead of discarding it

x = np.linspace(0, 100, 100)
fig, ax = plt.subplots()  # Create a figure and an axes.
ax.plot(x, y, label='linear')
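
For a discrete action space, a step-style plot with labelled axes can be easier to read. A minimal sketch, reusing the y returned above and assuming numpy and matplotlib are imported as earlier:

x = np.arange(len(y))  # one tick per timestep
fig, ax = plt.subplots()
ax.step(x, y, where='post', label='action')  # step plot suits discrete actions
ax.set_xlabel('timestep')
ax.set_ylabel('action')
ax.legend()
plt.show()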