ericyangyu / PPO-for-Beginners

A simple and well-styled PPO implementation, based on my Medium series: https://medium.com/@eyyu/coding-ppo-from-scratch-with-pytorch-part-1-4-613dfc1b14c8.

How to fix: Broken with latest gym pip package #16

Closed: catid closed this 1 month ago

catid commented 9 months ago

The env.step return values changed, so here is how to get the code going:

        # Number of timesteps run so far this batch
        t = 0 
        while t < self.timesteps_per_batch:
            # Rewards this episode
            ep_rews = []

            obs = self.env.reset()
            if isinstance(obs, tuple):
                obs = obs[0]  # Assuming the first element of the tuple is the relevant data

            terminated = False
            for ep_t in range(self.max_timesteps_per_episode):
                # Increment timesteps ran this batch so far
                t += 1
                # Collect observation
                batch_obs.append(obs)
                action, log_prob = self.get_action(obs)

                obs, rew, terminated, truncated, _ = self.env.step(action)
                if isinstance(obs, tuple):
                    obs = obs[0]  # Assuming the first element of the tuple is the relevant data

                # Collect reward, action, and log prob
                ep_rews.append(rew)
                batch_acts.append(action)
                batch_log_probs.append(log_prob)

            if terminated or truncated:
                break

Note that you now have to check both the terminated and truncated return values. The latest documentation is here: https://www.gymlibrary.dev/api/core/
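
For reference, here is a minimal sketch of the changed API (this assumes gym >= 0.26, where the terminated/truncated split was introduced):

    import gym

    env = gym.make('Pendulum-v1')

    # reset() now returns (obs, info) instead of just obs
    obs, info = env.reset()

    # step() now returns five values: the old done flag is split into
    # terminated (the environment reached a terminal state) and
    # truncated (the episode was cut off, e.g. by a time limit)
    action = env.action_space.sample()
    obs, rew, terminated, truncated, info = env.step(action)

    # Equivalent of the old done flag
    done = terminated or truncated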

Without this fix, if you follow along with the blog post, it will fail at the end of Blog 3 at this step:

    import gym
    env = gym.make('Pendulum-v1')
    model = PPO(env)
    model.learn(10000)

Also, you need to update Pendulum-v0 to Pendulum-v1.
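
If the same script has to run on both old and new gym versions, a small helper along these lines normalizes the two APIs; reset_env and step_env are hypothetical names, not part of this repo:

    def reset_env(env):
        # gym >= 0.26 returns (obs, info); older versions return obs alone
        out = env.reset()
        return out[0] if isinstance(out, tuple) else out

    def step_env(env, action):
        # gym >= 0.26 returns (obs, rew, terminated, truncated, info);
        # older versions return (obs, rew, done, info)
        out = env.step(action)
        if len(out) == 5:
            obs, rew, terminated, truncated, info = out
            return obs, rew, terminated or truncated, info
        return out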

LorenzoBianchi02 commented 7 months ago

Shouldn't the "if terminated or truncated: break" be inside the for loop?
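
For clarity, this is the placement being suggested: the same loop body as in the snippet above (minus the defensive tuple check), with the break moved inside the for loop:

    for ep_t in range(self.max_timesteps_per_episode):
        # Increment timesteps ran this batch so far
        t += 1
        # Collect observation
        batch_obs.append(obs)
        action, log_prob = self.get_action(obs)

        obs, rew, terminated, truncated, _ = self.env.step(action)

        # Collect reward, action, and log prob
        ep_rews.append(rew)
        batch_acts.append(action)
        batch_log_probs.append(log_prob)

        # Now inside the for loop: end the episode as soon as the
        # environment signals termination or truncation
        if terminated or truncated:
            break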

ericyangyu commented 1 month ago

Thanks for the catch, I've updated the repo! Sorry for the delay...