DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

Scaling Environment #1907

Closed Hamza-101 closed 2 months ago

Hamza-101 commented 2 months ago

🐛 Bug

check_env result

Traceback (most recent call last):
  File "D:\Thesis_\Test\PPonew.py", line 461, in <module>
    check_env(env)
  File "C:\Users\Cr7th\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\env_checker.py", line 409, in check_env
    assert isinstance(
AssertionError: Your environment must inherit from the gymnasium.Env class cf. https://gymnasium.farama.org/api/env/

I have trained and tested a custom Boid flocking environment in OpenAI Gym. With 3 boids it works. However, when I test the trained policy with more boids than that, e.g. 10, it fails with the ValueError shown in the log output below.

Code example

  1. Download Code: https://drive.google.com/drive/folders/1c0t-7D5RWumtLY9Bh9kht6P3RAQ4mq0V?usp=sharing
  2. Download Model: https://drive.google.com/file/d/1wrBZ6mSrcaxrWERgvUYA_vLDaA0JL08f/view?usp=drive_link
  3. Place the model in the same folder as the code
  4. cd into the folder containing the code and run the command: python PPonew.py

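import gym
import numpy as np
from gym import spaces

# Agent and SimulationVariables are defined elsewhere in the linked code.
class FlockingEnv(gym.Env):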
    def __init__(self):
        super(FlockingEnv, self).__init__()
        self.episode=0
        self.CTDE=False
        self.current_timestep = 0 
        self.reward_log = []
        self.counter=0

        self.agents = [Agent(position) for position in self.read_agent_locations()]

        min_action = np.array([-5, -5] * len(self.agents), dtype=np.float32)
        max_action = np.array([5, 5] * len(self.agents), dtype=np.float32)
        self.action_space = spaces.Box(low=min_action, high=max_action, dtype=np.float32)

        min_obs = np.array([[-np.inf, -np.inf, -2.5, -2.5]] * len(self.agents), dtype=np.float32)
        max_obs = np.array([[np.inf, np.inf, 2.5, 2.5]] * len(self.agents), dtype=np.float32)
        self.observation_space = spaces.Box(low=min_obs, high=max_obs, dtype=np.float32)

    def step(self, actions):
        # Add Gaussian noise to actions
        noisy_actions = actions + np.random.normal(loc=0, scale=0.01, size=actions.shape)

        # Clip actions to action space bounds
        noisy_actions = np.clip(noisy_actions, self.action_space.low, self.action_space.high)

        # if(self.current_timestep % 400000 == 0):
        #     print(self.current_timestep)
        #     self.counter = self.counter + 1
        #     print("Counter", self.counter)

        self.current_timestep+=1
        reward=0
        done=False
        info={}

        observations = self.simulate_agents(noisy_actions)  # use the noisy, clipped actions computed above
        reward, out_of_flock = self.calculate_reward()

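        # Validate this
        if not self.CTDE:
            # Terminal conditions: a collision, or an agent leaving the flock, ends the episode.
            # The caller (or the VecEnv wrapper) resets the environment once done is True,
            # so step() does not call reset() on the global `env` itself.
            for agent in self.agents:
                if self.check_collision(agent) or out_of_flock:
                    done = True
                    break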

        # if self.CTDE:

        #     log_path = os.path.join(Files['Flocking'], 'Testing', 'Rewards', 'Components', f"Episode{episode}")
        #     log_path = os.path.join(log_path, "Reward_Total_log.json")

        #     with open(log_path, 'a') as f:
        #         json.dump((round(reward, 2)), f, indent=4)
        #         f.write('\n')

        return observations, reward, done, info

    def close(self):
        print("Environment is closed. Cleanup complete.")

        #Does velocity make a difference
        #Observation Space

    def reset(self):
        # seed_everything(SimulationVariables["Seed"])
        self.seed(SimulationVariables["Seed"])  # seed this instance rather than the global `env`
        self.agents = [Agent(position) for position in self.read_agent_locations()]
        for agent in self.agents:
            agent.acceleration = np.round(np.random.uniform(-SimulationVariables["AccelerationInit"], SimulationVariables["AccelerationInit"], size=2), 2)
            agent.velocity = agent.acceleration * SimulationVariables["dt"]
        observation = self.get_observation()
        return observation

Relevant log output / Error message

> PS D:\Test> python PPonew.py
> 2024-04-23 12:21:14.425966: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
> 2024-04-23 12:21:15.190142: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
> C:\Users\Cr7th\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\save_util.py:166: UserWarning: Could not deserialize object clip_range. Consider using `custom_objects` argument to replace this object.
> Exception: Can't get attribute '_make_function' on <module 'cloudpickle.cloudpickle' from 'C:\\Users\\Cr7th\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\cloudpickle\\cloudpickle.py'>
>   warnings.warn(
> C:\Users\Cr7th\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\save_util.py:166: UserWarning: Could not deserialize object lr_schedule. Consider using `custom_objects` argument to replace this object.
> Exception: Can't get attribute '_make_function' on <module 'cloudpickle.cloudpickle' from 'C:\\Users\\Cr7th\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\cloudpickle\\cloudpickle.py'>
>   warnings.warn(
>   0%|          | 0/5 [00:00<?, ?it/s]Episode: 0
>   0%|          | 0/5 [00:00<?, ?it/s]
> Traceback (most recent call last):
>   File "D:\Thesis_\Test\PPonew.py", line 499, in <module>
>     action, state = model.predict(obs)
>                     ^^^^^^^^^^^^^^^^^^
>   File "C:\Users\Cr7th\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\base_class.py", line 553, in predict
>     return self.policy.predict(observation, state, episode_start, deterministic)
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "C:\Users\Cr7th\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\policies.py", line 363, in predict
>     obs_tensor, vectorized_env = self.obs_to_tensor(observation)
>                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "C:\Users\Cr7th\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\policies.py", line 270, in obs_to_tensor
>     vectorized_env = is_vectorized_observation(observation, self.observation_space)
>                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "C:\Users\Cr7th\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\utils.py", line 399, in is_vectorized_observation
>     return is_vec_obs_func(observation, observation_space)  # type: ignore[operator]
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "C:\Users\Cr7th\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\utils.py", line 266, in is_vectorized_box_observation
>     raise ValueError(

> ValueError: Error: Unexpected observation shape (10, 4) for Box environment, please use (3, 4) or (n_env, 3, 4) for the observation shape.
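
The ValueError follows from how the observation space is built in `__init__` above: its shape is `(len(self.agents), 4)`, so a policy trained with 3 agents stores a `(3, 4)` space and rejects a `(10, 4)` observation. The sketch below mimics that check in plain Python. The names `observation_shape` and `is_vectorized` are hypothetical, and the function is a simplified stand-in, not SB3's implementation.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def observation_shape(n_agents: int) -> Shape:
    """Shape of the observation Box in FlockingEnv: one row of 4 values per agent."""
    return (n_agents, 4)


def is_vectorized(obs_shape: Shape, space_shape: Shape) -> bool:
    """Simplified stand-in for the Box shape check (hypothetical helper).

    Returns False for a single observation, True for a batch of observations,
    and raises ValueError for any other shape.
    """
    if obs_shape == space_shape:
        return False  # one observation matching the space exactly
    if obs_shape[1:] == space_shape:
        return True  # batch of observations: (n_env, *space_shape)
    dims = ", ".join(str(d) for d in space_shape)
    raise ValueError(
        f"Unexpected observation shape {obs_shape} for Box environment, "
        f"please use {space_shape} or (n_env, {dims}) for the observation shape."
    )


trained_space = observation_shape(3)   # space stored with the saved policy
test_obs = observation_shape(10)       # observation produced by a 10-boid environment

try:
    is_vectorized(test_obs, trained_space)
except ValueError as err:
    print(err)
```

Because the policy's input size is tied to that stored shape, evaluating with more boids would likely require either retraining with the larger count or an observation layout that does not depend on the number of agents.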

System Info

- OS: Windows-10-10.0.22631-SP0 10.0.22631
- Python: 3.11.4
- Stable-Baselines3: 2.2.1
- PyTorch: 2.0.0+cpu
- GPU Enabled: False
- Numpy: 1.23.5
- Cloudpickle: 1.2.2
- Gymnasium: 0.29.1
- OpenAI Gym: 0.15.7
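
The two UserWarning entries in the log above report that `clip_range` and `lr_schedule` could not be deserialized and suggest the `custom_objects` argument. A minimal sketch of that workaround follows, assuming the model was saved under a different cloudpickle version than the 1.2.2 listed here (the thread does not confirm this). The helper `constant_schedule`, the function `load_model`, and the values 3e-4 and 0.2 (SB3's PPO defaults) are illustrative assumptions, not taken from the poster's code.

```python
from typing import Callable, Dict


def constant_schedule(value: float) -> Callable[[float], float]:
    """Return a schedule that ignores training progress and always yields `value`."""

    def schedule(progress_remaining: float) -> float:
        return value

    return schedule


# Replacements for the two objects that failed to deserialize (assumed values).
CUSTOM_OBJECTS: Dict[str, Callable[[float], float]] = {
    "lr_schedule": constant_schedule(3e-4),
    "clip_range": constant_schedule(0.2),
}


def load_model(path: str):
    """Load a saved PPO model, substituting the schedules above."""
    # Imported lazily so the sketch can be read and parsed without SB3 installed.
    from stable_baselines3 import PPO

    return PPO.load(path, custom_objects=CUSTOM_OBJECTS)
```

These schedules only matter if training is resumed; they do not affect the observation-shape ValueError reported above.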



araffin commented 2 months ago

Please have a careful look at https://github.com/DLR-RM/stable-baselines3/issues/982#issuecomment-1197044014

> AssertionError: Your environment must inherit from the gymnasium.Env

Please fix any issues found by the env checker before posting an issue about a custom env.
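
For context on what the env checker expects: a Gymnasium environment's `step` returns `(obs, reward, terminated, truncated, info)` and its `reset` returns `(obs, info)`, whereas the environment posted above uses the older Gym signatures. The sketch below shows the difference in plain Python. `to_five_tuple` and `to_reset_pair` are hypothetical helpers written for illustration, not part of Gymnasium or SB3, and they assume truncation is flagged through the `TimeLimit.truncated` info key.

```python
from typing import Any, Dict, Optional, Tuple


def to_five_tuple(
    old_step: Tuple[Any, float, bool, Dict[str, Any]]
) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
    """Convert an old-style (obs, reward, done, info) result to the five-value form."""
    obs, reward, done, info = old_step
    # Assumption: a time-limit cutoff is signalled via this info key.
    truncated = bool(info.get("TimeLimit.truncated", False))
    # The episode "terminated" only if it ended for a reason other than the time limit.
    terminated = bool(done) and not truncated
    return obs, reward, terminated, truncated, info


def to_reset_pair(
    old_obs: Any, info: Optional[Dict[str, Any]] = None
) -> Tuple[Any, Dict[str, Any]]:
    """Convert an old-style reset result (obs only) to the (obs, info) pair."""
    return old_obs, ({} if info is None else info)


# A collision ends the episode: terminated is True, truncated is False.
print(to_five_tuple(([0.0, 0.0], -1.0, True, {})))
# A time-limit cutoff: truncated is True, terminated is False.
print(to_five_tuple(([0.0, 0.0], 0.5, True, {"TimeLimit.truncated": True})))
```

In a real port, the environment class itself would subclass `gymnasium.Env` and return these tuples directly from `step` and `reset`.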

Hamza-101 commented 2 months ago

The prompt asked for minimal viable working code, which is why I gave the steps. The details box also asked for these 4 functions, which is why I attached them.

My situation is similar. There is no error in the inheritance itself; the checker just wants me to switch to Gymnasium, which I can't do because it wouldn't install. I tried multiple times on different operating systems.

So it can be ignored.

qgallouedec commented 2 months ago

What do you mean "it wouldn't install"?

> minimal viable working code which is why I gave the steps

It is far from being minimal. Providing an MRE would help.

Hamza-101 commented 2 months ago

> What do you mean "it wouldn't install"?

box2d wouldn't install whenever I tried to set up Gymnasium. I tried many times, manually and otherwise.

> It is far from being minimal. Providing an MRE would help.

This is the "minimal" code I could provide while keeping it working.

qgallouedec commented 2 months ago

Gymnasium and box2d are both maintained. If you can't install one of them, open an issue on its GitHub repository to sort this out. Plus, it seems like you've already installed Gymnasium:

> Gymnasium: 0.29.1

Please make sure you understand what minimal means (see https://github.com/DLR-RM/stable-baselines3/issues/982#issuecomment-1197044014 and https://stackoverflow.com/help/minimal-reproducible-example). I should be able to copy-paste your code and run it. I don't have PPonew.py, for example.

Hamza-101 commented 2 months ago

Despite that, it won't work. I'll try that too. I know what minimal means, but I have provided the files, and this is the least amount of code needed to run it. I tried my best to make it small.