Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License
6.82k stars 761 forks source link

[Bug Report] VectorizeAction not working #1169

Closed pkrack closed 2 hours ago

pkrack commented 4 hours ago

Describe the bug

The VectorizeAction wrapper is not correctly implemented.

The problem lies in how self.out is initialized, how iterate is called and how concatenate is called. There is a confusion between what should be self.action_space, self.single_action_space, self.env.action_space and self.env.single_action_space. Cf. example/test below.

I did not submit a PR because the contributing guidelines mention that you have to run the test suite, but the jax functional test seem to run forever on my machine. PR is ready though, let me know if you want me to submit it.

Proposed solution:

diff --git a/gymnasium/wrappers/vector/vectorize_action.py b/gymnasium/wrappers/vector/vectorize_action.py
index 3dc4a797..89677da0 100644
--- a/gymnasium/wrappers/vector/vectorize_action.py
+++ b/gymnasium/wrappers/vector/vectorize_action.py
@@ -138,7 +138,7 @@ class VectorizeTransformAction(VectorActionWrapper):
         self.action_space = batch_space(self.single_action_space, self.num_envs)

         self.same_out = self.action_space == self.env.action_space
-        self.out = create_empty_array(self.single_action_space, self.num_envs)
+        self.out = create_empty_array(self.env.single_action_space, self.num_envs)

     def actions(self, actions: ActType) -> ActType:
         """Applies the wrapper to each of the action.
@@ -154,17 +154,17 @@ class VectorizeTransformAction(VectorActionWrapper):
                 self.single_action_space,
                 tuple(
                     self.wrapper.func(action)
-                    for action in iterate(self.action_space, actions)
+                    for action in iterate(self.single_action_space, actions)
                 ),
                 actions,
             )
         else:
             return deepcopy(
                 concatenate(
-                    self.single_action_space,
+                    self.env.single_action_space,
                     tuple(
                         self.wrapper.func(action)
-                        for action in iterate(self.env.action_space, actions)
+                        for action in iterate(self.single_action_space, actions)
                     ),
                     self.out,
                 )

Code example

"""Test suite for VectorizeAction wrapper."""
import numpy as np

import gymnasium as gym
from gymnasium import spaces
from gymnasium.vector import SyncVectorEnv
from tests.testing_env import GenericTestEnv

def create_env():
    return GenericTestEnv(
        action_space=spaces.Box(
            low=0.0,
            high=1.0,
            shape=(1,),
            dtype=np.float32
        )
    )

def test_vectorize_box_to_dict_action():
    def func(x): return x["key"]
    env = SyncVectorEnv([create_env for _ in range(2)])
    env = gym.wrappers.vector.VectorizeTransformAction(
        env=env,
        wrapper=gym.wrappers.TransformAction,
        func=func,
        action_space=gym.spaces.Dict(
            {
                "key": gym.spaces.Box(
                    low=0.0,
                    high=1.0,
                    shape=(1,),
                    dtype=np.float32
                )
            }
        )
    )
    env.reset()
    # Check that func transforms the outer env action into the inner env action
    assert func(
        env.single_action_space.sample()
    ) in env.env.single_action_space
    # Check that the VectorizeWrapper correctly vectorizes func
    assert env.actions(env.action_space.sample()) in env.env.action_space

System info

Additional context

No response

Checklist

pseudo-rnd-thoughts commented 3 hours ago

Thanks for the issue again, I'll have a look. At least initially, your solution seems to work but I'm uncertain about bits of it

pkrack commented 3 hours ago

In the message of this commit, there is some reasoning about which action space should be used where. Maybe that helps identifying issues.

I think the current implementation was adapted from the VectorizeObservation wrapper. But TransformObservation and TransformAction work in an opposite direction. TransformObservation wants a func that transforms an inner obs to an outer obs, TransformAction wants a func that transforms an outer action to an inner action. Hence the reversal of self.env.x_space and self.x_space in the VectorizeObservation and VectorizeAction wrappers.

pseudo-rnd-thoughts commented 2 hours ago

Yes, in short, the problem was that when I was implementing it I was thinking about the data "moving" in the same direction of observation rather than the opposite.

I'm adding some more testing but yes, I think the primary issue was the out was using self.single_action_space not self.env.single_action_space as you corrected. Also looking at the actions function, the two cases are identical except for the concatenate output which should be actions and out respectively.

Currently, I'm adding more testing for this to doubly check this.

Thanks again @pkrack