Eclectic-Sheep / sheeprl

Distributed Reinforcement Learning accelerated by Lightning Fabric
https://eclecticsheep.ai
Apache License 2.0
274 stars · 26 forks

Last `N` actions as `mlp_keys` encoder input for `dreamer_v3` #239

Closed · geranim0 closed this issue 1 month ago

geranim0 commented 3 months ago

Hi,

I'm working on an Atari environment wrapper with an action input buffer of length N that I want to feed as input to the `mlp_keys` encoder. Algo config:

algo:
  mlp_keys:
    encoder: [actions]

However, I'm unable to get it working; I get the error `TypeError: object of type 'NoneType' has no len()` at

File "/home/sam/dev/ml/sheeprl/sheeprl/utils/env.py", line 171, in <listcomp>
    [k for k in env.observation_space.spaces.keys() if len(env.observation_space[k].shape) in {2, 3}]

This is because `gym.spaces.Tuple` does not define a meaningful `shape` (it is `None`).

What should change in this wrapper so it correctly interfaces with what sheeprl expects? Is there a way to give the `Tuple` a `shape`, or should it be changed to a `Box`? If it needs to be a `Box`, how should it be configured?


from collections import deque

import gymnasium as gym


class InputBufferWithActionsAsInput_Atari(gym.Wrapper):
    def __init__(self, env: gym.Env, input_buffer_amount: int = 0):
        super().__init__(env)
        if input_buffer_amount <= 0:
            raise ValueError("`input_buffer_amount` should be a positive integer")
        self._input_buffer_amount = input_buffer_amount
        self._input_buf = deque(maxlen=input_buffer_amount)
        self.observation_space = gym.spaces.Dict({
            "rgb": self.env.observation_space,
            "actions": gym.spaces.Tuple([self.env.action_space] * input_buffer_amount),
        })

    def get_obs(self, observation):
        return {
            "rgb": observation,
            "actions": self._input_buf,
        }

    def reset(self, **kwargs):
        obs, infos = super().reset(**kwargs)

        # Pre-fill the buffer with random actions so it always holds `maxlen` entries.
        while len(self._input_buf) < self._input_buf.maxlen:
            self._input_buf.append(self.env.action_space.sample())

        return self.get_obs(obs), infos

    def step(self, action):
        # Execute the oldest buffered action and enqueue the newly chosen one (N-step action delay).
        this_frame_action = self._input_buf[0]
        self._input_buf.append(action)

        obs, reward, done, truncated, infos = self.env.step(this_frame_action)

        return self.get_obs(obs), reward, done, truncated, infos

Edit: I have a working setup with a hard-coded wrapper that is aware of implementation details, using something like the snippet below. I'm still wondering how to achieve a generic solution, though.

        self.observation_space = gym.spaces.Dict({
            "rgb": self.env.observation_space,
            # "last_action": self.env.action_space
            # "actions": gym.spaces.Box(shape=(self.env.action_space.shape, input_buffer_amount), dtype=np.int64)
            # "actions": gym.spaces.Box([self.env.action_space] * input_buffer_amount)
            "actions_0": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
            "actions_1": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
            "actions_2": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
            "actions_3": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
        })

    def get_obs(self, observation: Any) -> Any:
        # observation['past_actions'] = spaces.Space(list(self._input_buf))
        return {
            "rgb": observation,
            # "last_action": self._input_buf[0]
            # "actions": np.array(self._input_buf, dtype=np.int64)
            "actions_0": self._input_buf[0],
            "actions_1": self._input_buf[1],
            "actions_2": self._input_buf[2],
            "actions_3": self._input_buf[3],
        }
michele-milesi commented 3 months ago

Hi @geranim0, yes, the observation space must have the `shape` attribute. I suggest using the `gymnasium.spaces.Box` space to augment the observations of the environment. I have prepared a branch with an `ActionsAsObservationWrapper` that lets you add the last n actions to the observations: https://github.com/Eclectic-Sheep/sheeprl/tree/feature/actions-as-obs. You can specify the number of actions with the `env.action_stack` parameter. You can also add a dilation between actions (as with the FrameStack wrapper), which you can set with the `env.action_stack_dilation` parameter in the configs.

The observation key is "action_stack" (any other key creates conflicts during training); remember to add it to the `mlp_keys`.
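
For illustration, the relevant config might then look roughly like this (the `env.action_stack` and `env.action_stack_dilation` names come from the comment above; the values themselves are placeholders):

env:
  action_stack: 4            # number of last actions added to the observations (placeholder value)
  action_stack_dilation: 1   # dilation between stacked actions (placeholder value)
algo:
  mlp_keys:
    encoder: [action_stack]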

Let me know if it works

Note: Discrete actions are converted into one-hot actions (as the agent works with one-hot actions in the discrete case). We can discuss which is the best option.
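
For context, one-hot encoding a discrete action index can be sketched as follows (a generic illustration, not the branch's actual code):

import numpy as np

def one_hot(action: int, n_actions: int) -> np.ndarray:
    # e.g. one_hot(2, 4) -> array([0., 0., 1., 0.], dtype=float32)
    vec = np.zeros(n_actions, dtype=np.float32)
    vec[action] = 1.0
    return vec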

cc @belerico

geranim0 commented 3 months ago

Hi @Michele,

Thanks for the branch! I'm taking a look and running some tests with it.

geranim0 commented 3 months ago

So, I did some testing; here are the results:

[image: training curves for the two runs (gray and blue lines)]

The gray line is the agent trained with the last N (here, 12) actions added to the observations, and the blue line is the agent trained with the same input buffer (12) but without the buffered actions added to the observations. Only one run was made for each, but it looks like, in the presence of a large input buffer, adding the buffered actions to the observations is helpful.

It also suggests that the wrapper works 👍

The only modification I made to your branch was adding an input buffer to the wrapper.

michele-milesi commented 3 months ago

Great, I'm glad it works. I don't quite understand why you added the input buffer or how you used it, though. Can you show me which modification you made? Thanks

geranim0 commented 3 months ago

Sure, it is actually in my first message, in the step function: instead of executing the current frame's action, I execute the one that is ready in the buffer, via `this_frame_action = self._input_buf[0]`.

The purpose of this is to simulate human reaction time. That's why I wanted to test adding the input buffer to the observation, to see if it would improve performance (looks like it does).
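
In other words, with a buffer of length N, the action executed at step t is the one selected at step t - N. A minimal, standalone illustration of that delay (plain Python, not sheeprl code):

from collections import deque

buf = deque([0, 0, 0], maxlen=3)        # pre-filled, e.g. with no-op actions
for t, chosen in enumerate([1, 2, 3, 4, 5]):
    executed = buf[0]                   # the oldest buffered action is executed
    buf.append(chosen)                  # the newly chosen action enters the buffer
    print(t, chosen, executed)          # `executed` lags `chosen` by 3 steps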

michele-milesi commented 3 months ago

Understood, thanks

belerico commented 2 months ago

Hi @geranim0, if this is done we can add this feature in a new PR and put it in the next release

geranim0 commented 2 months ago

Hi @belerico, sure!

A side note, though: in tests with a Discrete action space everything worked fine, but I ran into problems with the action shape not being handled for MultiDiscrete envs, both in the actions-as-obs wrapper and in dreamer_v3.py::main(), in this portion:

  real_actions = (
      torch.cat([real_act.argmax(dim=-1) for real_act in real_actions], dim=-1).cpu().numpy()
  )
  step_data["actions"] = actions.reshape((1, cfg.env.num_envs, -1))

For now I got around it by reshaping my action space to Discrete. I'm also on a somewhat old branch, so I'll re-test when I update.
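
As an aside, one generic way to "reshape" a MultiDiscrete action space into a Discrete one is to flatten the sub-action indices into a single index. A rough sketch (the wrapper name `FlattenMultiDiscrete` is hypothetical, and this is not necessarily what was done here):

import gymnasium as gym
import numpy as np


class FlattenMultiDiscrete(gym.ActionWrapper):
    """Present a MultiDiscrete action space as a single Discrete space."""

    def __init__(self, env: gym.Env):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.MultiDiscrete)
        self._nvec = env.action_space.nvec
        self.action_space = gym.spaces.Discrete(int(np.prod(self._nvec)))

    def action(self, action):
        # Decode the flat Discrete index back into one sub-action per MultiDiscrete dimension.
        return np.array(np.unravel_index(int(action), self._nvec), dtype=np.int64)

The trade-off is that the number of flat actions grows with the product of the sub-action sizes.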

michele-milesi commented 2 months ago

Hi @geranim0, can you share the error you encountered and which environment you are using? Thanks

michele-milesi commented 2 months ago

I should have fixed the problem; could you check with the MultiDiscrete action space? Thanks