Farama-Foundation / Arcade-Learning-Environment

The Arcade Learning Environment (ALE) -- a platform for AI research.
https://ale.farama.org/

Pure Continuous version of ALE on CPP #550

Closed jjshoots closed 3 months ago

jjshoots commented 3 months ago

This re-implements #539 but without introducing two separate sets of functionality on the CPP end.

In ALE, only paddle-based games accept continuous actions; joystick-based games are purely discrete in the emulator. Therefore, we cannot implement truly continuous actions for those environments.

What this PR does is implement the clipping on the Python end for continuous environments while using the same ActionEnum backbone as the original ALE, plus an argument for paddle strength in paddle-based games.

What do we want?

The changes in code are motivated by one thing: we want a continuous action space for paddles only, keeping the ActionEnum-style action inputs for joysticks, since the Atari joystick is inherently discrete (a continuous joystick does not exist).

What did the original ALE interface do?

In the original ALE interface, the actions are joystick-only ActionEnum inputs. For games that use a paddle instead of a joystick, the joystick controls are mapped into fixed discrete displacements applied to the paddles, roughly as sketched below.
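This is a hedged illustration rather than the exact source; PADDLE_DELTA is ALE's fixed per-step paddle displacement, and the direction handling here is an assumption:

    // Sketch: joystick ActionEnum inputs become fixed paddle displacements.
    int delta_a = 0;
    switch (action) {
      case PLAYER_A_LEFT:
        delta_a = -PADDLE_DELTA;  // always the full fixed step, no notion of strength
        break;
      case PLAYER_A_RIGHT:
        delta_a = PADDLE_DELTA;
        break;
      default:
        break;  // other joystick actions leave the paddle position unchanged
    }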

This results in the loss of continuous control for the paddles.

How do we allow continuous paddles?

The original interface uses joystick actions for all games to maintain a uniform interface, which makes it easier for RL. Ideally, we want to keep this uniformity while allowing continuous action inputs for games that use paddles. To do that, we modify the interface on the CPP end:

Old Discrete ALE interface

reward_t ALEInterface::act(Action action)

New Mixed Discrete-Continuous ALE interface

reward_t ALEInterface::act(Action action, float paddle_strength = 1.0)

Then, for games that use paddles, the paddle strength parameter (default 1.0) scales the paddle action passed to the underlying game via this change:

// scale the fixed paddle step by the user-supplied (absolute) strength
delta_a = static_cast<int>(-PADDLE_DELTA * fabs(paddle_a_strength));

This maintains backwards compatibility (behaviour is exactly the same when paddle_x_strength is left at its default). For games where the paddle is not used, the paddle_x_strength parameter is simply ignored. This mirrors the real-world scenario where you have a paddle connected, but the game doesn't react when the paddle is turned.
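As an example, here is a minimal usage sketch of the new CPP interface (the ROM path, include, and game choice are assumptions; only the act signature comes from this PR):

    #include <iostream>

    #include "ale_interface.hpp"

    int main() {
      ale::ALEInterface ale;
      ale.loadROM("breakout.bin");  // assumed path to a paddle-based ROM

      // Old-style discrete call: paddle_strength defaults to 1.0,
      // so this behaves identically to the original interface.
      ale::reward_t reward = ale.act(ale::PLAYER_A_LEFT);

      // Mixed call: same discrete action, but the paddle moves at half strength.
      reward += ale.act(ale::PLAYER_A_LEFT, 0.5f);

      std::cout << "accumulated reward: " << reward << std::endl;
      return 0;
    }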

Python side interface

Old Discrete ALE Python Interface

ale.act(action: int)

New Mixed Discrete-Continuous ALE Python Interface

ale.act(action: int, strength: float = 1.0)
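And the matching usage sketch on the Python side (the ROM path and the Action import are assumptions; only the act signature comes from this PR):

    from ale_py import ALEInterface, Action

    ale = ALEInterface()
    ale.loadROM("breakout.bin")  # assumed path to a paddle-based ROM

    # Old-style discrete call: strength defaults to 1.0, identical to the original API.
    reward = ale.act(Action.LEFT)

    # Mixed call: same discrete action, half paddle strength.
    reward += ale.act(Action.LEFT, 0.5)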

ALE Gymnasium Interface

The main change this PR makes relative to the original CALE implementation is that the discretization is now handled at the Python level. More specifically, when the continuous action space is used, we do the following:

    # Convert the polar action (radius, theta) into Cartesian x, y components.
    x, y = action[0] * np.cos(action[1]), action[0] * np.sin(action[1])
    # Threshold each component into {-1, 0, +1}, then look up the ActionEnum.
    # Note the negated threshold on the "less than" side, so that values near
    # zero map to center rather than to left/down.
    action_idx = self.map_action_idx(
        left_center_right=(
            -int(x < -self.continuous_action_threshold)
            + int(x > self.continuous_action_threshold)
        ),
        down_center_up=(
            -int(y < -self.continuous_action_threshold)
            + int(y > self.continuous_action_threshold)
        ),
        fire=(action[-1] > self.continuous_action_threshold),
    )

Here, self.map_action_idx is an lru_cache-ed function that takes the thresholded continuous action direction and maps it to an ActionEnum.
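For concreteness, a hedged sketch of what such a cached mapping could look like (only the name and signature come from the snippet above; the body here is an assumption):

    from functools import lru_cache

    from ale_py import Action

    @lru_cache(maxsize=18)  # 9 directions x 2 fire states
    def map_action_idx(left_center_right: int, down_center_up: int, fire: bool) -> Action:
        """Map thresholded (x, y, fire) components onto the discrete ActionEnum."""
        base = {
            (-1, -1): "DOWNLEFT", (0, -1): "DOWN", (1, -1): "DOWNRIGHT",
            (-1, 0): "LEFT",      (0, 0): "NOOP", (1, 0): "RIGHT",
            (-1, 1): "UPLEFT",    (0, 1): "UP",   (1, 1): "UPRIGHT",
        }[(left_center_right, down_center_up)]
        if fire:
            base = "FIRE" if base == "NOOP" else base + "FIRE"
        return Action[base]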

This implementation is actually slightly faster than the original CALE implementation because we don't compute the action repeatedly for every step (the original implementation placed this within the for _ in range(frame_skip) loop). We also don't need to worry that self.map_action_idx may be an expensive call, because the whole function is cached, so it's effectively an O(1) dictionary lookup.
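End to end, a hedged sketch of driving the Gymnasium interface in continuous mode (the continuous kwarg follows the CALE design; the registration call and environment id are assumptions):

    import gymnasium as gym

    import ale_py

    gym.register_envs(ale_py)  # make the ALE/* environments visible to Gymnasium

    # Paddle-based game with the continuous (radius, theta, fire) action space.
    env = gym.make("ALE/Breakout-v5", continuous=True)
    obs, info = env.reset(seed=0)

    action = env.action_space.sample()  # e.g. array([radius, theta, fire])
    obs, reward, terminated, truncated, info = env.step(action)
    env.close()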

Limitations

One small caveat with this PR (and the original one) is that some games use both a paddle and a joystick for control (e.g. Star Wars and Night Driver). This functionality is not available in this PR, nor in #539, nor in the original ALE. I think it'd be pretty cool to add, but we'd probably need to think about how to pass the action space from Stella's end into Python-land for that.

jjshoots commented 3 months ago

@pseudo-rnd-thoughts Ready for review.

@psc-g Would be great if we can get your review here too. AFAICT, this is the same as your implementation, so it shouldn't affect the reproducibility of your paper/branch. :)

pseudo-rnd-thoughts commented 3 months ago

@jjshoots could you update the PR description to explain what you have done? There seem to be some core functions that have changed.

Could you add example code for interfacing with the continuous actions at the cpp and python levels?

Also, could you add some more continuous-specific tests?