facebookresearch / CompilerGym

Reinforcement learning environments for compiler and program optimization tasks
https://compilergym.ai/
MIT License

Discussion about future plans after deprecation of OpenAI Gym #776

Closed anthony0727 closed 1 year ago

anthony0727 commented 1 year ago

❓ Questions and Help

Should we decouple CompilerGym from the (now deprecated) openai/gym?

The team that has been maintaining Gym since 2021 has moved all future development to Gymnasium, a drop in replacement for Gym (import gymnasium as gym), and this repo isn't planned to receive any future updates. Please consider switching over to Gymnasium as you're able to do so. https://github.com/openai/gym#important-notice

Additional Context

I assume you are already aware of it: gym's maintainers have been making breaking API changes, for example the new step() signature:

def step(self, action: ActType) -> Tuple[ObsType, float, bool, bool, dict]:

https://github.com/Farama-Foundation/Gymnasium/blob/1956e648f951eb64c45440997f8fe484ef3c7138/gymnasium/core.py#L71

I reckon the changes are quite reasonable, but the previous API (returning observation, reward, done, info) served as a fairly rigorous protocol for reinforcement learning.
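For context, a minimal sketch of the difference (assuming gymnasium / gym >= 0.26 semantics, where done is split into terminated and truncated):

# Old gym (<= 0.25) API: step() returns a 4-tuple.
obs, reward, done, info = env.step(action)

# New gymnasium / gym >= 0.26 API: step() returns a 5-tuple, with done split
# into terminated (reached a terminal state) and truncated (cut short, e.g. by a time limit).
obs, reward, terminated, truncated, info = env.step(action)
done = terminated or truncated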

Is there already on-going discussion regarding this?

TL;DR Most of CompilerGym's inheritance from openai/gym is for typing rather than functionality; openai/gym doesn't really have much functionality itself, so we could easily define our own Space, Env, etc.
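For concreteness, a rough sketch of what defining our own types could look like (hypothetical names only, not actual CompilerGym classes):

from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple

class Space(ABC):
    # Hypothetical stand-in for gym.spaces.Space.
    @abstractmethod
    def contains(self, x: Any) -> bool: ...

class Env(ABC):
    # Hypothetical stand-in for gym.Env: just the reset/step protocol.
    observation_space: Space
    action_space: Space

    @abstractmethod
    def reset(self) -> Any: ...

    @abstractmethod
    def step(self, action: Any) -> Tuple[Any, float, bool, Dict[str, Any]]: ...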

However, if we decouple, we may conflict with the many RL libraries that depend on openai/gym; for example, ray[rllib] does strong type checking such as isinstance(obs_space, gym.spaces.Space).

I hope we (CompilerGym users) can have the discussion here to help the FAIR team.

ChrisCummins commented 1 year ago

Hi @anthony0727, thanks for starting the discussion on this! The main value we get from gym (<=v0.21) is the familiar API for users, and the type inheritance for plugging in libraries like rllib. We already have a bunch of extensions on top of the base API (e.g. the datasets API, custom spaces, the extra args to env.step() and env.reset()) which makes the coupling quite loose.

I think we have a few options:

1. Keep the dependency pinned to the current pre-0.21 gym API and change nothing for now.
2. Migrate CompilerGym to the new gymnasium API.
3. Decouple from gym entirely: define our own Env / Space types and provide an adaptor wrapper for libraries that expect gym types.

I would love to get as much input from our users on this as possible. Personally, I would lean toward option (1) or (3), though I think there is a good case for (2) to be made as well.

Cheers, Chris

anthony0727 commented 1 year ago

I agree with the adaptor pattern! I discussed this with the rllib authors and they would likely go with (3) too!

We would just need to implement an adaptor whenever the API changes.
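As a rough illustration of the adaptor idea (hypothetical class, not part of CompilerGym; assumes gymnasium >= 0.26 semantics):

import gymnasium as gym

class GymnasiumAdaptor(gym.Env):
    # Hypothetical adaptor exposing an old-style (4-tuple) env through the
    # new gymnasium 5-tuple API.
    def __init__(self, env):
        self.env = env
        # Note: these are still old gym spaces; strict isinstance checks
        # may require converting them as well.
        self.observation_space = env.observation_space
        self.action_space = env.action_space

    def reset(self, *, seed=None, options=None):
        obs = self.env.reset()  # old API returns only the observation
        return obs, {}          # new API returns (observation, info)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)  # old 4-tuple
        # The old API does not distinguish termination from truncation.
        return obs, reward, done, False, info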

I'll leave this thread open until November and close it then, if that's okay!

Thanks, Anthony

ChrisCummins commented 1 year ago

Hi @anthony0727, okay that sounds good, thanks for getting back to me. Is there an issue on the rllib repo to track their migration? I may follow their lead.

No need to close this. I'll use it to track the changes and continue the discussion 🙂

Cheers, Chris

anthony0727 commented 1 year ago

There's no specific issue thread for this, but looking at the commit history at https://github.com/ray-project/ray/commits/releases/2.2.0/rllib, it seems like it will take some time.

Here's a chat message from @sven1977:

We are in the process of updating RLlib to be fully compliant with the new gymnasium (or gym>=0.26) APIs. This will include an updated RLlib pettingzoo adapter. It’s a massive undertaking but will hopefully make it into Ray 2.2 (fingers crossed that we get all failing tests to pass by then 🙂)

Additionally, I'm getting back to compiler optimization with ProGraML, and I remember async envs worked seamlessly (though I hit an "unable to pickle lock object" issue with cbench, related to https://github.com/facebookresearch/CompilerGym/issues/304).

Is there something I should be aware of, given this part of your reply?

Would unlock more aggressive changes to CompilerGym APIs for things like performance (e.g. async / batched envs).

My env would be made async starting from this setup:

import compiler_gym
from compiler_gym.wrappers import CommandlineWithTerminalAction, RandomOrderBenchmarks
from ray.tune.registry import register_env

# RllibWrapper and ENV_ID are defined elsewhere in my code.
def register_all(config, model_cls=None):
    def make_env(dummy):  # rllib passes the env_config dict as this argument
        env = compiler_gym.make('llvm-ic-v0', **config.env_config)
        # Train on benchmarks drawn from these three datasets.
        train_benchmarks = []
        for i in ['cbench-v1', 'npb-v0', 'chstone-v0']:
            train_benchmarks.extend(
                list(env.datasets[i].benchmark_uris())
            )
        env = CommandlineWithTerminalAction(env)
        env = RandomOrderBenchmarks(env, train_benchmarks)
        env = RllibWrapper(env)
        # env = TimeLimit(env, max_episode_steps=128)
        return env

    register_env(
        ENV_ID,
        make_env
    )
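For reference, a hypothetical way to point an rllib algorithm at the registered env (assuming ray >= 2.0's PPOConfig API; the worker count is just an example):

from ray.rllib.algorithms.ppo import PPOConfig

register_all(config)  # registers ENV_ID with rllib's env registry

algo = (
    PPOConfig()
    .environment(ENV_ID)
    .rollouts(num_rollout_workers=4)  # sample from multiple env copies in parallel
    .build()
)
result = algo.train()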

Thanks! Anthony

ChrisCummins commented 1 year ago

Hi Anthony,

Thanks for the extra pointers. I'm in no rush to make preemptive changes, especially if those are compatibility breaking, so I'll keep an eye on rllib developments.

Is there something I should be aware of?

No, you'll be fine with the code snippet you posted. What I meant was that for things like vectorized or async environments, we may be able to implement them more efficiently as extensions to the C++ layer and then expose the functionality through new APIs. This wouldn't break any existing code.

Cheers, Chris

ChrisCummins commented 1 year ago

@elliottower confirmed that rllib has migrated to gymnasium so I think this makes a good case for option (2) above. I'll close this in favor of #789 as the issue tracker for the migration.

Cheers, Chris