agentos-project / agentos

The Python Component System (PCS) is an API and CLI for building, running, and sharing Python code. AgentOS is a set of libraries built on top of PCS that make it easy to build, run, and share agents that use Reinforcement Learning.
https://agentos.org
Apache License 2.0

Make agentos.Environment more compatible with gym.Env #129

Open andyk opened 3 years ago

andyk commented 3 years ago

Since gym.Env is a de facto standard, it will be convenient to make AgentOS ports of gym envs themselves be of [sub]type gym.Env. The current port (per #123) wraps gym.envs.classic_control.cartpole.CartPoleEnv but does not inherit from it, so other components that use this class can't just treat it like a gym.Env.

Background: I am porting the SB3 trainer/agent/policy components over to ACS, and these expect environments to be subclasses of gym.Env. It will make life much easier if these SB3 components can just use the AgentOS ports of gym.Envs. Currently we have a gym Env wrapped inside an AgentOS Env (e.g., I currently have to access self.Environment.cartpole to get at the inner Env).
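To make the mismatch concrete, here is a minimal sketch of the current wrapping arrangement. The class names are simplified stand-ins (CartPoleEnv stands in for gym.envs.classic_control.cartpole.CartPoleEnv, Environment for agentos.Environment); the point is that a component type-checking against the gym class rejects the wrapper and must reach inside:

```python
class CartPoleEnv:
    """Stand-in for the gym CartPole env."""
    def step(self, action):
        return "obs", 0.0, False, {}


class Environment:
    """Stand-in for agentos.Environment: it wraps, but does not
    subclass, the gym env."""
    def __init__(self):
        self.cartpole = CartPoleEnv()  # the inner gym env

    def step(self, action):
        return self.cartpole.step(action)


env = Environment()
# Components that expect a gym.Env cannot use `env` directly:
print(isinstance(env, CartPoleEnv))           # False
print(isinstance(env.cartpole, CartPoleEnv))  # True: must reach inside
```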

Options:

  1. Make agentos.Environment a subclass of gym.Env
  2. Create agentos.GymEnvironment as a subclass of both agentos.Environment and gym.Env
  3. Make the CartPole ACS component a subclass of both agentos.Environment and gym.Env (currently it only subclasses agentos.Environment)

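Option 3 amounts to plain multiple inheritance. A sketch with stand-in base classes (real code would inherit from agentos.Environment and gym.Env):

```python
class Environment:
    """Stand-in for agentos.Environment."""
    def step(self, action):
        raise NotImplementedError


class GymEnv:
    """Stand-in for gym.Env."""
    metadata = {"render.modes": []}


class CartPole(Environment, GymEnv):
    """The ACS CartPole component subclasses both, so components that
    type-check against gym.Env accept it directly."""
    def step(self, action):
        return "obs", 0.0, False, {}


env = CartPole()
print(isinstance(env, Environment) and isinstance(env, GymEnv))  # True
```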
nickjalbert commented 3 years ago

One reason why I was hesitant to just make agentos.Environment a subclass of gym.Env is that gym seemed like a heavyweight dependency to have in AOS. Also, Acme expects DeepMind environments which are a bit different than gym envs (though there is a wrapper in Acme that converts a gym env into a dm_env).

So that's how we ended up here, but I'm not attached to the current state of affairs and your alternative solutions all seem reasonable. I'll propose doing whatever is fastest for this initial port, but I'm also happy to have a deeper discussion if you'd like!

nickjalbert commented 3 years ago

Also, did SB3 complain because of some sort of type checking or did it try to call some method that was missing on the Cartpole component?

andyk commented 3 years ago

TLDR: The SB3 Algo is looking for env.metadata and env.observation_space, which are a standard part of gym.Env.

SB3 Algorithm (e.g. PPO) constructors take an env, and "wrap [it] with the appropriate wrappers if needed." (see the source here).

Specifically it might try to wrap the env with a custom Monitor, which itself inherits from gym.Wrapper and expects the env object to have a metadata attribute.

Or it might turn the Env into a DummyVecEnv, which expects the env object to have an observation_space attribute (and maybe others).
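A minimal sketch of the surface SB3's wrappers actually touch. The attribute names (metadata, observation_space, action_space) come from gym.Env; Space here is just a stand-in for gym.spaces.Space:

```python
class Space:
    """Stand-in for gym.spaces.Space."""


class MinimalGymLikeEnv:
    # Read by SB3's Monitor wrapper:
    metadata = {"render.modes": ["human"]}

    def __init__(self):
        # Read by DummyVecEnv when vectorizing the env:
        self.observation_space = Space()
        self.action_space = Space()


env = MinimalGymLikeEnv()
print(hasattr(env, "metadata"), hasattr(env, "observation_space"))  # True True
```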

It's a good point about keeping our dependencies minimal, though note that agentos already depends on gym.

Does Acme make the conversion from gym CartPole to acme CartPole transparent to the user (i.e., does it automatically call that wrapper, or does the user have to do it explicitly)? I'm just trying to decide how AgentOS users should think about ACS Envs. It would be nice if ACS users could install cartpole from ACS and have it "just work" with both Acme and SB3 agents and other components. That seems much simpler for users than having to choose between agentos install deepmind_cartpole and agentos install gym_cartpole.

For now I'll go with option number 3 that I proposed earlier. To do so, I'll modify your ACS CartPole component to work seamlessly as both a DM Env and a Gym Env.

andyk commented 3 years ago

Another observation:

Our agentos.Environment API... https://github.com/agentos-project/agentos/blob/288e868941f648415b628fa15f16acb043d13a2a/agentos/core.py#L81-L105 ...is mostly a copy of the gym.Env API, but it currently differs in a few small ways, as observed above (e.g., no metadata member). I also just noticed that our version of seed(self, seed) is slightly different from gym's (at least in gym 0.18.0) seed(self, seed=None).

If we need to have a core abstraction for an Env in AgentOS (so that the runtime can manipulate envs) then I think it should be much more minimal than what we have currently, maybe just step()?
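A sketch of what that stripped-down core abstraction might look like: only step() is required by the runtime, and everything else (seeding, rendering, spaces) is left to concrete envs. This is a hypothetical replacement, not the current agentos.Environment:

```python
from abc import ABC, abstractmethod


class Environment(ABC):
    """Hypothetical minimal core env abstraction: just step()."""

    @abstractmethod
    def step(self, action):
        """Advance the environment one step; the return convention
        (e.g. gym's (obs, reward, done, info) tuple) is left to the
        concrete env."""


class Counter(Environment):
    """Trivial concrete env for illustration."""
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return self.t


env = Counter()
print(env.step(None))  # 1
```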

nickjalbert commented 3 years ago

> Does Acme make the conversion from gym CartPole to acme CartPole transparent to the user (i.e., does it automatically call that wrapper, or does the user have to do it explicitly)?

My understanding is that the Acme user has to do it manually. For example, the MountainCar environment is manually wrapped with both a GymWrapper and SinglePrecisionWrapper in the Acme quickstart (note I believe the Cartpole seen below is a DeepMind re-implementation found here and thus only gets a SinglePrecisionWrapper):

*(screenshot: acme-quickstart)*
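The manual pattern described above is just explicit wrapper chaining. A sketch with stand-in classes (GymWrapper and SinglePrecisionWrapper stand in for the real acme.wrappers classes; this is not Acme code):

```python
class GymEnv:
    """Stand-in for a gym-style env."""
    def step(self, action):
        return "obs", 0.0, False, {}


class GymWrapper:
    """Stand-in for Acme's gym.Env -> dm_env adapter."""
    def __init__(self, env):
        self._env = env


class SinglePrecisionWrapper:
    """Stand-in for Acme's wrapper that casts observations to float32."""
    def __init__(self, env):
        self._env = env


# The user chains the wrappers explicitly, by hand:
env = SinglePrecisionWrapper(GymWrapper(GymEnv()))
```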

> It would be nice if ACS users could install cartpole from ACS and have it "just work" with both Acme and SB3 agents and other components.

Agree, this is what we should aim for. I don't have a great idea how to get there yet (maybe an AOS environment is the union of all other environment abstractions or we somehow know how to automatically wrap an environment?).
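The "somehow know how to automatically wrap" idea could look like a small dispatch helper. Everything here is invented for illustration (adapt(), DmStyleAdapter, and the detection heuristic are all hypothetical, not AgentOS API):

```python
class GymStyleEnv:
    """Stand-in for a gym-style env; gym envs expose spaces as attrs."""
    observation_space = None


class DmStyleAdapter:
    """Stand-in for a gym -> dm_env wrapper."""
    def __init__(self, env):
        self.env = env


def adapt(env, target="gym"):
    """Hypothetical helper: inspect the env and apply the wrapper the
    target API needs; return it unchanged if it already fits."""
    if target == "dm" and hasattr(env, "observation_space"):
        return DmStyleAdapter(env)  # gym-style env, dm-style consumer
    return env


env = GymStyleEnv()
print(type(adapt(env, "dm")).__name__)   # DmStyleAdapter
print(type(adapt(env, "gym")).__name__)  # GymStyleEnv
```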

> It's a good point about keeping our dependencies minimal, though note that agentos already depends on gym.

It looks like the only uses of gym are currently in the example agents, and I think this dependency would (probably?) go away when we port them to use ACR and components.

> For now I'll go with option number 3

Sounds good to me!