Open andyk opened 3 years ago
One reason why I was hesitant to just make agentos.Environment
a subclass of gym.Env
is that gym seemed like a heavyweight dependency to have in AOS. Also, Acme expects DeepMind environments which are a bit different than gym envs (though there is a wrapper in Acme that converts a gym env into a dm_env).
So that's how we ended up here, but I'm not attached to the current state of affairs and your alternative solutions all seem reasonable. I'll propose doing whatever is fastest for this initial port, but I'm also happy to have a deeper discussion if you'd like!
Also, did SB3 complain because of some sort of type checking or did it try to call some method that was missing on the Cartpole component?
TLDR: The SB3 Algo is looking for env.metadata
and env.observation_space
, which are a standard part of gym.Env
.
SB3 Algorithm (e.g. PPO) constructors take an env, and "wrap [it] with the appropriate wrappers if needed." (see the source here).
Specifically it might try to wrap the env with a custom Monitor, which itself inherits from gym.Wrapper
and expects the env object to have a metadata
attribute.
Or it might turn the Env into a DummyVecEnv, which expects the env object to have an observation_space
attribute (and maybe others).
It's a good point about keeping our dependencies minimal, though note that agentos already depends on gym.
Does Acme make the conversion from gym CartPole to acme CartPole transparent to the user (i.e., does it automatically call that wrapper or does the use have to explicitly do that)? I'm just trying to decide how AgentOS users should think about ACS Envs. It would be nice if ACS users could install cartpole from ACS and have it "just work" with both Acme and SB3 agents and other components. That seems much simpler for users than having to decide between when to use agentos install deepmind_cartpole
vs agentos install gym_cartpole
.
For now I'll go with option number 3 that I proposed earlier. To do so, I'll modify your ACS CartPole component to seamlessly work as either a DM Env or a Gym Env.
Another observation:
Our agentos.Environment
API... https://github.com/agentos-project/agentos/blob/288e868941f648415b628fa15f16acb043d13a2a/agentos/core.py#L81-L105 ...is mostly a copy of the gym.Env
API, but currently differs a little bit as observed above (e.g., no metadata
member) and I also just noticed that our version of seed(self, seed)
is slightly different than gym's (at least gym version 0.18.0) seed(self, seed=None)
.
If we need to have a core abstraction for an Env in AgentOS (so that the runtime can manipulate envs) then I think it should be much more minimal than what we have currently, maybe just step()
?
Does Acme make the conversion from gym CartPole to acme CartPole transparent to the user (i.e., does it automatically call that wrapper or does the use have to explicitly do that)?
My understanding is that the Acme user has to do it manually. For example, the MountainCar environment is manually wrapped with both a GymWrapper
and SinglePrecisionWrapper
in the Acme quickstart (note I believe the Cartpole seen below is a DeepMind re-implementation found here and thus only gets a SinglePrecisionWrapper
):
It would be nice if ACS users could install cartpole from ACS and have it "just work" with both Acme and SB3 agents and other components.
Agree, this is what we should aim for. I don't have a great idea how to get there yet (maybe an AOS environment is the union of all other environment abstractions or we somehow know how to automatically wrap an environment?).
It's a good point about keeping our dependencies minimal, though note that agentos already depends on gym.
It looks like the only uses of gym are currently in the example agents, and I think this dependency would (probably?) go away when we port them to use ACR and components.
For now I'll go with option number 3
Sounds good to me!
Since gym.Env is a de facto standard, it will be convenient to make AgentOS ports of gym envs themselves be of [sub]type
gym.Env
. The current port (per #123) wrapsgym.envs.classic_control.cartpole.CartPoleEnv
but does not inherit from it, so other components that use this class can't just treat it like a gym.Env.Background: I am porting over SB3 trainer/agent/policy components to ACS and these expect environment to be subclasses of
gym.Env
. It will make life much easier if these SB3 components can just use the AgentOS ports of gym.Envs. Currently we have a gym Env wrapped inside of an AgentOS Env (e.g., I currently have to doself.Environment.cartpole
to get access to the inner Env.)Options:
agentos.Environment
a subclass of gym.Env`agentos.Environment
andgym.Env
agentos.Environment
andgym.Env
(currently it only subclassesagentos.Environment
)