Avalon-Benchmark / avalon

A 3D video game environment and benchmark designed from scratch for reinforcement learning research
https://generallyintelligent.com/avalon/
GNU General Public License v3.0

Add support for custom environments #13

Closed timokau closed 1 year ago

timokau commented 1 year ago

I found this useful when applying the baseline agents to environments with custom wrappers. I understand that you may consider this out of scope for Avalon; feel free to reject the PR in that case.


This makes it possible to use custom Gym environments by setting the suite to "builder" and passing a zero-argument function as the "task"; the function is called to construct the Gym environment.

One example use case is using Gym environments with a pre-applied set of wrappers.
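
For illustration, a rough sketch of how the proposed interface might be used. The CartPole environment and pass-through wrapper are placeholders, and the exact EnvironmentParams construction (a suite field set to "builder" and a task field holding the factory) is an assumption based on the description above:

import gym

class MyObservationWrapper(gym.ObservationWrapper):
  # Placeholder wrapper that passes observations through unchanged.
  def observation(self, obs):
    return obs

def make_wrapped_env() -> gym.Env:
  # Zero-argument factory: build any Gym environment and pre-apply custom wrappers.
  return MyObservationWrapper(gym.make("CartPole-v1"))

# Assumed usage of the proposed interface: the factory is passed as the task
# and the suite is set to "builder" (EnvironmentParams import omitted).
params = EnvironmentParams(suite="builder", task=make_wrapped_env)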

mx781 commented 1 year ago

I think this sort of change is best expressed in user code rather than making its way into the library codebase. This is because a) there might still be environments that you can't express / instantiate in this params.task() way (e.g. you want to init them using a config not expressible in env_params, or you don't want the final wrappers our build_env tacks on), and b) we want to keep the EnvironmentParams interface light, sticking to serializable types so that params can be modified via the command line (e.g. train_ppo_avalon.py does that).
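
To make point (b) concrete, here is a minimal sketch using a plain dataclass rather than Avalon's actual EnvironmentParams (field names and values are only stand-ins): string-typed fields map cleanly onto command-line overrides, while a callable task factory has no obvious string representation.

from dataclasses import dataclass, replace

@dataclass
class ParamsSketch:
  # Stand-in for EnvironmentParams: every field is a plain, serializable type.
  suite: str = "godot"
  task: str = "survive"

# A string field maps directly onto a command-line override such as "--task navigate"...
params = replace(ParamsSketch(), task="navigate")
# ...whereas a zero-argument environment factory has no natural command-line spelling.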

We didn't really consider people using our baselines for other environments, since the focus here is on the environment rather than the algorithms, but it would obviously make sense to be able to do this without touching the library code.

For now the best way, nasty as it is, is to monkeypatch build_env prior to training - that way you can have full control over the env building process without having to fork avalon:

import gym
import avalon.agent.common.envs
# Imports for EnvironmentParams, parse_args, DreamerGodotParams, and OffPolicyTrainer omitted.

def my_build_env(env_params: EnvironmentParams) -> gym.Env:
  # Build and wrap the environment however you like, ignoring env_params if you wish.
  return MyCustomWrappedGymEnv(...)

# Swap out the library's env builder before constructing the trainer.
avalon.agent.common.envs.build_env = my_build_env

params = parse_args(DreamerGodotParams())
trainer = OffPolicyTrainer(params)
...
trainer.train()
...

I'll leave the PR open so we remember to make this easier to do in the future - thanks for pointing out this nuisance!

timokau commented 1 year ago

Great, thanks for the explanation and the workaround :)

> We didn't really consider people using our baselines for other environments, since the focus here is on the environment rather than the algorithms, but it would obviously make sense to be able to do this without touching the library code.

I understand. I was mainly interested in the Dreamer implementation, since I had a hard time finding a validated PyTorch implementation that sticks close to the original. I considered rolling my own (and still might), but preferred using your tried and tested version for now. Thank you for that!

I think the environment looks very nice as well; I may work with it at some point in the future too.

zplizzi commented 1 year ago

Glad you're liking the Dreamer implementation. It was a major project getting it to reproduce well, and we put a lot of effort into validating that it matches the original very closely :) Assuming we haven't broken anything since then, it should perform very close to danijar's dreamerv2 repo.

I agree with @mx781 that for folks doing serious work with this code, it's probably better to just clone the code locally and modify it to your liking - having it run as a pip package is really just a convenience for people doing vanilla things with it imo.

mx781 commented 1 year ago

Closing this. For anyone stumbling upon this later, feel free to use the workaround above, or fork-and-modify.

timokau commented 1 year ago

Okay, I understand. Thanks for the feedback and the proposed workaround.

> Glad you're liking the Dreamer implementation. It was a major project getting it to reproduce well, and we put a lot of effort into validating that it matches the original very closely :) Assuming we haven't broken anything since then, it should perform very close to danijar's dreamerv2 repo.

It shows :) Thank you for your work on this.