Unify `OvercookedGridworld` and `OvercookedEnv` classes

HumanCompatibleAI / overcooked_ai

A benchmark environment for fully cooperative human-AI performance.

https://arxiv.org/abs/1910.05789

MIT License

717 stars 152 forks source link

Unify `OvercookedGridworld` and `OvercookedEnv` classes #44

Closed nathan-miller23 closed 1 year ago

nathan-miller23 commented 4 years ago

Refactoring the code base to merge the functionality of the two high level overcooked classes. OvercookedState should still include all the dynamic state data, and should be as lightweight as possible. OvercookedGridworld will assume the functionality of OvercookedEnv to include all static data, such as terrain and episode horizon, as well as trajectory specific data

bmielnicki commented 4 years ago

This would also probably involve creating a Trajectory class to better abstract away a lot of the complex dictionary parsing in a more Object Oriented fashion

micahcarroll commented 4 years ago

One issue here, that might become more apparent once @mesutyang97 makes and merges his next PR into master, is that OvercookedEnv is useful when trying to have a Environment with many MDPs – i.e. for training an agent on multiple MDPs at once, it's useful to have an Environment abstraction that handles the switching between the MDPs

I'm not sure if what you are suggesting here is to have OvercookedGridworld handle resets too (in that case, switching between terrains would be handled in OvercookedGridworld too), or if you're suggesting to delegate all Env methods (reset, step, etc.) to the Gym environment. One issue with the latter is that I believe that (correct me if I'm wrong) the Gym environment already assumes all observations have been post-processed (featurized/encoded), and this could make it harder to evaluate agent performance outside of an RL context (as currently AgentEvaluator enables one to do)

nathan-miller23 commented 4 years ago

To be clearer, what I have in mind is "squashing" the OvercookedEnv and OvercookedGridworld classes into OvercookedEnv. The resulting class would have the union of all instance variables the two classes currently have. The high level functions would be step and reset. The functionality of OvercookedGridworld.get_state_transition would be moved to OvercookedEnv.step and functionality of things like get_standard_start_state would be moved to reset. Gridworld utility functions (like state_string and potential_function) would be "hoisted" to OvercookedEnv

You are correct that a gym environment assumes the observations are already encoded, but we already have wrappers (the deprecated Overcooked class in overcooked_env.py and the updated OvercookedMultiAgent in rllib.py) that have a base_env (an OvercookedEnv instance) and a featurize_fn that maps OvercookedStates to encoded observations.

nathan-miller23 commented 4 years ago

I do agree that the mdp is a useful abstraction for the multiple layouts case the way we currently have it set up, but I'd push back against the idea that it justifies all the extra complexity and duplication of logic between the mdp and env class. I'm hopeful there will be a more elegant solution that involves a layout_generator class that returns env instances, or passing in lists of layout names instead of a single string, for example

micahcarroll commented 4 years ago

Hmm ok, I see what you're saying. I do think that this might be something we want to leave to later in the future, as I'm sure that updating the code this way would mess very significantly with @mesutyang97's setup. Or in general, this would definitely be something worth heavily coordinating with him.

bmielnicki commented 4 years ago

I've created a separate issue https://github.com/HumanCompatibleAI/overcooked_ai/issues/52 for creating Trajectory class as it can be done separately and won't disrupt @mesutyang97's setup.

mesutyang97 commented 4 years ago

I think keeping env and mdp separate is still very important. Let's call a meeting discuss the specifics, but like @micahcarroll said, multi-layout depends on that separation, because one env could have multiple mdp living inside it