Closed nathan-miller23 closed 1 year ago
This would also probably involve creating a Trajectory class to better abstract away a lot of the complex dictionary parsing in a more Object Oriented fashion
One issue here, that might become more apparent once @mesutyang97 makes and merges his next PR into master, is that OvercookedEnv is useful when trying to have a Environment with many MDPs – i.e. for training an agent on multiple MDPs at once, it's useful to have an Environment abstraction that handles the switching between the MDPs
I'm not sure if what you are suggesting here is to have OvercookedGridworld handle resets too (in that case, switching between terrains would be handled in OvercookedGridworld too), or if you're suggesting to delegate all Env methods (reset, step, etc.) to the Gym environment. One issue with the latter is that I believe that (correct me if I'm wrong) the Gym environment already assumes all observations have been post-processed (featurized/encoded), and this could make it harder to evaluate agent performance outside of an RL context (as currently AgentEvaluator enables one to do)
To be clearer, what I have in mind is "squashing" the OvercookedEnv and OvercookedGridworld classes into OvercookedEnv. The resulting class would have the union of all instance variables the two classes currently have. The high level functions would be step
and reset
. The functionality of OvercookedGridworld.get_state_transition
would be moved to OvercookedEnv.step
and functionality of things like get_standard_start_state
would be moved to reset
. Gridworld utility functions (like state_string
and potential_function
) would be "hoisted" to OvercookedEnv
You are correct that a gym environment assumes the observations are already encoded, but we already have wrappers (the deprecated Overcooked
class in overcooked_env.py
and the updated OvercookedMultiAgent
in rllib.py
) that have a base_env
(an OvercookedEnv instance) and a featurize_fn
that maps OvercookedState
s to encoded observations.
I do agree that the mdp is a useful abstraction for the multiple layouts case the way we currently have it set up, but I'd push back against the idea that it justifies all the extra complexity and duplication of logic between the mdp and env class. I'm hopeful there will be a more elegant solution that involves a layout_generator class that returns env instances, or passing in lists of layout names instead of a single string, for example
Hmm ok, I see what you're saying. I do think that this might be something we want to leave to later in the future, as I'm sure that updating the code this way would mess very significantly with @mesutyang97's setup. Or in general, this would definitely be something worth heavily coordinating with him.
I've created a separate issue https://github.com/HumanCompatibleAI/overcooked_ai/issues/52 for creating Trajectory class as it can be done separately and won't disrupt @mesutyang97's setup.
I think keeping env and mdp separate is still very important. Let's call a meeting discuss the specifics, but like @micahcarroll said, multi-layout depends on that separation, because one env could have multiple mdp living inside it
Refactoring the code base to merge the functionality of the two high level overcooked classes.
OvercookedState
should still include all the dynamic state data, and should be as lightweight as possible.OvercookedGridworld
will assume the functionality ofOvercookedEnv
to include all static data, such as terrain and episode horizon, as well as trajectory specific data