Open micahcarroll opened 3 years ago
There is an upcoming ipython notebook on trajectory visualization branch that can be used as a good starting point to this task. What can/should be done with this notebook:
This seems good to me! I think we could even just have one notebook (we should put it in Colab), and have a table of contents.
Some non-obvious things that come to mind that the tutorial should address:
We probably later also want to have a tutorial for human_aware_rl repo which shows how to train, load, and evaluate simple agents.
Thanks for the suggestions! I will create a notebook with tutorial for current master branch code as my next task including those suggested things. I'm not familiar with human_aware_rl yet so I will leave it for now.
I've done a simple introduction notebook that works with the current master branch. https://colab.research.google.com/github/bmielnicki/overcooked_ai/blob/introduction_notebook/introduction.ipynb
Some things worth to mention:
Questions:
Thanks Bartek, this is looking good!
I do think that it's worth to first merge the native python visualizations so we can add a little bit of color here too! (I'll check on the native viz PR just after this)
Regarding your questions:
pip
release, and manually check this guide is still working at every new release (that fortunately is quite infrequent). However, given that the native python viz will not be immediately in the pip release, we might at first just install from master, and make the switch to pip install at our next release!Thanks for the reply!
Yeah I think this might be due to the mdp dynamics having changed. I thought @mesutyang97 might have fixed the planner behind the greedy human model, so I think it should be a relatively quick fix if you want to investigate the cause yourself (Bartek). If this seems to complicated, no worries!
Hi Bartek,
Thanks for the work of putting this iPython Notebook together. Wow!
I suspect it is because the BC agent is not pressing "cook" in the updated dynamics (because in the old dynamics, soup will automatically cook when it has 3 items)
I will look more into this. But thanks a lot for the work!
You are right - agents are not pressing cook. I'm already close to pushing the fix for this bug,
Could you let me know how you are planning to fix this? I had a discussion with Micah yesterday about this but couldn't come to a good conclusion. Would be nice to get your opinion
Given that GreedyHumanModel accepts only situation when all_orders has len of 1 we can just cook any soup of selected len.
inside ml_action I've changed this
if soup_nearly_ready and not other_has_dish:
motion_goals = am.pickup_dish_actions(counter_objects)
else:
assert len(state.all_orders) == 1, \
"The current mid level action manager only support 3-onion-soup order, but got orders" \
+ str(state.all_orders)
next_order = list(state.all_orders)[0]
if 'onion' in next_order:
motion_goals = am.pickup_onion_actions(counter_objects)
elif 'tomato' in next_order:
motion_goals = am.pickup_tomato_actions(counter_objects)
else:
motion_goals = am.pickup_onion_actions(counter_objects) + am.pickup_tomato_actions(counter_objects)
to this
if soup_nearly_ready and not other_has_dish:
motion_goals = am.pickup_dish_actions(counter_objects)
else:
assert len(state.all_orders) == 1, \
"The current mid level action manager only support 3-onion-soup order, but got orders" \
+ str(state.all_orders)
next_order = list(state.all_orders)[0]
soups_ready_to_cook_key = '{}_items'.format(len(next_order.ingredients))
soups_ready_to_cook = pot_states_dict[soups_ready_to_cook_key]
if soups_ready_to_cook:
only_pot_states_ready_to_cook = defaultdict(list)
only_pot_states_ready_to_cook[soups_ready_to_cook_key] = soups_ready_to_cook
# we want to cook only soups that has same len as order
motion_goals = am.start_cooking_actions(only_pot_states_ready_to_cook)
elif 'onion' in next_order:
motion_goals = am.pickup_onion_actions(counter_objects)
elif 'tomato' in next_order:
motion_goals = am.pickup_tomato_actions(counter_objects)
else:
motion_goals = am.pickup_onion_actions(counter_objects) + am.pickup_tomato_actions(counter_objects)
Additional improvement would be to cook any soup of size bigger than next_order too, or soups with ingredients that does not match next order (e.g soups that has a tomato when only order is ["onion", "onion", "onion"]
to free up the pots.
BTW state featurization tests (test_state_featurization and test_lossless_state_featurization_shape) failed after this change.
That looks like a good fix, thanks Bartek! I think we can keep it for ["onion", "onion", "onion"]
for now unless Nathan would like to use it (which, I believe he is currently not).
That is unfortunate. Could you share how it is failing?
Features were not equal self.assertTrue(np.array_equal(expected_featurization, featurized_observations))
. I've overwritten pickled features (by uncommenting line with save pickle), result is pushed to https://github.com/bmielnicki/overcooked_ai/tree/greedy_human_model_fix
@mesutyang97 Do you think it is okay to merge this code? I'm not sure how much those failed featurization tests are a concern (I'm not familiar with featurization at all so it is hard to judge for me how my code changed featurization and if it's bug or feature) - it passed all tests including featurization after overwriting the pickled files.
i think it should be good. Could you double check the updated pickled files contain the correct information?
self.rnd_agent_pair
in tests is the greedy human model and it produced a different trajectory than before (because there were cooking any soups). Because of that featurization changed. This explain how featurization tests failed without any change in featurization code. Here is the PR: https://github.com/HumanCompatibleAI/overcooked_ai/pull/64
EDIT: I have not double-checked updated pickled files, but if they were right before I assume they are right now (as nothing from the featurization code changed and there is a simple explanation why tests failed before)
Minor point: @mesutyang97 I think you were confusing BC agents and GreedyHumanModels: what Bartek was fixing was GreedyHumanModels (hardcoded agents that use planner logic), rather than BC agents.
I'll be checking out the PR itself soon.
Updated version with python visualizations: https://colab.research.google.com/drive/1xFNnun0Ykob71cP6oEi8Y9JKQUYXYChD?usp=sharing#scrollTo=s8fYj-bF2Z3R
Some notes:
Runtime
-> Run all
it won't install overcooked_ai. Use Runtime
-> Restart and run all
.I've made a v0 in the README which is currently working (not sure whether the colab notebook above is stale): https://colab.research.google.com/drive/1AAVP2P-QQhbx6WTOnIG54NXLXFbO7y6n#scrollTo=Z1RBlqADnTDw.
To actually address this issue, we should merge the notebook above with my one linked here.
We should also figure out what is an easy way to maintain / test these kinds of notebooks. Primarily, we would want:
This issue is stale because it has been open for 30 days with no activity. It will be closed in 14 days unless there is some new activity
This issue is stale because it has been open for 30 days with no activity. It will be closed in 14 days unless there is some new activity
Creating a tutorial Google Colab notebook on how to use the environment, visualize rollouts, etc. Most useful after python visualizations #45 are completed.