HumanCompatibleAI / overcooked_ai

A benchmark environment for fully cooperative human-AI performance.
https://arxiv.org/abs/1910.05789
MIT License

Tutorial Google Colab Notebook #48

Open micahcarroll opened 3 years ago

micahcarroll commented 3 years ago

Creating a tutorial Google Colab notebook on how to use the environment, visualize rollouts, etc. Most useful after python visualizations #45 are completed.

bmielnicki commented 3 years ago

There is an upcoming ipython notebook on the trajectory visualization branch that can be used as a good starting point for this task. What can/should be done with this notebook:

  1. Add native python state visualizations along with an explanation of how to use them.
  2. Check whether the code/explanations are up to date with the current state of the library and fix what needs to be changed.
  3. Split the notebook into a series of short notebooks covering specific topics more in-depth (e.g. explaining most of the config options that can be used for state visualization/trajectory charts)? Is this a good idea?
  4. Is there anything else to add here?
micahcarroll commented 3 years ago

This seems good to me! I think we could even just have one notebook (we should put it in Colab), and have a table of contents.

micahcarroll commented 3 years ago

Some non-obvious things that come to mind that the tutorial should address:

We probably later also want to have a tutorial for human_aware_rl repo which shows how to train, load, and evaluate simple agents.

bmielnicki commented 3 years ago

Thanks for the suggestions! I will create a notebook with a tutorial for the current master branch code as my next task, including the suggested things. I'm not familiar with human_aware_rl yet, so I will leave that for now.

bmielnicki commented 3 years ago

I've made a simple introduction notebook that works with the current master branch. https://colab.research.google.com/github/bmielnicki/overcooked_ai/blob/introduction_notebook/introduction.ipynb

Some things worth mentioning:

Questions:

micahcarroll commented 3 years ago

Thanks Bartek, this is looking good!

I do think it's worth merging the native python visualizations first, so we can add a little bit of color here too! (I'll check on the native viz PR just after this.)

Regarding your questions:

bmielnicki commented 3 years ago

Thanks for the reply!

  1. A month or so ago (probably before the big changes in the planners code), the human model gave better results than 0 reward. I'm not sure what happened here and whether it's a sign of some minor error. Showing an agent that plays well would enhance the experience, but it is not necessary.
  2. We can convert the notebook to a .py file using nbconvert and run it. That probably won't detect whether something is wrong with the code's output, just whether it raises an error (so it will capture most of the obvious errors, and the more subtle ones might not be detected by a human anyway). My concern is rather about merging code that would break the tutorial without raising an error. Pip releases can be tested more thoroughly, so it's less of a concern there.
  3. I will leave it in the current state then.
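The limitation described in point 2 can be illustrated with a minimal sketch (this is not the project's CI setup; the notebook dict below is a hypothetical stand-in for real .ipynb JSON): executing every code cell catches raised exceptions, but a cell that silently computes a wrong value still passes.

```python
# Sketch: a "run all cells" smoke test, in the spirit of nbconvert --execute.
# The notebook structure is a simplified stand-in for real .ipynb JSON.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Tutorial"},
        {"cell_type": "code", "source": "total = sum(range(5))"},
        # Suppose this cell's output is semantically wrong for the tutorial:
        # no exception is raised, so the smoke test still passes.
        {"cell_type": "code", "source": "result = total * 2"},
    ]
}

def smoke_test(nb):
    """Execute all code cells in a shared namespace; return (ok, error)."""
    namespace = {}
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        try:
            exec(cell["source"], namespace)
        except Exception as e:
            return False, e
    return True, None

ok, err = smoke_test(notebook)
print(ok)  # True: every cell ran without raising
```

A notebook containing a cell like `1/0` would fail this check, which is exactly the class of "obvious errors" the comment above says such a test captures.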
micahcarroll commented 3 years ago

Yeah, I think this might be due to the mdp dynamics having changed. I thought @mesutyang97 might have fixed the planner behind the greedy human model, so it should be a relatively quick fix if you want to investigate the cause yourself (Bartek). If this seems too complicated, no worries!

mesutyang97 commented 3 years ago

Hi Bartek,

Thanks for the work of putting this iPython Notebook together. Wow!

I suspect it is because the BC agent is not pressing "cook" in the updated dynamics (because in the old dynamics, the soup would automatically cook when it had 3 items).

I will look more into this. But thanks a lot for the work!

bmielnicki commented 3 years ago

You are right - the agents are not pressing cook. I'm already close to pushing the fix for this bug.

mesutyang97 commented 3 years ago

Could you let me know how you are planning to fix this? I had a discussion with Micah yesterday about this, but we couldn't come to a good conclusion. It would be nice to get your opinion.

bmielnicki commented 3 years ago

Given that GreedyHumanModel only supports the situation where all_orders has length 1, we can just start cooking any soup of the selected length.

Inside ml_action, I've changed this

            if soup_nearly_ready and not other_has_dish:
                motion_goals = am.pickup_dish_actions(counter_objects)
            else:
                assert len(state.all_orders) == 1, \
                    "The current mid level action manager only support 3-onion-soup order, but got orders" \
                    + str(state.all_orders)
                next_order = list(state.all_orders)[0]

                if 'onion' in next_order:
                    motion_goals = am.pickup_onion_actions(counter_objects)
                elif 'tomato' in next_order:
                    motion_goals = am.pickup_tomato_actions(counter_objects)
                else:
                    motion_goals = am.pickup_onion_actions(counter_objects) + am.pickup_tomato_actions(counter_objects)

to this

            if soup_nearly_ready and not other_has_dish:
                motion_goals = am.pickup_dish_actions(counter_objects)
            else:
                assert len(state.all_orders) == 1, \
                    "The current mid level action manager only support 3-onion-soup order, but got orders" \
                    + str(state.all_orders)
                next_order = list(state.all_orders)[0]
                soups_ready_to_cook_key = '{}_items'.format(len(next_order.ingredients))
                soups_ready_to_cook = pot_states_dict[soups_ready_to_cook_key]
                if soups_ready_to_cook:
                    only_pot_states_ready_to_cook = defaultdict(list)
                    only_pot_states_ready_to_cook[soups_ready_to_cook_key] = soups_ready_to_cook
                    # we only want to cook soups that have the same length as the order
                    motion_goals = am.start_cooking_actions(only_pot_states_ready_to_cook)
                elif 'onion' in next_order:
                    motion_goals = am.pickup_onion_actions(counter_objects)
                elif 'tomato' in next_order:
                    motion_goals = am.pickup_tomato_actions(counter_objects)
                else:
                    motion_goals = am.pickup_onion_actions(counter_objects) + am.pickup_tomato_actions(counter_objects)

An additional improvement would be to also cook any soup of size bigger than next_order, or soups with ingredients that do not match the next order (e.g. soups that have a tomato when the only order is ["onion", "onion", "onion"]), to free up the pots.
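That proposed extension could be sketched roughly like this (a hypothetical helper, not code from the repository; the pot_states_dict shape with "N_items" keys mirrors the snippet above, and the ">= order length" rule implements the "bigger soups too" idea):

```python
from collections import defaultdict

def pots_ready_to_cook(pot_states_dict, order_len):
    """Collect pot states whose item count is at least the order length.

    Sketch of the suggested improvement: start cooking soups of the order's
    size *or bigger*, to free up pots. Keys like "2_items"/"3_items" map to
    lists of pot positions, as in the planner code above.
    """
    ready = defaultdict(list)
    for key, pots in pot_states_dict.items():
        if not key.endswith("_items"):
            continue  # skip non-count keys such as "empty"
        n_items = int(key.split("_")[0])
        if n_items >= order_len and pots:
            ready[key] = list(pots)
    return ready

pot_states = {"2_items": [(1, 0)], "3_items": [(4, 2)], "empty": [(0, 3)]}
print(dict(pots_ready_to_cook(pot_states, 3)))  # {'3_items': [(4, 2)]}
```

The result could then be passed to am.start_cooking_actions in place of only_pot_states_ready_to_cook; handling mismatched ingredients would need an extra check against next_order.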

BTW, the state featurization tests (test_state_featurization and test_lossless_state_featurization_shape) failed after this change.

mesutyang97 commented 3 years ago

That looks like a good fix, thanks Bartek! I think we can keep it for ["onion", "onion", "onion"] for now unless Nathan would like to use it (which, I believe he is currently not).

That is unfortunate. Could you share how it is failing?

bmielnicki commented 3 years ago

The features were not equal: self.assertTrue(np.array_equal(expected_featurization, featurized_observations)). I've overwritten the pickled features (by uncommenting the line that saves the pickle); the result is pushed to https://github.com/bmielnicki/overcooked_ai/tree/greedy_human_model_fix

bmielnicki commented 3 years ago

@mesutyang97 Do you think it is okay to merge this code? I'm not sure how much of a concern those failed featurization tests are (I'm not familiar with featurization at all, so it is hard for me to judge how my code changed the featurization and whether it's a bug or a feature) - it passed all tests, including featurization, after overwriting the pickled files.

mesutyang97 commented 3 years ago

I think it should be good. Could you double-check that the updated pickled files contain the correct information?

bmielnicki commented 3 years ago

self.rnd_agent_pair in the tests is the greedy human model, and it produced a different trajectory than before (because the agents were now cooking soups). Because of that, the featurization changed. This explains how the featurization tests failed without any change in the featurization code. Here is the PR: https://github.com/HumanCompatibleAI/overcooked_ai/pull/64

EDIT: I have not double-checked the updated pickled files, but if they were right before, I assume they are right now (as nothing in the featurization code changed, and there is a simple explanation for why the tests failed).

micahcarroll commented 3 years ago

Minor point: @mesutyang97 I think you were confusing BC agents and GreedyHumanModels: what Bartek was fixing was the GreedyHumanModel (a hardcoded agent that uses planner logic), rather than BC agents.

I'll be checking out the PR itself soon.

bmielnicki commented 3 years ago

Updated version with python visualizations: https://colab.research.google.com/drive/1xFNnun0Ykob71cP6oEi8Y9JKQUYXYChD?usp=sharing#scrollTo=s8fYj-bF2Z3R

Some notes:

  1. There is no need to rush with a review of this, as https://github.com/HumanCompatibleAI/overcooked_ai/pull/53 should be merged first anyway. For now the code uses a branch from my fork, but it probably should not once it is published and added to the master branch.
  2. The trajectory sliders work only when the notebook is run by the user - if you just open the link without running the code, the trajectory slider won't work.
  3. For some reason, if you open the link and click Runtime -> Run all, it won't install overcooked_ai. Use Runtime -> Restart and run all instead.
micahcarroll commented 2 years ago

I've made a v0 in the README which is currently working (not sure whether the colab notebook above is stale): https://colab.research.google.com/drive/1AAVP2P-QQhbx6WTOnIG54NXLXFbO7y6n#scrollTo=Z1RBlqADnTDw.

To actually address this issue, we should merge the notebook above with my one linked here.

We should also figure out an easy way to maintain/test these kinds of notebooks. Primarily, we would want:

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity. It will be closed in 14 days unless there is some new activity
