HumanCompatibleAI / overcooked-demo

Web application where humans can play Overcooked with AI agents.

Import hotfix #24

Closed: nathan-miller23 closed this 3 years ago

nathan-miller23 commented 3 years ago

We probably want to leave this open until one or two "real" agents are trained and added to the pre-trained agents directory.

micahcarroll commented 3 years ago

Thanks for doing this!

What do you mean by "real" agents by the way?

nathan-miller23 commented 3 years ago

> What do you mean by "real" agents by the way?

I meant agents that were trained for more than a few timesteps. I tried to train an agent on counter_circuit, but the potential function on the master branch of harl is different from the one in my nathan_dev branch, so my hyperparameters didn't transfer over well. I believe @mesutyang97 trained an agent on cramped_room; if so, I think adding that one successfully trained agent is sufficient. If you have any comments or think we should include additional trained agents, please let me know.

micahcarroll commented 3 years ago

Ok cool, yes, this sounds like a good idea. I initially thought you had ended up training the agents yourself (to completion). But yes, let's wait for @mesutyang97 to add the actual agent(s) to this PR and then merge it?

mesutyang97 commented 3 years ago

Yes, I have trained some good PPO agents for cramped_room on my local machine. I am working to get overcooked_demo installed so that I can test them out before including them in this PR. Should be done soon.

mesutyang97 commented 3 years ago

The good PPO RLlib self-play agent for cramped_room has been uploaded, and I have verified locally that it works as it should. @micahcarroll, please perform a final check before we merge this into master.

micahcarroll commented 3 years ago

@mesutyang97, you should probably add to the agent README the exact command you used in the human_aware_rl repo to train this agent. Later on we should clean this up and store this information in a consistent manner, but for now this seems like the most obvious way to give people something to work from.