Closed nathan-miller23 closed 3 years ago
Probably want to leave this open until 1-2 real agents are trained and added into the pre-trained agents directory
Thanks for doing this!
What do you mean by "real" agents by the way?
I meant agents that were trained for more than a few timesteps. I tried to train an agent on counter_circuit, but the potential function in master of harl is different from the one I have in my nathan_dev branch, so my hyperparameters didn't transfer over well. I believe @mesutyang97 trained an agent on cramped_room, in which case I think just adding that one successfully trained agent is sufficient. If you have any comments or think we should include additional trained agents, please let me know.
Ok cool, yes this sounds like a good idea. I initially thought you had trained the agents yourself to completion. But yes, let's wait for @mesutyang97 to add the actual agent(s) to this PR and then merge it?
Yes, I have trained some good PPO agents for cramped_room on my local machine. I am working on getting overcooked_demo installed so that I can test them out before including them in this PR. Should be done soon.
The good PPO RLlib self-play agent for cramped_room has been uploaded, and I can verify locally that it is working as it should. @micahcarroll Please perform a final check before we merge this into master.
@mesutyang97 You should probably add to the agent README the exact command you used to train this agent in the human_aware_rl repo. Later on we should clean this up and store this information in a consistent manner, but for now this seems like the most obvious choice to give people something to work off of.
- Fixed a really subtle circular import error with human_aware_rl that caused race conditions with tensorflow imports
- Added some dummy agents (ones that stay in place or perform random actions) as well as an RLlib agent that was trained on Overcooked for ~5 training iterations, which will hopefully serve as examples for users
- Added detailed instructions in the README on the agent training and loading-into-demo pipeline I've been using
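For reference, the dummy agents described above (stay-in-place and uniform-random) can be sketched roughly as follows. This is a hedged, self-contained sketch: the stand-in constants STAY and ALL_ACTIONS, and the action(state) interface, are assumptions modeled on the overcooked_ai codebase and may not match the exact identifiers there.

```python
import random

# Hypothetical stand-ins for the Overcooked action space; the real project
# defines its own action constants, so treat these values as placeholders.
STAY = (0, 0)
ALL_ACTIONS = [(0, -1), (0, 1), (1, 0), (-1, 0), STAY, "interact"]


class StayAgent:
    """Dummy agent that stands still every timestep."""

    def action(self, state):
        # Ignores the state entirely and always returns the stay action.
        return STAY


class RandomAgent:
    """Dummy agent that picks a uniformly random action each timestep."""

    def __init__(self, seed=None):
        # Own RNG instance so runs can be reproduced with a fixed seed.
        self.rng = random.Random(seed)

    def action(self, state):
        return self.rng.choice(ALL_ACTIONS)
```

Agents like these are useful as baselines and as minimal examples of the interface a trained RLlib agent also needs to expose before it can be dropped into the demo.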