HumanCompatibleAI / overcooked_ai

A benchmark environment for fully cooperative human-AI performance.
https://arxiv.org/abs/1910.05789
MIT License

Will the trained models (BC and H_proxy) be publicly available? #20

Closed by 51616 4 years ago

51616 commented 4 years ago

I want to use these models as the baseline in my work. Can I get access to them? Or is there any way I can make sure that my implementation of the baselines (BC and H_proxy) is correct (e.g. by comparing training loss)?

Edit: I found this file, but it still uses the GAIL model to train behaviour cloning. Is this the version that was used in the paper?

micahcarroll commented 4 years ago

Sorry for the wait, I didn't have notifications turned on for some reason. We didn't include the models because there are a lot of them and each is 10 MB or so (but if you want, I can send them to you somehow).

However, I would recommend training your own: you can just follow the first couple of cells of this Jupyter Notebook. Let me know if you have any issues with this.

The file that you linked is the correct file (the Jupyter Notebook also imports it). If you look closely, you will see that we only use the "pretrain" method, not the GAIL learn method itself, and "pretrain" is just behaviour cloning.
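To make the "pretrain is just behaviour cloning" point concrete, here is a minimal sketch of behaviour cloning as plain supervised learning: a linear softmax policy fitted to (state, action) pairs by minimising cross-entropy. It is not the repository's code. The function names, the toy "expert" and all hyperparameters are assumptions chosen for illustration. Tracking the per-epoch loss like this is also one simple way to sanity-check a reimplementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_logits(weights, biases, state):
    """Linear policy: one logit per action."""
    return [sum(w * x for w, x in zip(weights[k], state)) + biases[k]
            for k in range(len(biases))]


def behaviour_cloning(states, actions, n_actions, epochs=100, lr=0.5):
    """Fit the policy to demonstrations with full-batch gradient descent.

    Returns the weights, biases and the mean cross-entropy loss per epoch.
    """
    n, n_features = len(states), len(states[0])
    weights = [[0.0] * n_features for _ in range(n_actions)]
    biases = [0.0] * n_actions
    losses = []
    for _ in range(epochs):
        grad_w = [[0.0] * n_features for _ in range(n_actions)]
        grad_b = [0.0] * n_actions
        total_loss = 0.0
        for s, a in zip(states, actions):
            probs = softmax(policy_logits(weights, biases, s))
            total_loss -= math.log(probs[a] + 1e-12)
            for k in range(n_actions):
                # Gradient of cross-entropy w.r.t. logit k: p_k - 1[k == a]
                err = probs[k] - (1.0 if k == a else 0.0)
                grad_b[k] += err
                for j in range(n_features):
                    grad_w[k][j] += err * s[j]
        for k in range(n_actions):
            biases[k] -= lr * grad_b[k] / n
            for j in range(n_features):
                weights[k][j] -= lr * grad_w[k][j] / n
        losses.append(total_loss / n)
    return weights, biases, losses


# Toy demonstrations (hypothetical): the "expert" picks action 1 when the
# first feature is positive, otherwise action 0.
random.seed(0)
states = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
actions = [1 if s[0] > 0 else 0 for s in states]

weights, biases, losses = behaviour_cloning(states, actions, n_actions=2)
predicted = [max(range(2), key=policy_logits(weights, biases, s).__getitem__)
             for s in states]
accuracy = sum(p == a for p, a in zip(predicted, actions)) / len(actions)
```

No reinforcement learning or adversarial discriminator is involved here: the only training signal is the demonstrated action for each state.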

--

Edit: the link to the notebook is now this

51616 commented 4 years ago

@micahcarroll Thanks for the reply. After looking into the experiment repo, I found this folder, which contains the trained BC and H_proxy models.

And this block can load those models as "OTHER_AGENT" in the environment.

def configure_other_agent(params, gym_env, mlp, mdp):
    if params["OTHER_AGENT_TYPE"] == "hm":
        hl_br, hl_temp, ll_br, ll_temp = params["HM_PARAMS"]
        agent = GreedyHumanModel(mlp, hl_boltzmann_rational=hl_br, hl_temp=hl_temp, ll_boltzmann_rational=ll_br, ll_temp=ll_temp)
        gym_env.use_action_method = True

    elif params["OTHER_AGENT_TYPE"][:2] == "bc":
        best_bc_model_paths = load_pickle(BEST_BC_MODELS_PATH)
        if params["OTHER_AGENT_TYPE"] == "bc_train":
            bc_model_path = best_bc_model_paths["train"][mdp.layout_name]
        elif params["OTHER_AGENT_TYPE"] == "bc_test":
            bc_model_path = best_bc_model_paths["test"][mdp.layout_name]
        else:
            raise ValueError("Other agent type must be bc train or bc test")

So I guess I can use them directly as "OTHER_AGENT" in my experiments without running the training code in the notebook mentioned above?
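The path lookup in the "bc" branch above can be sketched on its own as follows. The nested dictionary shape (split, then layout name) is taken from the quoted snippet. The function name, the layout name "simple" and the example paths are hypothetical placeholders, and loading the model from the chosen path is not shown.

```python
def select_bc_model_path(other_agent_type, layout_name, best_bc_model_paths):
    """Pick the saved BC model path for a layout, mirroring the 'bc' branch."""
    if other_agent_type == "bc_train":
        split = "train"
    elif other_agent_type == "bc_test":
        split = "test"
    else:
        raise ValueError("Other agent type must be bc train or bc test")
    return best_bc_model_paths[split][layout_name]


# Hypothetical stand-in for the pickled table of best model paths.
example_paths = {
    "train": {"simple": "bc_runs/simple_train"},
    "test": {"simple": "bc_runs/simple_test"},
}

chosen = select_bc_model_path("bc_test", "simple", example_paths)
```

Keeping the "train" and "test" models separate matters when one BC model is used as the training partner and the other is held out for evaluation.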

micahcarroll commented 4 years ago

Yes, you are correct! I had forgotten that while we had not added the PPO agents, we had added the BC ones!