Closed 51616 closed 4 years ago
Sorry for the wait; I didn't have notifications turned on for some reason. We didn't include the models because there are a lot of them and each one is about 10 MB (but if you want, I can send them to you somehow).
However, I would recommend training your own: you can just follow the first couple of cells of this Jupyter Notebook. Let me know if you have any issues with this.
The file that you linked is the correct file (the Jupyter Notebook imports it too): if you look closely, you will see that we only use the "pretrain" method (not the GAIL learn method itself), which is just behaviour cloning.
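For anyone reading along, "just behaviour cloning" means plain supervised learning on recorded (state, action) pairs, with no adversarial GAIL step. Here is a minimal, library-free sketch of that idea. It is not the code from the repo: the linear softmax policy, the toy demonstrations, and every name below are assumptions for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def compute_logits(weights, state):
    """One logit per action: dot product of that action's weight row with the state."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def behaviour_cloning(demos, n_features, n_actions, epochs=200, lr=0.5):
    """Fit a linear softmax policy to (state, action) pairs.

    Each update is one SGD step on the cross-entropy between the
    policy's action distribution and the demonstrated action.
    """
    weights = [[0.0] * n_features for _ in range(n_actions)]
    for _ in range(epochs):
        for state, action in demos:
            probs = softmax(compute_logits(weights, state))
            for a in range(n_actions):
                # Gradient of cross-entropy w.r.t. logit a is p_a - 1[a == action].
                grad = probs[a] - (1.0 if a == action else 0.0)
                for j in range(n_features):
                    weights[a][j] -= lr * grad * state[j]
    return weights


def act(weights, state):
    """Greedy action of the cloned policy."""
    logits = compute_logits(weights, state)
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical toy demonstrations: one-hot states paired with the "expert" action.
demos = [
    ([1.0, 0.0, 0.0], 2),
    ([0.0, 1.0, 0.0], 0),
    ([0.0, 0.0, 1.0], 1),
]
weights = behaviour_cloning(demos, n_features=3, n_actions=3)
predictions = [act(weights, state) for state, _ in demos]
```

On this toy data the cloned policy reproduces the demonstrated action for every state. A real setup would use a neural network policy and held-out human trajectories, so treat this only as an illustration of the training objective.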
--
Edit: the link to the notebook is now this
@micahcarroll Thanks for the reply. After looking into the experiment repo, I found this folder, which contains the trained BC and H_proxy models.
And this block can load those models as "OTHER_AGENT" in the environment.
def configure_other_agent(params, gym_env, mlp, mdp):
    if params["OTHER_AGENT_TYPE"] == "hm":
        hl_br, hl_temp, ll_br, ll_temp = params["HM_PARAMS"]
        agent = GreedyHumanModel(mlp, hl_boltzmann_rational=hl_br, hl_temp=hl_temp, ll_boltzmann_rational=ll_br, ll_temp=ll_temp)
        gym_env.use_action_method = True
    elif params["OTHER_AGENT_TYPE"][:2] == "bc":
        best_bc_model_paths = load_pickle(BEST_BC_MODELS_PATH)
        if params["OTHER_AGENT_TYPE"] == "bc_train":
            bc_model_path = best_bc_model_paths["train"][mdp.layout_name]
        elif params["OTHER_AGENT_TYPE"] == "bc_test":
            bc_model_path = best_bc_model_paths["test"][mdp.layout_name]
        else:
            raise ValueError("Other agent type must be bc train or bc test")
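To make the "bc" branch above concrete, here is a small standalone sketch of the same lookup: the agent type picks the "train" or "test" split, and the layout name picks the model path within it. The helper name `select_bc_model_path`, the layout name, and the paths in the example dictionary are all hypothetical; the real dictionary comes from the pickle at BEST_BC_MODELS_PATH, whose contents are not shown here.

```python
def select_bc_model_path(agent_type, layout_name, best_bc_model_paths):
    """Return the stored BC model path for a layout.

    Mirrors the bc_train / bc_test branch of the excerpt above;
    this helper itself is hypothetical and not part of the repo.
    """
    split_by_agent_type = {"bc_train": "train", "bc_test": "test"}
    if agent_type not in split_by_agent_type:
        raise ValueError("Other agent type must be bc train or bc test")
    split = split_by_agent_type[agent_type]
    return best_bc_model_paths[split][layout_name]


# Hypothetical stand-in for the unpickled dictionary of best model paths.
example_paths = {
    "train": {"layout_a": "hypothetical/bc_train/layout_a"},
    "test": {"layout_a": "hypothetical/bc_test/layout_a"},
}

chosen = select_bc_model_path("bc_test", "layout_a", example_paths)

# Any other "bc..." value is rejected, as in the excerpt.
try:
    select_bc_model_path("bc_foo", "layout_a", example_paths)
    rejected = False
except ValueError:
    rejected = True
```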
So I guess I can use them directly as "OTHER_AGENT" in my experiments without running the training code in the notebook mentioned above?
Yes, you are correct! I had forgotten that while we had not added the PPO agents, we had added the BC ones!
I want to use these models as the baseline in my work. Can I have access to these models? Or is there any way I can make sure that my implementation of the baseline (BC and H_proxy) is correct (e.g. by comparing training loss)?
Edit: I found this file, but it still uses the GAIL model to train behaviour cloning. Is this the version that was used in the paper?