Open 51616 opened 4 years ago
You can modify this line to take 2 different weight files: https://github.com/facebookresearch/hanabi_SAD/blob/6e4ed590f5912fcb99633f4c224778a3ba78879b/pyhanabi/tools/eval_model.py#L23 You might need to hack some settings for loading models trained with/without greedy action input but that should be relatively easy. Our internal repo for the other-play paper has a lot of modifications & tools for running matches between various agents but that is not ready for release yet. Sorry about that.
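For example, something along these lines should work (the helper names and the `evaluate` signature below are illustrative assumptions; check the actual functions in `pyhanabi/eval.py` and `eval_model.py` before copying):

```python
# Sketch of a two-model eval script, adapted from the spirit of
# pyhanabi/tools/eval_model.py. Names/signatures are assumptions.
import sys
import torch
from eval import evaluate  # the repo's evaluator; its exact signature may differ


def load_agent(weight_file, device):
    # Assumes the checkpoint can be loaded directly; in the repo you would
    # rebuild the R2D2 agent with the options it was trained with (e.g. the
    # greedy-action input flag) and load its state dict instead.
    agent = torch.load(weight_file, map_location=device)
    agent.train(False)
    return agent


if __name__ == "__main__":
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    # one agent per seat instead of the same weights for every player
    agents = [load_agent(sys.argv[1], device), load_agent(sys.argv[2], device)]
    score = evaluate(agents, 1000, 1, 0, 0)  # num_game, seed, bomb, eps -- check the real call
    print(f"mean cross-play score over 1000 games: {score}")
```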
Q0: We started with the hyper-parameters provided in the R2D2 paper and that seemed to work quite well. The most important h-param for us was the data generation/consumption ratio, i.e. act_speed/train_speed. Other parameters seemed less important.
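Roughly, that ratio is just transitions generated per second divided by transitions consumed per second. A toy monitor like this shows what is meant (the counter names here are made up, not the repo's stat keys):

```python
import time


class SpeedMonitor:
    """Tracks how many transitions the actors generate vs. how many the
    learner consumes, so the generation/consumption ratio can be kept near 1."""

    def __init__(self):
        self.start = time.time()
        self.acted = 0    # transitions added to the replay buffer
        self.trained = 0  # transitions sampled for gradient updates

    def record_act(self, num_transitions):
        self.acted += num_transitions

    def record_train(self, batch_size):
        self.trained += batch_size

    def ratio(self):
        elapsed = time.time() - self.start
        act_speed = self.acted / elapsed      # transitions per second generated
        train_speed = self.trained / elapsed  # transitions per second consumed
        # >1 means the actors outpace the learner; <1 means the learner
        # reuses old data more often.
        return act_speed / max(train_speed, 1e-8)
```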
Q1: Answered above. You don't have to change anything in the threadloop. The IQL threadloop takes N agents by default and should be good for your task.
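Conceptually, the loop just looks up the acting agent by seat, so each seat can hold a different model. A toy Python version of that control flow (the real loop is the C++ ThreadLoop and rela actors; none of these class names come from the repo):

```python
import random


class ToyTurnEnv:
    """Tiny stand-in for the Hanabi environment: two seats alternate for a
    fixed number of turns. Only meant to show the control flow."""

    def __init__(self, num_turns=6):
        self.num_turns = num_turns

    def reset(self):
        self.turn = 0
        return {"turn": self.turn}

    def current_player(self):
        return self.turn % 2

    def step(self, action):
        self.turn += 1
        done = self.turn >= self.num_turns
        reward = 1.0 if done else 0.0
        return {"turn": self.turn}, reward, done


class RandomAgent:
    """Stand-in for a trained policy; two different instances can sit at
    the two seats, which is all an IQL-style loop needs."""

    def __init__(self, name):
        self.name = name

    def act(self, obs):
        return random.randint(0, 3)


def play_episode(env, agents):
    # Key point: the acting agent is indexed per seat, so agents[0] and
    # agents[1] can be entirely different models.
    obs = env.reset()
    done, reward = False, 0.0
    while not done:
        seat = env.current_player()
        action = agents[seat].act(obs)
        obs, reward, done = env.step(action)
    return reward


if __name__ == "__main__":
    print(play_episode(ToyTurnEnv(), [RandomAgent("a"), RandomAgent("b")]))
```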
Q2: dev.sh is for fast debugging. It uses less compute and starts training faster. I guess you meant sad_2player vs vdn_2player? sad_2player takes the extra greedy action input, which was the main idea of our SAD paper.
Q3: Yes, it may have an impact. But as long as the train_speed/act_speed ratio is close to 1, the performance should be fine. You can try different num_of_threads, different num_of_games_per_thread, and multiple act_device settings. Those parameters have a strong effect on simulation (act) speed.
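As a rough illustration of how those knobs interact (the names below follow this thread; the actual flags in the launch scripts such as pyhanabi/tools/sad_2player.sh may be spelled differently):

```python
# Back-of-the-envelope view of the simulation-side knobs mentioned above.
sim_config = {
    "num_of_threads": 40,           # threads running game loops
    "num_of_games_per_thread": 20,  # games interleaved inside each thread
    "act_devices": ["cuda:1"],      # GPUs serving batched actor inference
}

concurrent_games = sim_config["num_of_threads"] * sim_config["num_of_games_per_thread"]
print(f"{concurrent_games} games generating data concurrently")
# More threads/games raise act_speed (data generation); adding act devices
# keeps batched inference from becoming the bottleneck. The goal is to keep
# act_speed roughly matched to train_speed.
```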
Thanks for the informative answer!
> You might need to hack some settings for loading models trained with/without greedy action input but that should be relatively easy. Our internal repo for the other-play paper has a lot of modifications & tools for running matches between various agents but that is not ready for release yet. Sorry about that.
I guess my questions were a little bit unclear. What you did in the code is train the VDN agents with shared parameters, but is it possible to use different sets of parameters to train VDN?
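Something like this is what I have in mind: VDN only needs the chosen per-player Q-values to be summed before the TD loss, so in principle the per-player networks can keep separate parameters. A toy PyTorch sketch (toy sizes and a dummy target, not the repo's R2D2 agent):

```python
import torch
import torch.nn as nn


class QNet(nn.Module):
    def __init__(self, obs_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, num_actions)
        )

    def forward(self, obs):
        return self.net(obs)


# Two players with *separate* parameters (no weight sharing).
obs_dim, num_actions, batch = 16, 5, 32
q_nets = nn.ModuleList([QNet(obs_dim, num_actions) for _ in range(2)])
optim = torch.optim.Adam(q_nets.parameters(), lr=1e-4)

obs = torch.randn(2, batch, obs_dim)               # per-player observations
actions = torch.randint(0, num_actions, (2, batch))  # per-player chosen actions
target = torch.randn(batch)                        # stand-in for the TD target

# VDN: the joint Q is the sum of each player's chosen-action Q-value.
per_player_q = [
    q_nets[i](obs[i]).gather(1, actions[i].unsqueeze(1)).squeeze(1)
    for i in range(2)
]
joint_q = torch.stack(per_player_q, 0).sum(0)

loss = nn.functional.mse_loss(joint_q, target)
optim.zero_grad()
loss.backward()
optim.step()
```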
> Q1: Answered above. You don't have to change anything in the threadloop. The IQL threadloop takes N agents by default and should be good for your task.
What will happen if I use VDN or SAD agents with the IQL threadloop? Also, I can't find the auxiliary loss in the code. Is it provided here? @hengyuan-hu
I really appreciate your work in this area, which I am also interested in.
My question is about the code modification needed to play with other agents (having two different agents play the same game). As far as I know, this code only does self-play with shared parameters. I want to do some experiments similar to other-play, where I can choose the partners in the environment.
I've been playing around with the code but can't seem to figure out where to change this. Any suggestion on where I should start?
Edit0: Also, I am curious about how you did hyperparameter tuning, since it is quite expensive to run the 5k-epoch training to evaluate each hyperparameter setting. What heuristics did you use for this?
Edit1: My guess is to change the ThreadLoop part, which is in thread_loop.h, to handle multiple actors? Is this correct? Is there a more optimal way to approach this?
Edit2: I can't find the auxiliary loss in the code. Is it provided in this version? And what is the difference between pyhanabi/tools/dev.sh and pyhanabi/tools/sad_2player.sh? Do they produce the same experiment?
Edit3: If two experiments were run on different machines (with different hardware speeds), will this affect the results? Because from what I see in the code, it does asynchronous training while the actors are doing self-play. If the training takes longer, the actors will keep adding observations to the replay_buffer using an older agent.
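For example, I imagine tagging each transition with the policy version that produced it would make this effect measurable (a toy sketch of what I mean, not something the repo does as far as I can tell):

```python
import random
from collections import deque


class VersionedReplayBuffer:
    """Replay buffer that records which policy version produced each
    transition, so you can measure how stale the sampled data is."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition, policy_version):
        self.buffer.append((transition, policy_version))

    def sample(self, batch_size, current_version):
        batch = random.sample(list(self.buffer), batch_size)
        staleness = [current_version - v for _, v in batch]
        # A large average staleness means the learner trains on data from much
        # older policies, which is what slower training hardware would cause.
        return [t for t, _ in batch], sum(staleness) / len(staleness)
```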