Open 51616 opened 4 years ago
You can modify this line to take 2 different weight files: https://github.com/facebookresearch/hanabi_SAD/blob/6e4ed590f5912fcb99633f4c224778a3ba78879b/pyhanabi/tools/eval_model.py#L23 You might need to hack some settings for loading models trained with/without greedy action input but that should be relatively easy. Our internal repo for the other-play paper has a lot of modifications & tools for running matches between various agents but that is not ready for release yet. Sorry about that.
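For example, something along these lines should work (the helper names and the `evaluate` signature below are illustrative assumptions; check the actual functions in `pyhanabi/eval.py` and `eval_model.py` before copying):

```python
# Sketch of a two-model eval script, adapted from the spirit of
# pyhanabi/tools/eval_model.py. Names/signatures are assumptions.
import sys
import torch
from eval import evaluate  # the repo's evaluator; its exact signature may differ


def load_agent(weight_file, device):
    # Assumes the checkpoint can be loaded directly; in the repo you would
    # rebuild the R2D2 agent with the options it was trained with (e.g. the
    # greedy-action input flag) and load its state dict instead.
    agent = torch.load(weight_file, map_location=device)
    agent.train(False)
    return agent


if __name__ == "__main__":
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    # one agent per seat instead of the same weights for every player
    agents = [load_agent(sys.argv[1], device), load_agent(sys.argv[2], device)]
    score = evaluate(agents, 1000, 1, 0, 0)  # num_game, seed, bomb, eps -- check the real call
    print(f"mean cross-play score over 1000 games: {score}")
```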
Q0: We started with the hyper-parameters provided in the R2D2 paper and that seemed to work quite well. The most important h-param for us was the data generation/consumption ratio, i.e. act_speed/train_speed. Other parameters seemed less important.
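Roughly, that ratio is just transitions generated per second divided by transitions consumed per second. A toy monitor like this shows what is meant (the counter names here are made up, not the repo's stat keys):

```python
import time


class SpeedMonitor:
    """Tracks how many transitions the actors generate vs. how many the
    learner consumes, so the generation/consumption ratio can be kept near 1."""

    def __init__(self):
        self.start = time.time()
        self.acted = 0    # transitions added to the replay buffer
        self.trained = 0  # transitions sampled for gradient updates

    def record_act(self, num_transitions):
        self.acted += num_transitions

    def record_train(self, batch_size):
        self.trained += batch_size

    def ratio(self):
        elapsed = time.time() - self.start
        act_speed = self.acted / elapsed      # transitions per second generated
        train_speed = self.trained / elapsed  # transitions per second consumed
        # >1 means the actors outpace the learner; <1 means the learner
        # reuses old data more often.
        return act_speed / max(train_speed, 1e-8)
```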
Q1: Answered above. You don't have to change anything in the threadloop. The IQL threadloop takes N agents by default and should be good for your task.
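Conceptually, the loop just looks up the acting agent by seat, so each seat can hold a different model. A toy Python version of that control flow (the real loop is the C++ ThreadLoop and rela actors; none of these class names come from the repo):

```python
import random


class ToyTurnEnv:
    """Tiny stand-in for the Hanabi environment: two seats alternate for a
    fixed number of turns. Only meant to show the control flow."""

    def __init__(self, num_turns=6):
        self.num_turns = num_turns

    def reset(self):
        self.turn = 0
        return {"turn": self.turn}

    def current_player(self):
        return self.turn % 2

    def step(self, action):
        self.turn += 1
        done = self.turn >= self.num_turns
        reward = 1.0 if done else 0.0
        return {"turn": self.turn}, reward, done


class RandomAgent:
    """Stand-in for a trained policy; two different instances can sit at
    the two seats, which is all an IQL-style loop needs."""

    def __init__(self, name):
        self.name = name

    def act(self, obs):
        return random.randint(0, 3)


def play_episode(env, agents):
    # Key point: the acting agent is indexed per seat, so agents[0] and
    # agents[1] can be entirely different models.
    obs = env.reset()
    done, reward = False, 0.0
    while not done:
        seat = env.current_player()
        action = agents[seat].act(obs)
        obs, reward, done = env.step(action)
    return reward


if __name__ == "__main__":
    print(play_episode(ToyTurnEnv(), [RandomAgent("a"), RandomAgent("b")]))
```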
Q2: dev.sh is for fast debugging. It uses less compute and starts training faster. I guess you meant sad_2player vs vdn_2player? sad_2player takes the extra greedy action input, which was the main idea of our SAD paper.
Q3: Yes, it may have an impact. But as long as the train_speed/act_speed ratio is close to 1, the performance should be fine. You can try different num_of_threads, different num_of_games_per_thread, and multiple act_device settings. Those parameters have a strong effect on simulation (act) speed.
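As a rough illustration of how those knobs interact (the names below follow this thread; the actual flags in the launch scripts such as pyhanabi/tools/sad_2player.sh may be spelled differently):

```python
# Back-of-the-envelope view of the simulation-side knobs mentioned above.
sim_config = {
    "num_of_threads": 40,           # threads running game loops
    "num_of_games_per_thread": 20,  # games interleaved inside each thread
    "act_devices": ["cuda:1"],      # GPUs serving batched actor inference
}

concurrent_games = sim_config["num_of_threads"] * sim_config["num_of_games_per_thread"]
print(f"{concurrent_games} games generating data concurrently")
# More threads/games raise act_speed (data generation); adding act devices
# keeps batched inference from becoming the bottleneck. The goal is to keep
# act_speed roughly matched to train_speed.
```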
Thanks for the informative answer!
> You might need to hack some settings for loading models trained with/without greedy action input but that should be relatively easy. Our internal repo for the other-play paper has a lot of modifications & tools for running matches between various agents but that is not ready for release yet. Sorry about that.
I guess my questions were a little bit unclear. What you did in the code is train the VDN agents with shared parameters, but is it possible to use different sets of parameters to train VDN?
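Something like this is what I have in mind: VDN only needs the chosen per-player Q-values to be summed before the TD loss, so in principle the per-player networks can keep separate parameters. A toy PyTorch sketch (toy sizes and a dummy target, not the repo's R2D2 agent):

```python
import torch
import torch.nn as nn


class QNet(nn.Module):
    def __init__(self, obs_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, num_actions)
        )

    def forward(self, obs):
        return self.net(obs)


# Two players with *separate* parameters (no weight sharing).
obs_dim, num_actions, batch = 16, 5, 32
q_nets = nn.ModuleList([QNet(obs_dim, num_actions) for _ in range(2)])
optim = torch.optim.Adam(q_nets.parameters(), lr=1e-4)

obs = torch.randn(2, batch, obs_dim)               # per-player observations
actions = torch.randint(0, num_actions, (2, batch))  # per-player chosen actions
target = torch.randn(batch)                        # stand-in for the TD target

# VDN: the joint Q is the sum of each player's chosen-action Q-value.
per_player_q = [
    q_nets[i](obs[i]).gather(1, actions[i].unsqueeze(1)).squeeze(1)
    for i in range(2)
]
joint_q = torch.stack(per_player_q, 0).sum(0)

loss = nn.functional.mse_loss(joint_q, target)
optim.zero_grad()
loss.backward()
optim.step()
```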
> Q1: Answered above. You don't have to change anything in the threadloop. The IQL threadloop takes N agents by default and should be good for your task.
What will happen if I use VDN or SAD agents with the IQL threadloop? Also, I can't find the auxiliary loss in the code. Is it provided here? @hengyuan-hu
I really appreciate your work in this area, which I am also interested in.
My question is about the code modification needed to play with other agents (having two different agents play the same game). As far as I know, this code only does self-play with shared parameters. I want to do some experiments similar to other-play, where I can choose the partners in the environment.
I've been playing around with the code but can't seem to figure out where to change this. Any suggestion on where I should start?
Edit0: Also, I am curious about how you did hyperparameter tuning, since it is quite expensive to run the 5k-epoch training to evaluate each hyperparameter setting. What heuristics did you use for this?
Edit1: My guess is to change the ThreadLoop part, which is in thread_loop.h, to handle multiple actors? Is this correct? Is there a more optimal way to approach this?
Edit2: I can't find the auxiliary loss in the code. Is it provided in this version? And what is the difference between pyhanabi/tools/dev.sh and pyhanabi/tools/sad_2player.sh? Do they produce the same experiment?
Edit3: If two experiments were run on different machines (with different hardware speeds), will this affect the results? Because from what I see in the code, it does asynchronous training while the actors are doing self-play. If the training takes longer, the actors will keep adding observations to the replay_buffer using an older agent.
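For example, I imagine tagging each transition with the policy version that produced it would make this effect measurable (a toy sketch of what I mean, not something the repo does as far as I can tell):

```python
import random
from collections import deque


class VersionedReplayBuffer:
    """Replay buffer that records which policy version produced each
    transition, so you can measure how stale the sampled data is."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition, policy_version):
        self.buffer.append((transition, policy_version))

    def sample(self, batch_size, current_version):
        batch = random.sample(list(self.buffer), batch_size)
        staleness = [current_version - v for _, v in batch]
        # A large average staleness means the learner trains on data from much
        # older policies, which is what slower training hardware would cause.
        return [t for t, _ in batch], sum(staleness) / len(staleness)
```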