eugenevinitsky / sequential_social_dilemma_games

Repo for reproduction of sequential social dilemmas
MIT License

Multiple policies in MOA training #178

Closed · chasemcd closed this 4 years ago

chasemcd commented 4 years ago

As I understand it, in the paper's initial experiments only a limited number of agents are trained with the MOA/causal influence reward. In the implementation (train_moa.py), it looks like all agents are equipped with the MOA model and receive a causal influence reward. It isn't immediately clear to me how to alter this to allow for variation in agent policies/models, since the Trainers postprocess and incorporate the causal rewards. Does anyone have insight into, or suggestions on, how this might be done?
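
For concreteness, this is the sort of setup I had in mind: a rough sketch using RLlib's standard multiagent config, where only some policy IDs get the MOA model. The custom model names and the `"moa_loss_weight"` key are placeholders I made up, not the repo's actual config keys, and the spaces are dummies.

```python
import gym

# Placeholder observation/action spaces; the real ones would come from the
# environment's observation_space / action_space in this repo.
obs_space = gym.spaces.Box(low=0.0, high=255.0, shape=(15, 15, 3))
act_space = gym.spaces.Discrete(8)

NUM_AGENTS = 5
NUM_MOA_AGENTS = 2  # only these agents would carry the MOA head / influence reward

policies = {}
for i in range(NUM_AGENTS):
    agent_id = "agent-{}".format(i)
    if i < NUM_MOA_AGENTS:
        # Hypothetical per-policy config enabling the MOA model; the names
        # "moa_model" and "moa_loss_weight" are placeholders, not the repo's.
        policies[agent_id] = (None, obs_space, act_space,
                              {"model": {"custom_model": "moa_model"},
                               "moa_loss_weight": 1.0})
    else:
        # Baseline policy: no MOA head, no influence reward.
        policies[agent_id] = (None, obs_space, act_space,
                              {"model": {"custom_model": "baseline_model"}})

def policy_mapping_fn(agent_id):
    # Assumes env agent ids look like "agent-0", "agent-1", ...
    return agent_id

config = {
    "multiagent": {
        "policies": policies,
        "policy_mapping_fn": policy_mapping_fn,
    },
}
```

What I'm unsure about is whether this alone is enough: if the influence reward is added during trajectory postprocessing, it matters whether that postprocessing hangs off each policy (so only the MOA policies would apply it) or off the Trainer globally.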

internetcoffeephone commented 4 years ago

The basic social influence experiment (Experiment I) has not been implemented in this repository. Only Experiment III, Modeling Other Agents, is present, alongside the baseline A3C (and PPO) models.

chasemcd commented 4 years ago

Right, thanks for the clarification.