eugenevinitsky / sequential_social_dilemma_games

Repo for reproduction of sequential social dilemmas
MIT License

Agents condition their policy on other agents' actions #113

Closed by natashamjaques 5 years ago

natashamjaques commented 5 years ago

Agents need to be able to choose their next action based on another agent's past action. This way we can assess the causal influence of one agent's action on another. This is necessary before we can do issue #13.

eugenevinitsky commented 5 years ago

So, what does this actually entail? Should it be part of their state space?

natashamjaques commented 5 years ago

Yeah, other agents' actions should be one-hot encoded and fed as input to each agent at the next timestep. I guess that makes them part of the observation space. I was thinking of trying to do this within RLlib, but let me know if you think it makes more sense to do it within the environment somehow.
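
A minimal sketch of that encoding, not the repo's actual implementation. It assumes a flat observation vector, a discrete action space, and a dict of previous actions keyed by agent id; `NUM_ACTIONS`, `one_hot`, and `augment_observation` are hypothetical names chosen for illustration.

```python
import numpy as np

NUM_ACTIONS = 8  # hypothetical size of the discrete action space


def one_hot(action, num_actions):
    """Return a one-hot float vector for a single discrete action."""
    vec = np.zeros(num_actions, dtype=np.float32)
    vec[action] = 1.0
    return vec


def augment_observation(obs, prev_actions, agent_id, num_actions=NUM_ACTIONS):
    """Append the other agents' previous actions (one-hot) to a flat observation.

    obs:          the agent's own observation (flattened here; an assumption)
    prev_actions: dict mapping agent id -> action taken at the previous timestep
    agent_id:     the agent receiving the observation (its own action is skipped)
    """
    # Sort ids so every agent sees the others in a stable, repeatable order.
    others = sorted(a for a in prev_actions if a != agent_id)
    encoded = [one_hot(prev_actions[a], num_actions) for a in others]
    flat_obs = np.asarray(obs, dtype=np.float32).ravel()
    return np.concatenate([flat_obs, *encoded])


# Example: three agents, a 4-dimensional observation for "agent-0".
obs = np.zeros(4)
prev_actions = {"agent-0": 2, "agent-1": 5, "agent-2": 0}
augmented = augment_observation(obs, prev_actions, "agent-0")
```

Under these assumptions the augmented vector always has length `obs_dim + (num_agents - 1) * num_actions`, so the policy network's input size is known up front, whether the concatenation happens in the environment or in a wrapper around it.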

eugenevinitsky commented 5 years ago

I think I'm a little confused: if it isn't built in as part of their observation space, won't the neural network have the wrong number of inputs? How would you feed it in later?

natashamjaques commented 5 years ago

Moving this to the other repo.