Fernadoo / Papers_and_Refs

Interesting papers and references

[ICML'2020] “Other-Play” for Zero-Shot Coordination #14

Closed Fernadoo closed 3 years ago

Fernadoo commented 3 years ago

http://proceedings.mlr.press/v119/hu20a/hu20a.pdf

Fernadoo commented 3 years ago

An implementation of the lever game is provided here: https://bit.ly/2vYkfI7

Fernadoo commented 3 years ago

Zero-shot coordination is about coordinating with previously unseen strangers (agents or humans). Learning processes like self-play are widely adopted, since they efficiently shrink the search space and also provide convergence guarantees (to at least mixed-strategy Nash equilibria) under certain restricted dynamics, e.g. fictitious play.
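To make the convergence claim concrete, here is a minimal fictitious-play sketch in a symmetric 2x2 coordination game (the game and all names are my own illustration, not from the paper): each player best-responds to the opponent's empirical action frequencies, and the empirical strategies settle on a pure coordination equilibrium.

```python
import numpy as np

# Illustrative 2x2 coordination game: payoff 1 if both players
# pick the same action, 0 otherwise.
payoff = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

def fictitious_play(steps=1000):
    """Run fictitious play; return each player's empirical mixed strategy."""
    counts = np.ones((2, 2))  # empirical action counts per player (smoothed)
    for _ in range(steps):
        for player in range(2):
            opp = 1 - player
            # Best respond to the opponent's empirical mixed strategy.
            belief = counts[opp] / counts[opp].sum()
            best_response = int(np.argmax(payoff @ belief))
            counts[player, best_response] += 1
    return counts / counts.sum(axis=1, keepdims=True)

strategies = fictitious_play()
# Both empirical strategies concentrate on one coordination equilibrium.
```

Which of the two symmetric equilibria the dynamics lock onto depends on initialization and tie-breaking, which is exactly the arbitrariness that hurts zero-shot coordination later in the discussion.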

An obvious drawback is that the assumption that the other agent is exactly the same might be too strong. While self-play has achieved persuasive results in competitive settings (e.g. zero-sum games) without raising much concern, in cooperative settings (e.g. potential games) agents might collaborate in multiple ways with similar utilities, owing to the existence of multiple Nash equilibria.

The question investigated in this work is therefore how to find a strategy that is maximally robust to arbitrary partners. Since it is nearly impossible to enumerate all potential partners, recall that the key assumption underlying self-play is treating the other agent as an identical copy. A reasonable weakening is to assume the other agent is some "variant" of the agent you are designing, where variants are produced by bijections of the policy space onto itself (induced by symmetries of the underlying problem).
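A minimal sketch of this idea on the lever game (10 levers, nine paying 1.0 and one paying 0.9; both players score only if they pick the same lever). The helper names and the sampling-based objective below are my own illustration: the partner is modeled as a symmetry-permuted copy of one's own policy, where the symmetries are the relabelings of the nine interchangeable 1.0 levers.

```python
import numpy as np

# Lever game: 10 levers, nine pay 1.0, one pays 0.9;
# payoff is received only when both players pick the same lever.
payoffs = np.array([1.0] * 9 + [0.9])

def cross_play_value(pi_a, pi_b):
    """Expected payoff when independent policies pi_a and pi_b are paired."""
    return float(np.sum(pi_a * pi_b * payoffs))

# Self-play: any deterministic policy on a 1.0 lever is optimal (value 1.0),
# but two independent self-play runs can converge to different 1.0 levers
# and then fail completely in cross-play:
pi_run1 = np.eye(10)[0]  # hypothetical run 1: converged to lever 0
pi_run2 = np.eye(10)[3]  # hypothetical run 2: converged to lever 3

def other_play_value(pi, n_samples=2000, seed=0):
    """Value of pi against symmetry-permuted copies of itself (sketch)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        # Sample a symmetry: relabel the nine identical 1.0 levers,
        # keeping the distinguishable 0.9 lever fixed.
        perm = np.concatenate([rng.permutation(9), [9]])
        total += cross_play_value(pi, pi[perm])
    return total / n_samples

uniform_nine = np.array([1 / 9] * 9 + [0.0])  # mix over the 1.0 levers
odd_lever = np.eye(10)[9]                     # always pick the 0.9 lever
```

Under this objective, committing to the distinguishable 0.9 lever scores 0.9, while any policy supported on the interchangeable levers matches its permuted copy with probability at most 1/9 — so the symmetry-robust strategy is the "odd" lever, matching the paper's lever-game result.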

A subsequent question would be: is an agent that is robust enough (in terms of its overall/average ability to cooperate with a pool of strangers) also powerful enough (in terms of its ability to cooperate with a stranger in a single match)? The efforts in this work are twofold. (1) A high-level concept called meta equilibrium is proposed but, from my perspective, not well defined. (2) Experimental results show that the approach does empirically make an agent more powerful.

Fernadoo commented 3 years ago

This should provide some useful insights if we want to solve the path planning problem without any prior knowledge of the other agents.