FLAIROx / JaxMARL

Multi-Agent Reinforcement Learning with JAX
Apache License 2.0

Feature request: Support POLA #84

Closed cool-RR closed 3 months ago

cool-RR commented 4 months ago

I would like to run experiments using opponent shaping, possibly the POLA algorithm. I previously looked at Pax and the M-FOS algorithm, but I would like to avoid the meta-game mechanics if possible.

I'm guessing that POLA couldn't be implemented in JaxMARL in a straightforward way, because agents need to be aware of each other's gradients, which is different from most RL algorithms.
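For context, the "aware of each other's gradients" idea can be sketched in plain JAX: a LOLA/POLA-style shaping agent differentiates through the opponent's anticipated gradient step. This is a hedged toy sketch, not JaxMARL or pax code; the quadratic losses and all names are illustrative stand-ins for actual game returns.

```python
import jax
import jax.numpy as jnp

# Toy stand-ins for each agent's loss as a function of both parameter
# vectors (a real implementation would roll out the environment here).
def loss1(theta1, theta2):
    return jnp.sum((theta1 - theta2) ** 2) + jnp.sum(theta1 ** 2)

def loss2(theta1, theta2):
    return jnp.sum((theta1 + theta2) ** 2)

def shaped_grad1(theta1, theta2, opp_lr=0.1):
    # Agent 1 differentiates through agent 2's one-step naive update:
    # this inner jax.grad is what makes the agents "gradient-aware".
    def lookahead_loss(t1):
        g2 = jax.grad(loss2, argnums=1)(t1, theta2)
        theta2_next = theta2 - opp_lr * g2  # opponent's anticipated step
        return loss1(t1, theta2_next)
    return jax.grad(lookahead_loss)(theta1)

theta1 = jnp.ones(3)
theta2 = jnp.zeros(3)
g = shaped_grad1(theta1, theta2)
```

Because JAX composes `grad` freely, the second-order term comes for free; the structural difficulty is that each agent's update now needs access to the other agents' loss functions and parameters, which most IPPO/MAPPO-style training loops don't expose.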

I'm considering working on this implementation and any changes to JaxMARL needed to support POLA. I would appreciate if you could tell me whether you think that's possible, what the difficulties would be, whether this is something you'll be interested in merging, and any other insights about this potential project.

alexunderch commented 4 months ago

Can you give a reference link so I can get familiar with the algorithm?

cool-RR commented 4 months ago

Sure:

alexunderch commented 4 months ago

Hey! Your request is absolutely feasible; you need to change the structure of the learning loop for the players you shape versus the others, e.g. as in pax. I will create a branch in the repository this week to facilitate the development of opponent-shaping algorithms for you.

P.S. I am not the author or a maintainer, just trying to help with contributions.
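As a rough illustration of the asymmetric loop structure described above, here is a hedged sketch with toy quadratic losses (all names are illustrative, not pax or JaxMARL APIs): the shaping agent's update differentiates through the naive learner's anticipated step, while the naive learner takes an ordinary gradient step.

```python
import jax
import jax.numpy as jnp

# Illustrative losses standing in for episode returns.
def loss_shaper(ts, tn):
    return jnp.sum((ts - tn) ** 2)

def loss_naive(ts, tn):
    return jnp.sum((ts + tn) ** 2)

def train_step(ts, tn, lr=0.05, opp_lr=0.05):
    # Shaper branch: differentiate through the naive learner's
    # anticipated one-step update.
    def lookahead(ts_):
        g = jax.grad(loss_naive, argnums=1)(ts_, tn)
        return loss_shaper(ts_, tn - opp_lr * g)
    ts = ts - lr * jax.grad(lookahead)(ts)
    # Naive branch: ordinary gradient step on its own loss.
    tn = tn - lr * jax.grad(loss_naive, argnums=1)(ts, tn)
    return ts, tn

ts, tn = jnp.ones(2), -jnp.ones(2)
# A jit-friendly training loop via lax.scan, in the usual JAX style.
(ts, tn), _ = jax.lax.scan(
    lambda carry, _: (train_step(*carry), None), (ts, tn), None, length=10
)
```

The point is that the two agents no longer share one update rule, which is why the learning loop itself has to be restructured rather than just swapping in a new loss.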

cool-RR commented 4 months ago

Thanks @alexunderch! Besides the implementation, it's also important for me to hear from the maintainers whether it aligns with their vision for the package, because I don't want to end up maintaining a fork. Tag me when you push this branch.

luchris429 commented 3 months ago

You're right that we don't plan on doing this right now! I think it would be harder to implement in this repo than in pax. We'd be interested in merging it if completed, but since it won't be easy here, I'd recommend forking pax instead if you can!