coax-dev / coax

Modular framework for Reinforcement Learning in python
https://coax.readthedocs.io
MIT License

MiniMax Algorithm? #30

Open flaport opened 2 years ago

flaport commented 2 years ago

How would you implement a minimax q-learner with coax?

Hi there! I love the package and how accessible it is to relative newbies. The tutorials are pretty great and the accompanying videos are very helpful!

I was wondering what the best way to implement a minimax algorithm would be. Would you recommend using two policies, pi1 and pi2? Or is there something better suited to this?

I'd like to re-implement something like this old blog post of mine in coax to get a better feel for the library.

Any help would be greatly appreciated :)

KristianHolsheimer commented 2 years ago

Hi @flaport

First of all, thanks for your interest in coax!

It would be great to see multi-agent style setups in coax. I haven't thought much about it, to be honest.

The simplest setup would be to use separate policies and either update the policies individually or write your own policy objective that updates multiple policies at the same time.
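To make the idea concrete, here is a rough, library-agnostic sketch of a tabular minimax Q-update for a two-player zero-sum game. It deliberately does not use any coax API. The names `make_q_table`, `maximin_value` and `minimax_q_update` are hypothetical helpers invented for this illustration, and the state value is taken as a pure-strategy maximin (a simplification: a full minimax-Q learner would solve for a mixed strategy instead).

```python
from collections import defaultdict


def make_q_table(n_actions, n_opp_actions):
    """Hypothetical helper: Q[s][a][o] initialised to zero for unseen states."""
    return defaultdict(
        lambda: [[0.0] * n_opp_actions for _ in range(n_actions)]
    )


def maximin_value(q_state):
    """Pure-strategy maximin: best own action against the worst-case opponent.

    Assumption: pure strategies only. This is a simplification of the
    mixed-strategy value used in minimax Q-learning.
    """
    return max(min(row) for row in q_state)


def minimax_q_update(Q, s, a, o, r, s_next, done, alpha=0.1, gamma=0.9):
    """One update for the maximising player after observing (s, a, o, r, s_next)."""
    # Bootstrap from the maximin value of the next state unless the episode ended.
    bootstrap = 0.0 if done else maximin_value(Q[s_next])
    target = r + gamma * bootstrap
    Q[s][a][o] += alpha * (target - Q[s][a][o])
    return Q[s][a][o]
```

In a zero-sum game the second player could keep its own table and apply the same update with the reward negated, which would correspond to the "update the policies individually" option above.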

Having said that, I'm not an expert in multi-agent RL myself, so I'm not aware of all the subtleties associated with such a setup.

But of course, I welcome contributions and I'm curious to see what you come up with!