google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

RNaD off policy case #1109

Open · spktrm opened this issue 10 months ago

spktrm commented 10 months ago

In the RNaD example, the importance sampling correction passed to get_loss_nerd is 1. This is because the provided example is the on-policy case: the policy is updated synchronously between acting and learning.
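
(For context, and assuming the standard off-policy terminology rather than anything RNaD-specific: on-policy means the behaviour policy μ that generated the trajectories equals the current learner policy π, so the per-step importance ratio ρ_t = π(a_t | s_t) / μ(a_t | s_t) = 1, which is why a constant correction of 1 is valid there.)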

My question is: what needs to change for this example to work in an asynchronous, off-policy setting? Is it as simple as replacing the constant importance sampling correction with a policy-ratio term? What would that look like exactly?

How could I construct the importance sampling correction for the off-policy case?
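
For readers landing here later, here is a minimal sketch of the kind of substitution being asked about. This is not the OpenSpiel API: `pi_learner`, `mu_behavior`, and `actions_oh` are hypothetical names, and whether a clipped per-step ratio is the right correction for RNaD is exactly the open question.

```python
import jax.numpy as jnp


def policy_ratio_correction(pi_learner, mu_behavior, actions_oh,
                            clip_max=100.0, eps=1e-8):
  """Clipped per-step ratio pi(a|s) / mu(a|s) for the actions actually taken.

  pi_learner, mu_behavior, actions_oh: [T, B, A] arrays, where actions_oh is
  the one-hot encoding of the actions the (possibly stale) actor took.
  Returns a [T, B] array that could stand in for the constant correction of 1.
  """
  pi_taken = jnp.sum(pi_learner * actions_oh, axis=-1)
  mu_taken = jnp.sum(mu_behavior * actions_oh, axis=-1)
  ratio = pi_taken / (mu_taken + eps)
  # Clipping keeps the correction bounded when mu has drifted far from pi.
  return jnp.clip(ratio, 0.0, clip_max)
```

This would require logging the behaviour policy's action probabilities alongside the trajectories, which the on-policy example does not need to do.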

spktrm commented 9 months ago

@perolat any ideas?

spktrm commented 5 months ago

@lanctot is there a better channel to get in contact with @perolat? I feel as though he may have missed my email.

lanctot commented 5 months ago

I just chatted with him and will send him the currently open questions later today. Is this currently the only unresolved one?

spktrm commented 5 months ago

Hi,

Both this issue and this one are still unresolved: https://github.com/google-deepmind/open_spiel/issues/1075

Keen to hear back :)