google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Apache License 2.0

RNaD reward transformation #1075

Open spktrm opened 1 year ago

spktrm commented 1 year ago

Based on formulae from the paper, the reward transformation is given by adding the log policy ratio

[Image: reward transformation formula from the paper]

However, the code contains an entropy term instead.

https://github.com/deepmind/open_spiel/blob/db0f4a78b1fd0bee0263d46d62fb4d693897329e/open_spiel/python/algorithms/rnad/rnad.py#L422

Which one is it?

lanctot commented 1 year ago

@perolat, @bartdevylder: any ideas?

bartdevylder commented 1 year ago

Hi, thanks for your question. The merged_log_policy term in the line you posted already contains the log policy ratio. It is defined here: https://github.com/deepmind/open_spiel/blob/db0f4a78b1fd0bee0263d46d62fb4d693897329e/open_spiel/python/algorithms/rnad/rnad.py#L801 and it takes into account the interpolation between the two regularization policies.
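For readers following along, here is a minimal sketch of what a log policy ratio interpolated between two regularization policies could look like. It is not the code from rnad.py: the function name, the argument names, and the direction in which `alpha` mixes the two regularization policies are all assumptions made for illustration.

```python
import math


def merged_log_policy_ratio(policy, reg_new, reg_old, alpha):
    """Per-action log pi(a) minus an interpolated log regularization policy.

    Hypothetical helper: `alpha` weights `reg_new`, `1 - alpha` weights
    `reg_old`. The real code may order or name these differently.
    """
    return [
        math.log(p) - (alpha * math.log(rn) + (1.0 - alpha) * math.log(ro))
        for p, rn, ro in zip(policy, reg_new, reg_old)
    ]


# Toy distributions over three actions (made-up numbers).
policy = [0.5, 0.3, 0.2]
reg_new = [0.4, 0.4, 0.2]
reg_old = [0.2, 0.5, 0.3]

# With alpha = 1 the interpolation collapses to log(pi / reg_new).
ratio_at_one = merged_log_policy_ratio(policy, reg_new, reg_old, alpha=1.0)
# With alpha = 0 it collapses to log(pi / reg_old).
ratio_at_zero = merged_log_policy_ratio(policy, reg_new, reg_old, alpha=0.0)
# Halfway, each entry is the average of the two plain log ratios.
ratio_at_half = merged_log_policy_ratio(policy, reg_new, reg_old, alpha=0.5)
```

At either end of the interpolation this is the ordinary log policy ratio against a single regularization policy, which is the form the question refers to.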

spktrm commented 1 year ago

Hi

Thank you for your reply. I understand this already. I want to understand why the merged_log_policy is multiplied by the policy in the code when this is not communicated in the paper.

bartdevylder commented 1 year ago

Hi, OK, now I see your point. The eta_log_policy variable corresponds to the regularisation described in the paper, but the meaning of eta_reg_entropy is less clear. @perolat will look into this to clarify.
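To make the distinction concrete, here is a small sketch of the two quantities as they are described in this thread: a per-action term built from the log policy ratio, and a term in which that ratio is multiplied by the policy and summed over actions. The function names and numbers are hypothetical, a single regularization policy is assumed, and any per-player sign or masking the real code may apply is left out. It only shows how the two terms relate mathematically and does not explain why the code uses the second one.

```python
import math


def per_action_term(log_ratio, eta):
    """-eta * log(pi(a) / pi_reg(a)) for every action a."""
    return [-eta * lr for lr in log_ratio]


def policy_weighted_term(policy, log_ratio, eta):
    """-eta * sum_a pi(a) * log(pi(a) / pi_reg(a)), a single scalar."""
    return -eta * sum(p * lr for p, lr in zip(policy, log_ratio))


# Toy numbers, assumed for illustration only.
policy = [0.5, 0.3, 0.2]
reg = [0.4, 0.4, 0.2]
eta = 0.2

log_ratio = [math.log(p / r) for p, r in zip(policy, reg)]
per_action = per_action_term(log_ratio, eta)
weighted = policy_weighted_term(policy, log_ratio, eta)

# The weighted term is the expectation of the per-action term under the policy,
# which for a single regularization policy equals -eta * KL(pi || pi_reg).
expected_per_action = sum(p * t for p, t in zip(policy, per_action))
kl = sum(p * math.log(p / r) for p, r in zip(policy, reg))
```

Under these assumptions the per-action term depends on which action was taken, while the policy-weighted term does not, since it averages over all actions.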

spktrm commented 1 year ago

@perolat any updates on this?

sbl1996 commented 6 months ago

@spktrm Do you know the reason? Thanks.

spktrm commented 6 months ago

> @spktrm Do you know the reason? Thanks.

Nope, unfortunately. Waiting for @perolat or someone else involved to clarify.