spktrm opened this issue 1 year ago
@perolat, @bartdevylder: any ideas?
Hi, thanks for your question. The `merged_log_policy` term in the line you posted already contains the log policy ratio. It is defined here, taking into account the interpolation between the two regularization policies: https://github.com/deepmind/open_spiel/blob/db0f4a78b1fd0bee0263d46d62fb4d693897329e/open_spiel/python/algorithms/rnad/rnad.py#L801
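For readers who want to see the shape of that definition, here is a minimal runnable sketch of the interpolation. The variable names and the illustrative values are mine, not a verbatim excerpt of the linked line:

```python
import jax.numpy as jnp

# Dummy log-policies over 3 actions (illustrative values only).
log_pi = jnp.log(jnp.array([0.7, 0.2, 0.1]))           # current policy
log_pi_reg = jnp.log(jnp.array([0.5, 0.3, 0.2]))       # regularization policy
log_pi_prev_reg = jnp.log(jnp.array([0.4, 0.4, 0.2]))  # previous regularization policy
alpha = 0.5                                            # interpolation weight

# merged_log_policy: log ratio between the current policy and the
# alpha-interpolation (in log space) of the two regularization policies.
merged_log_policy = log_pi - (alpha * log_pi_reg + (1 - alpha) * log_pi_prev_reg)
```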
Hi, thank you for your reply. I understand that already. What I want to understand is why `merged_log_policy` is multiplied by the policy in the code, when this is not mentioned in the paper.
Hi, OK, now I see your point. The `eta_log_policy` variable corresponds to the regularisation described in the paper, but the meaning of `eta_reg_entropy` is not so clear. @perolat will look into this and clarify.
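For anyone following along, here is a rough sketch of how the two terms differ. The names follow rnad.py, but I am omitting the player-masking factors, so treat this as a paraphrase rather than a verbatim excerpt:

```python
import jax.numpy as jnp

# merged_policy:     pi(.|s) over the actions (illustrative values only).
# merged_log_policy: log(pi(.|s) / pi_reg_mix(.|s)).
merged_policy = jnp.array([0.7, 0.2, 0.1])
merged_log_policy = jnp.log(merged_policy) - jnp.log(jnp.array([0.5, 0.3, 0.2]))
eta = 0.2

# Per-action term: the log policy ratio from the paper's reward transformation.
eta_log_policy = -eta * merged_log_policy

# Expected term: the log ratio summed over actions, weighted by the policy --
# this is the multiplication by the policy that the question is about.
eta_reg_entropy = -eta * jnp.sum(merged_policy * merged_log_policy, axis=-1)
```

Multiplying by `merged_policy` and summing turns the per-action log ratio into its expectation under the current policy, which is why `eta_reg_entropy` looks like an entropy/KL term rather than the paper's per-action log ratio.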
@perolat any updates on this?
@spktrm Do you know the reason? Thanks.
Nope, unfortunately. Still waiting for @perolat or someone on the team to clarify.
Based on the formulae in the paper, the reward transformation is given by adding the log policy ratio to the reward. However, the code contains an entropy term instead:
https://github.com/deepmind/open_spiel/blob/db0f4a78b1fd0bee0263d46d62fb4d693897329e/open_spiel/python/algorithms/rnad/rnad.py#L422
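To make the two candidates concrete (writing $\pi$ for the current policy and $\pi_{reg}$ for the interpolated regularization policy; the signs are as I read them for the acting player):

```math
\text{paper:}\quad \tilde r(s, a) \;=\; r(s, a) \;-\; \eta \,\log\frac{\pi(a \mid s)}{\pi_{reg}(a \mid s)}
```

```math
\text{code (eta\_reg\_entropy):}\quad -\,\eta \sum_{a} \pi(a \mid s)\,\log\frac{\pi(a \mid s)}{\pi_{reg}(a \mid s)} \;=\; -\,\eta\;\mathrm{KL}\big(\pi(\cdot \mid s)\,\big\|\,\pi_{reg}(\cdot \mid s)\big)
```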
Which one is it?