PKU-MARL / HARL

Official implementation of HARL algorithms based on PyTorch.
521 stars, 64 forks

About the assumption of Formula 25 in Lemma G.1 in the update of the actor network #17

Closed liuda1064838990 closed 1 year ago

liuda1064838990 commented 1 year ago

The mathematical proof in the article is very detailed, but one point is still unclear to me. In each state, the goal of the actor network is to maximize the soft Q-function plus the expected future entropy. But how is the assumption of Formula 25 in Lemma G.1 guaranteed by the update of the actor network? I am quite confused about this and look forward to your answer.

guazimao commented 1 year ago

Hi. HASAC is an instance of the MEHAML template with a drift functional of 0 and a neighborhood of $\Pi$. Therefore, the policy update of HASAC satisfies Equation 28 in Lemma G.2, and Lemma G.2 in turn ensures that the resulting policies satisfy the condition in Equation 25 of Lemma G.1. I hope this clears up your confusion.
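
For intuition only, here is a minimal sketch of what an update with zero drift and an unrestricted neighborhood looks like. It does not reproduce Equation 25 or Equation 28 from the paper. It assumes a single state, a single agent, a small discrete action set, and made-up soft Q-values, and it reads "neighborhood of $\Pi$" as "search over the whole policy space". Under those assumptions the maximum-entropy objective is maximized by a softmax over Q/alpha, so the updated policy can never score worse than the old one. All function names are hypothetical and are not part of the HARL codebase.

```python
import math
import random


def soft_objective(policy, q_values, alpha):
    """E_{a~pi}[Q(a)] + alpha * H(pi) for one state with discrete actions."""
    expected_q = sum(p * q for p, q in zip(policy, q_values))
    entropy = -sum(p * math.log(p) for p in policy if p > 0.0)
    return expected_q + alpha * entropy


def unconstrained_maximizer(q_values, alpha):
    """Softmax of Q/alpha: the argmax of soft_objective over the full simplex.

    No drift penalty and no trust region are applied (illustrative assumption).
    """
    shift = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - shift) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def random_policy(num_actions, rng):
    """Draw an arbitrary strictly positive distribution over the actions."""
    raw = [rng.random() + 1e-9 for _ in range(num_actions)]
    total = sum(raw)
    return [r / total for r in raw]


q_values = [1.0, 0.5, -0.2]  # made-up soft Q-values, not taken from the paper
alpha = 0.2                  # made-up temperature
old_policy = [1 / 3, 1 / 3, 1 / 3]
new_policy = unconstrained_maximizer(q_values, alpha)

new_score = soft_objective(new_policy, q_values, alpha)
old_score = soft_objective(old_policy, q_values, alpha)

# Compare against many random candidate policies as a sanity check.
rng = random.Random(0)
best_random_score = max(
    soft_objective(random_policy(len(q_values), rng), q_values, alpha)
    for _ in range(1000)
)

print("improves on old policy:", new_score >= old_score)
print("beats random candidates:", new_score >= best_random_score)
```

The point of the sketch is only that an unconstrained argmax trivially improves on the current policy, since the current policy is itself a candidate. In the paper the same kind of argument is made per agent and in sequence, which this toy example does not model.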