Kaixhin / ACER

Actor-critic with experience replay
MIT License

KL Divergence #6

Closed: random-user-x closed this issue 6 years ago

random-user-x commented 6 years ago

https://github.com/Kaixhin/ACER/blob/5b7ca5d75bf16629ddaf68ecab4ab6c7dcccf56c/train.py#L71

Shouldn't the code be F.kl_div(distribution, ref_distribution, size_average=False)? Why is there a log of the distribution?

Kaixhin commented 6 years ago

It is a bit strange, but F.kl_div takes log probabilities for the input and probabilities for the target.
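For concreteness, here is a minimal sketch of that argument convention; the tensors are made up for illustration and are not the repo's variables:

```python
import torch
import torch.nn.functional as F

# Made-up distributions for illustration (each row sums to 1)
p = torch.softmax(torch.randn(1, 4), dim=1)  # target: plain probabilities
q = torch.softmax(torch.randn(1, 4), dim=1)  # input: will be passed in log space

# F.kl_div expects the *input* as log probabilities and the *target* as probabilities,
# so the first argument is q.log() rather than q.
# reduction='sum' is the newer spelling of size_average=False used in the thread.
kl = F.kl_div(q.log(), p, reduction='sum')

# The same quantity computed by hand: sum_i p_i * (log p_i - log q_i)
kl_manual = (p * (p.log() - q.log())).sum()
assert torch.allclose(kl, kl_manual)
```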

random-user-x commented 6 years ago

I see. It still seems a bit strange to me, though. Btw, could you clear up one more doubt of mine?

The paper https://arxiv.org/pdf/1611.01224.pdf specifies KL divergence(shared parameter, actual parameter). If you check the definition on the wiki page https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Definition , P should be the shared parameter and Q should be the actual parameter. Does the code follow the same convention?

Kaixhin commented 6 years ago

Ah well spotted - the code is wrong. I've fixed this in 11eb611 - thanks for spotting!
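For anyone landing here later, a rough sketch of the corrected direction, using illustrative names (average_policy for the shared/average-parameter policy and policy for the actual-parameter policy) rather than the repo's actual variables:

```python
import torch
import torch.nn.functional as F

# Illustrative policy outputs (rows sum to 1); names are made up, not taken from train.py
policy = torch.softmax(torch.randn(1, 4), dim=1)          # pi(a|s; theta), actual parameters
average_policy = torch.softmax(torch.randn(1, 4), dim=1)  # pi(a|s; theta_a), shared/average parameters

# The paper's trust-region term is KL[pi_avg || pi], i.e. P = average_policy, Q = policy.
# Since F.kl_div(input, target) computes sum target * (log target - input),
# the target slot must hold P and the input slot must hold log Q:
kl = F.kl_div(policy.log(), average_policy, reduction='sum')

# Equivalent manual form: sum_a P(a) * log(P(a) / Q(a))
kl_manual = (average_policy * (average_policy.log() - policy.log())).sum()
assert torch.allclose(kl, kl_manual)
```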