[Closed] hai-h-nguyen closed this issue 2 years ago
Hi @hai-h-nguyen,
If you think of the expert policy as a point-mass distribution (100% probability assigned to the expert action), then the cross entropy and the KL divergence coincide: the expert's entropy term vanishes (since log(1) = 0), leaving only the cross-entropy term -log policy(expert_action).
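A minimal sketch of that identity (not taken from the repo; the probabilities and action index below are made up for illustration):

```python
import numpy as np

# With a point-mass "expert" distribution, KL(expert || policy) reduces to the
# cross entropy, i.e. -log policy(expert_action).
policy_probs = np.array([0.1, 0.6, 0.3])   # hypothetical policy over 3 actions
expert_action = 1                          # expert puts all its mass on action 1
expert_probs = np.zeros_like(policy_probs)
expert_probs[expert_action] = 1.0

# KL(p || q) = sum_a p(a) * (log p(a) - log q(a)), using the convention 0 * log 0 := 0
mask = expert_probs > 0
kl = np.sum(expert_probs[mask] * (np.log(expert_probs[mask]) - np.log(policy_probs[mask])))

# Cross entropy H(p, q) = -sum_a p(a) * log q(a), which here is just -log q(expert_action)
cross_entropy = -np.log(policy_probs[expert_action])

assert np.isclose(kl, cross_entropy)  # identical because log(1) = 0 kills the entropy term
```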
@Lucaweihs, by the way, it seems that the link to the Particle Environment Experiments is inaccessible. Will you be open-sourcing it? If not, could you explain how you calculated the distance for continuous action spaces?
Hi @hai-h-nguyen, @unnat ran those experiments, so I'll let him clarify, but if I remember correctly it was based on the L2 distance (i.e., the KL divergence between Gaussians).
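For context, a sketch of why those two views agree, assuming both policies are isotropic Gaussians with the same fixed standard deviation (the means and `sigma` below are illustrative, not values from the experiments):

```python
import numpy as np

def kl_isotropic_gaussians(mu_p, mu_q, sigma):
    """KL(N(mu_p, sigma^2 I) || N(mu_q, sigma^2 I)) = ||mu_p - mu_q||^2 / (2 * sigma^2)."""
    return np.sum((mu_p - mu_q) ** 2) / (2.0 * sigma ** 2)

expert_mean = np.array([0.5, -0.2])   # hypothetical expert action mean
policy_mean = np.array([0.3,  0.1])   # hypothetical learner action mean
sigma = 1.0

kl = kl_isotropic_gaussians(expert_mean, policy_mean, sigma)
l2_sq = np.sum((expert_mean - policy_mean) ** 2)
assert np.isclose(kl, 0.5 * l2_sq)    # with sigma = 1, KL is half the squared L2 distance
```

So with a shared, fixed variance, ranking or weighting by the KL is equivalent to doing so by the squared L2 distance between the action means.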
Thanks @hai-h-nguyen for your interest in our work. The MPE repo is public now; thanks for pointing that out.
Hi, is there any reason why you guys use the cross entropy (the log_prob evaluated at the expert_action) as the distance, instead of the proper KL divergence (as specified in the paper), to calculate the weights?
https://github.com/allenai/advisor/blob/main/advisor_losses.py#L255
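For readers following along, a hedged paraphrase of the formulation being asked about, not a copy of the linked line: the "distance" is the cross entropy -log pi_aux(expert_action), and the per-sample weight is assumed here to be an exponential of that distance with a temperature; `alpha` and the tensor names are illustrative only, not the repo's actual variables.

```python
import torch
from torch.distributions import Categorical

alpha = 4.0                                   # hypothetical temperature on the distance
aux_logits = torch.randn(8, 5)                # auxiliary policy logits: batch of 8, 5 actions
expert_actions = torch.randint(0, 5, (8,))    # expert's chosen actions for the batch

aux_dist = Categorical(logits=aux_logits)
distance = -aux_dist.log_prob(expert_actions)  # cross entropy with a point-mass expert
weights = torch.exp(-alpha * distance)         # in (0, 1]; larger when aux agrees with the expert
```

Given the point-mass view of the expert discussed above, this cross-entropy distance and the KL divergence are the same quantity, which is why the linked code evaluates log_prob at the expert action rather than computing an explicit KL.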