[Closed] hai-h-nguyen closed this issue 2 years ago
Hi @hai-h-nguyen,
If you think of the expert policy as a point-mass distribution (100% probability assigned to the expert action), then the cross entropy and the KL divergence coincide: the expert's entropy term vanishes (since log(1) = 0), leaving only the cross-entropy term -log policy(expert_action).
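A minimal sketch of that identity (not taken from the repo; the probabilities and action index below are made up for illustration):

```python
import numpy as np

# With a point-mass "expert" distribution, KL(expert || policy) reduces to the
# cross entropy, i.e. -log policy(expert_action).
policy_probs = np.array([0.1, 0.6, 0.3])   # hypothetical policy over 3 actions
expert_action = 1                          # expert puts all its mass on action 1
expert_probs = np.zeros_like(policy_probs)
expert_probs[expert_action] = 1.0

# KL(p || q) = sum_a p(a) * (log p(a) - log q(a)), using the convention 0 * log 0 := 0
mask = expert_probs > 0
kl = np.sum(expert_probs[mask] * (np.log(expert_probs[mask]) - np.log(policy_probs[mask])))

# Cross entropy H(p, q) = -sum_a p(a) * log q(a), which here is just -log q(expert_action)
cross_entropy = -np.log(policy_probs[expert_action])

assert np.isclose(kl, cross_entropy)  # identical because log(1) = 0 kills the entropy term
```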
@Lucaweihs, by the way, it seems that the link to the Particle Environment Experiments is inaccessible. Will you be open-sourcing it? If not, could you explain how you calculated the distance for continuous action spaces?
Hi @hai-h-nguyen, @unnat ran those experiments, so I'll let him clarify, but if I remember correctly it was based on the L2 distance (i.e., the KL divergence between Gaussians).
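For context, a sketch of why those two views agree, assuming both policies are isotropic Gaussians with the same fixed standard deviation (the means and `sigma` below are illustrative, not values from the experiments):

```python
import numpy as np

def kl_isotropic_gaussians(mu_p, mu_q, sigma):
    """KL(N(mu_p, sigma^2 I) || N(mu_q, sigma^2 I)) = ||mu_p - mu_q||^2 / (2 * sigma^2)."""
    return np.sum((mu_p - mu_q) ** 2) / (2.0 * sigma ** 2)

expert_mean = np.array([0.5, -0.2])   # hypothetical expert action mean
policy_mean = np.array([0.3,  0.1])   # hypothetical learner action mean
sigma = 1.0

kl = kl_isotropic_gaussians(expert_mean, policy_mean, sigma)
l2_sq = np.sum((expert_mean - policy_mean) ** 2)
assert np.isclose(kl, 0.5 * l2_sq)    # with sigma = 1, KL is half the squared L2 distance
```

So with a shared, fixed variance, ranking or weighting by the KL is equivalent to doing so by the squared L2 distance between the action means.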
Thanks @hai-h-nguyen for your interest in our work. The MPE repo is public now; thanks for pointing that out.
Hi, is there any reason why you guys use the cross entropy (the log_prob evaluated at the expert_action) as the distance, instead of the proper KL divergence (as specified in the paper), to calculate the weights?
https://github.com/allenai/advisor/blob/main/advisor_losses.py#L255
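For readers following along, a hedged paraphrase of the formulation being asked about, not a copy of the linked line: the "distance" is the cross entropy -log pi_aux(expert_action), and the per-sample weight is assumed here to be an exponential of that distance with a temperature; `alpha` and the tensor names are illustrative only, not the repo's actual variables.

```python
import torch
from torch.distributions import Categorical

alpha = 4.0                                   # hypothetical temperature on the distance
aux_logits = torch.randn(8, 5)                # auxiliary policy logits: batch of 8, 5 actions
expert_actions = torch.randint(0, 5, (8,))    # expert's chosen actions for the batch

aux_dist = Categorical(logits=aux_logits)
distance = -aux_dist.log_prob(expert_actions)  # cross entropy with a point-mass expert
weights = torch.exp(-alpha * distance)         # in (0, 1]; larger when aux agrees with the expert
```

Given the point-mass view of the expert discussed above, this cross-entropy distance and the KL divergence are the same quantity, which is why the linked code evaluates log_prob at the expert action rather than computing an explicit KL.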