Hey @danijar.
I just noticed that the code uses `TruncNormal` as the actor distribution instead of `TanhNormal` as in v1. Did you run ablations on these two choices and find that `TruncNormal` gives better results? Or was the change made only because the entropy of `TruncNormal` is easier to compute than that of `TanhNormal` for the entropy regularizer?
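
To make the entropy point concrete, here is a minimal sketch of the difference I have in mind. It is not taken from the repository: the function names, the action bounds of [-1, 1], and the sample count are all my own assumptions. A normal truncated to an interval has a closed-form entropy, while for a tanh-squashed normal the entropy is the Gaussian entropy plus an expectation of the log-Jacobian term, which as far as I know has no closed form and is usually estimated from samples.

```python
import math
import random


def _pdf(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def trunc_normal_entropy(mu, sigma, low=-1.0, high=1.0):
    # Closed-form entropy of N(mu, sigma^2) truncated to [low, high].
    # The bounds [-1, 1] are an assumption for illustration.
    alpha = (low - mu) / sigma
    beta = (high - mu) / sigma
    z = _cdf(beta) - _cdf(alpha)
    return (math.log(math.sqrt(2.0 * math.pi * math.e) * sigma * z)
            + (alpha * _pdf(alpha) - beta * _pdf(beta)) / (2.0 * z))


def tanh_normal_entropy_mc(mu, sigma, num_samples=10000, seed=0):
    # Monte Carlo estimate of the entropy of y = tanh(x), x ~ N(mu, sigma^2):
    #   H[y] = H[x] + E[log(1 - tanh(x)^2)].
    rng = random.Random(seed)
    normal_entropy = 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(mu, sigma)
        # Stable form of log(1 - tanh(x)^2) = 2 * (log 2 - x - softplus(-2x)).
        total += 2.0 * (math.log(2.0) - x - _softplus(-2.0 * x))
    return normal_entropy + total / num_samples


if __name__ == "__main__":
    print("TruncNormal entropy (closed form):", trunc_normal_entropy(0.0, 1.0))
    print("TanhNormal entropy (sampled):     ", tanh_normal_entropy_mc(0.0, 1.0))
```

The first function is exact and deterministic for a given `mu` and `sigma`, while the second is only a noisy estimate whose variance depends on the number of samples, which is the kind of practical difference I am asking about.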