Of all the environments, Pendulum-1, MountainCarContinuous-v0 and Reacher-misc return a JAX array with shape (1, ) as the reward. This is inconsistent with all other environments, which return an array with shape (). The mismatch can lead to unexpected shape errors. For example, consider a case where the reward computation is vmapped over a batch of three and the result is combined elementwise with another array of shape (3, ) to produce weighted_rewards.
If the reward returned by the environment has shape (1, ), the result of vmapping will have shape (3, 1) instead of (3, ). Broadcasting (3, 1) against (3, ) then gives weighted_rewards the shape (3, 3) instead of (3, ).
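The effect can be reproduced without any environment. The sketch below is only an illustration of the shape problem: `scalar_reward`, `boxed_reward`, `states` and `weights` are hypothetical stand-ins, not the environments' actual code, and the weighting step is assumed to be a plain elementwise multiplication.

```python
import jax
import jax.numpy as jnp


def scalar_reward(state):
    # Hypothetical reward with shape (), as most environments return.
    return -jnp.sum(state ** 2)


def boxed_reward(state):
    # Hypothetical reward with shape (1,), as reported for the three
    # environments named above.
    return jnp.reshape(-jnp.sum(state ** 2), (1,))


states = jnp.ones((3, 2))               # batch of 3 made-up states
weights = jnp.array([0.5, 0.3, 0.2])    # assumed per-environment weights

good = jax.vmap(scalar_reward)(states)  # shape (3,)
bad = jax.vmap(boxed_reward)(states)    # shape (3, 1)

weighted_good = good * weights          # (3,) * (3,) stays (3,)
weighted_bad = bad * weights            # (3, 1) * (3,) broadcasts to (3, 3)

# Possible caller-side workaround: drop the trailing axis before weighting.
fixed = jnp.squeeze(bad, axis=-1) * weights

print(good.shape, bad.shape, weighted_bad.shape, fixed.shape)
```

No error is raised in the (3, 3) case, so the mistake can go unnoticed until a later reduction gives wrong values. Squeezing on the caller's side is only a stopgap; returning shape () from every environment would remove the need for it.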