Fix to box2d example bug (causing NaNs in the policy & policy prior)

Fix to the bug mentioned in: https://github.com/cbfinn/gps/issues/24

The bug is caused by one dimension of the state never changing, which gives that dimension a standard deviation of 0 when normalizing the policy. Dividing by that zero is what produces the NaNs. With this fix, the policy trains without error.
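For illustration only, here is a minimal NumPy sketch of the failure mode and one common way to guard against it. It is not the code changed in this PR: the helper name `safe_normalize`, the `min_std` floor, and the toy observation array are all assumptions made for the example.

```python
import numpy as np


def safe_normalize(obs, min_std=1e-3):
    """Hypothetical helper: normalize each column of obs to zero mean and
    unit scale, flooring the standard deviation so that a constant
    dimension cannot cause a division by zero."""
    mean = obs.mean(axis=0)
    std = np.maximum(obs.std(axis=0), min_std)  # floor value is an assumption
    return (obs - mean) / std


# Toy observations: the middle dimension never changes across samples.
obs = np.array([[0.0, 1.0, 5.0],
                [1.0, 1.0, 7.0],
                [2.0, 1.0, 9.0]])

# Naive normalization divides 0 by 0 in the constant column, giving NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = (obs - obs.mean(axis=0)) / obs.std(axis=0)

# Guarded normalization stays finite everywhere.
safe = safe_normalize(obs)
```

In this sketch the constant column of `naive` is all NaN, while the same column of `safe` is all zeros, so downstream training would see finite inputs.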

There may still be an issue with running the learned policy, which should be looked into. Fixing https://github.com/cbfinn/gps/issues/24 is the priority, though.


Fix to box2d example bug (causing NaNs in the policy & policy prior) #30