Fix for the bug reported in https://github.com/cbfinn/gps/issues/24
The bug is caused by one dimension of the state never changing. That dimension has a standard deviation of 0, so normalizing the policy ends up dividing by zero. With this fix, the policy trains without error.
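For reviewers, here is a minimal sketch of the failure mode described above and one common way to guard against it. It is not the code in this patch: the function name `safe_scale`, the `min_std` floor and its value, and the sample data are all hypothetical, and it uses plain Python rather than the project's own normalization code.

```python
import statistics


def safe_scale(samples, min_std=1e-3):
    """Return per-dimension 1/std scale factors, clamping tiny stds.

    samples: list of equal-length state vectors, one per time step.
    min_std: hypothetical floor (an assumption, not taken from the patch).
    """
    # Transpose so that each tuple holds all samples of one state dimension.
    dims = list(zip(*samples))
    stds = [statistics.pstdev(d) for d in dims]
    # A dimension that never changes has std == 0; clamp it so that
    # 1/std stays finite instead of raising ZeroDivisionError.
    return [1.0 / max(s, min_std) for s in stds]


# Second dimension is constant, which reproduces the zero-std case.
states = [
    [0.0, 5.0],
    [1.0, 5.0],
    [2.0, 5.0],
]
scale = safe_scale(states)
```

Clamping keeps the varying dimensions normalized as before and only changes the behavior for constant dimensions, which would otherwise produce a division by zero.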
There may still be an issue with running the learned policy, which should be looked into separately. Fixing https://github.com/cbfinn/gps/issues/24 is the priority, though.