Closed: Ziksby closed this 1 year ago.
What?
Changed how `self.reward` is calculated. It is now based on the current value of the `optimal_return` parameter.

Why?
This way, if `optimal_return` is changed, the maximum reward the agent can obtain from a state changes with it.

Extra
Note that this change may make the `reset_env` method slower, since it now recomputes the rewards for the states on every reset.
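A minimal Python sketch of the behaviour described above, assuming a small tabular environment. Only `self.reward`, `optimal_return` and `reset_env` come from this PR; the class name `SketchEnv`, the `base_values` input and the proportional scaling rule are hypothetical, because the PR does not state the actual reward formula.

```python
from typing import Dict, List


class SketchEnv:
    """Hypothetical environment. Only `optimal_return`, `reward` and
    `reset_env` mirror names from the PR; everything else is assumed."""

    def __init__(self, base_values: List[float], optimal_return: float) -> None:
        # Assumed input: raw per-state values (positive) rescaled into rewards.
        self.base_values = base_values
        self.optimal_return = optimal_return
        self.reward: Dict[int, float] = {}
        self.state = 0
        self.reset_env()

    def _compute_rewards(self) -> Dict[int, float]:
        # Assumed scaling rule: the best state's reward equals optimal_return,
        # and every other state is scaled proportionally.
        best = max(self.base_values)
        if best <= 0:
            raise ValueError("base_values must contain a positive value")
        return {s: v / best * self.optimal_return
                for s, v in enumerate(self.base_values)}

    def reset_env(self) -> int:
        # Rewards are rebuilt on every reset, so a changed optimal_return
        # takes effect here; the cost is O(number of states) per reset.
        self.reward = self._compute_rewards()
        self.state = 0
        return self.state


env = SketchEnv([1.0, 2.0, 4.0], optimal_return=10.0)
env.optimal_return = 20.0  # not reflected until the next reset
env.reset_env()
print(env.reward)
```

Because the rewards are rebuilt inside `reset_env`, each reset pays a cost proportional to the number of states, which is the slowdown mentioned in the note above; caching the rewards and recomputing only when `optimal_return` actually changes would be one way to avoid it.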
Close #39