RobertTLange / gymnax

RL Environments in JAX 🌍
Apache License 2.0

Update the maximum reward a state can have based on optimal_return parameter. #40

Closed. Ziksby closed this 1 year ago

Ziksby commented 1 year ago

What?

Changed how self.reward is calculated: it is now derived from the current value of the optimal_return parameter.

Why?

If optimal_return is changed, the maximum reward the agent can obtain from a state now changes with it.

Extra

Note that this change could make the reset_env method slower, as it now recalculates the rewards for the states at every reset.
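
For illustration only, here is a minimal sketch of the idea in plain Python rather than the actual gymnax code. The names ToyParams and ToyChainEnv, the base reward of 1.0, and the best_action index are assumptions made up for this example. Only optimal_return and reset_env come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyParams:
    # Hypothetical parameter container; only `optimal_return` is named in the PR.
    optimal_return: float = 1.1


class ToyChainEnv:
    """Toy stand-in for an environment; not the gymnax implementation."""

    def __init__(self, n_actions: int = 5, best_action: int = 0):
        self.n_actions = n_actions
        self.best_action = best_action  # assumed index of the optimal action

    def reset_env(self, params: ToyParams) -> list:
        # Rebuild the per-action rewards on every reset so they always
        # follow the current params (this is the extra per-reset work).
        rewards = [1.0] * self.n_actions  # assumed base reward
        rewards[self.best_action] = params.optimal_return
        return rewards


env = ToyChainEnv()
default_rewards = env.reset_env(ToyParams())
custom_rewards = env.reset_env(ToyParams(optimal_return=2.0))
```

Because the rewards are rebuilt inside reset_env, passing different params changes the maximum reward without constructing a new environment. The trade-off is the small amount of extra work done on each reset.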

Close #39