Describe the bug
I have observed that modifying the optimal_return parameter of the DiscountingChain bsuite environment has no effect. The environment code consistently uses the hard-coded value 1.1 in place of the optimal_return parameter.
To Reproduce
Run the following sample code.
import gymnax
import jax
from gymnax import environments as envs

rng = jax.random.PRNGKey(0)
env, env_params = gymnax.make("DiscountingChain-bsuite")
rng, key_reset = jax.random.split(rng, 2)

# Attempt to change the optimal_return parameter.
params_setting = {"optimal_return": 1.2}
env_params = envs.bsuite.discounting_chain.EnvParams(**params_setting)
print("The optimal return has now changed, as can be seen here:\n\n", env_params)

# Reset the environment with the modified parameters.
obs, state = env.reset(key_reset, env_params)
print()
print("However, after printing the state rewards we can see that the optimal return is still 1.1:")
print(state.rewards)
Expected behaviour
The optimal return in state.rewards should be 1.2.
Actual behaviour
The optimal return in state.rewards is still 1.1.
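To make the suspected cause concrete, here is a minimal sketch of the pattern described above. It is not the gymnax source: ChainParams, rewards_hardcoded and rewards_from_params are hypothetical names, plain Python lists stand in for JAX arrays, and the reward of 1.0 for the non-optimal actions is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChainParams:
    # Hypothetical stand-in for the environment's EnvParams (not the gymnax class).
    optimal_return: float = 1.1


def rewards_hardcoded(n_actions: int, best_action: int, params: ChainParams) -> List[float]:
    """Suspected buggy pattern: the literal 1.1 is used, so params is ignored."""
    rewards = [1.0] * n_actions  # assumed baseline reward for the other actions
    rewards[best_action] = 1.1   # hard-coded value instead of params.optimal_return
    return rewards


def rewards_from_params(n_actions: int, best_action: int, params: ChainParams) -> List[float]:
    """Expected pattern: the best action's reward is read from params."""
    rewards = [1.0] * n_actions  # assumed baseline reward for the other actions
    rewards[best_action] = params.optimal_return
    return rewards


params = ChainParams(optimal_return=1.2)
print(rewards_hardcoded(5, 0, params))    # best action still pays 1.1
print(rewards_from_params(5, 0, params))  # best action pays 1.2
```

If the environment builds its rewards in a similar way, replacing each literal 1.1 with the optimal_return field of the params object should be enough for state.rewards to reflect the requested value.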