RobertTLange / gymnax

RL Environments in JAX 🌍
Apache License 2.0

Modifying optimal return parameter has no effect (bug) #39

Closed Ziksby closed 1 year ago

Ziksby commented 1 year ago

Describe the bug

I have observed that modifying the optimal_return parameter of the DiscountingChain bsuite environment has no effect. The environment code consistently uses the hard-coded value 1.1 instead of reading the optimal_return parameter.
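To illustrate the kind of mistake described above, here is a minimal sketch. It is not the gymnax source: `ChainParams`, `rewards_hardcoded`, `rewards_from_params`, `n_actions` and `best_action` are hypothetical names chosen only for this example. It contrasts a reward vector built from a literal with one built from the parameter.

```python
from typing import NamedTuple

import jax.numpy as jnp


class ChainParams(NamedTuple):
    # Hypothetical stand-in for the environment's parameter container.
    optimal_return: float = 1.1


def rewards_hardcoded(params: ChainParams, n_actions: int = 5, best_action: int = 0):
    # Buggy pattern: the literal 1.1 is used, so `params` is silently ignored.
    rewards = jnp.ones(n_actions)
    return rewards.at[best_action].set(1.1)


def rewards_from_params(params: ChainParams, n_actions: int = 5, best_action: int = 0):
    # Intended pattern: the configured value is read from `params`.
    rewards = jnp.ones(n_actions)
    return rewards.at[best_action].set(params.optimal_return)


params = ChainParams(optimal_return=1.2)
print(rewards_hardcoded(params))    # largest entry stays at 1.1
print(rewards_from_params(params))  # largest entry follows the parameter
```

With the first function, changing `optimal_return` cannot influence the result, which matches the symptom reported here.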

To Reproduce

  1. Run the following sample code.
import gymnax
import jax
from gymnax import environments as envs

rng = jax.random.PRNGKey(0)
env, env_params = gymnax.make("DiscountingChain-bsuite")
rng, key_reset = jax.random.split(rng, 2)

# Attempt to change the optimal_return parameter.
params_setting = {"optimal_return": 1.2}
env_params = envs.bsuite.discounting_chain.EnvParams(**params_setting)
print("The optimal return has now changed, as can be seen here:\n\n", env_params)

# Reset the environment with the modified parameters.
obs, state = env.reset(key_reset, env_params)
print()
print("However, after printing the state rewards we can "
      "see that the optimal return is still 1.1:")
print(state.rewards)

Expected behaviour

The optimal return in state.rewards should be 1.2.

Actual behaviour

The optimal return in state.rewards is still 1.1.
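The expected and actual behaviour above can be turned into a small check. This is a sketch under one assumption: that the optimal return is the largest entry of the rewards array. `has_optimal_return` is a hypothetical helper, not part of gymnax, and the example arrays are illustrative values rather than captured output.

```python
import jax.numpy as jnp


def has_optimal_return(rewards, expected: float) -> bool:
    # Assumption: the optimal return is the largest entry of the reward vector.
    return bool(jnp.isclose(jnp.max(rewards), expected))


# Illustrative reward vectors, not actual environment output.
observed = jnp.array([1.1, 1.0, 1.0, 1.0, 1.0])
desired = jnp.array([1.2, 1.0, 1.0, 1.0, 1.0])

print(has_optimal_return(observed, 1.2))
print(has_optimal_return(desired, 1.2))
```

In the reproduction script, the same helper could be called as `has_optimal_return(state.rewards, 1.2)` after the reset.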