fix a bug in observation generation in MemoryChain-bsuite

RobertTLange / gymnax

RL Environments in JAX 🌍

Apache License 2.0

577 stars 54 forks source link

fix a bug in observation generation in MemoryChain-bsuite #36

Closed qlan3 closed 1 year ago

qlan3 commented 1 year ago

The generated observation in step_env function is for the current state ,but not the next state. https://github.com/RobertTLange/gymnax/blob/main/gymnax/environments/bsuite/memory_chain.py#L44

RobertTLange commented 1 year ago

Hi @qlan3 - thank for putting in a PR. I followed the bsuite implementation and it does also generate the observation before applying the transition. Have a look here: https://github.com/deepmind/bsuite/blob/master/bsuite/environments/memory_chain.py

I believe this is correct. Have you checked if your changes pass the tests?

qlan3 commented 1 year ago

You mean the workflows? I need your approve to run workflows.

RobertTLange commented 1 year ago

I recommend running the tests locally before submitting a PR. This way you can be sure that your changes didn’t damage anything. The observation function is correctly called before the transition.

qlan3 commented 1 year ago

I only changed one line. Looking into the test log, it is due to an assertion error which is within expectation since the bsuite generates the observation before applying the transition. See: https://github.com/RobertTLange/gymnax/actions/runs/3067044315/jobs/4965647316#step:6:233

Moreover, I didn't touch any other games, but there are still errors for some other games: https://github.com/RobertTLange/gymnax/actions/runs/3067044315/jobs/4965647316#step:6:531

RobertTLange commented 1 year ago

The test of MemoryChain breaks because your change in the observation call does not align with the original bsuite implementation. It is correct to call the get_observation function before executing the changes in the state. This is simply how the MemoryChain environment is implemented (see link provided above).

The other tests fail because the original gym API has unfortunately changed to:

observation, reward, terminated, truncated, info = env.step(action)

and I haven't gotten to update the unit tests yet (or freeze the version of gym against which we test).

qlan3 commented 1 year ago

Cool, I will run the tests again after you update the tests.

DriesSmit commented 1 year ago

Hey there. Is this environment still broken in main?

RobertTLange commented 1 year ago

Hi @DriesSmit, As I was trying to say in my previous post -- there is no bug in the observation for MemoryChain-bsuite and hence the environment in main is correct. The step transition is exactly as in the original bsuite implementation. Just compare:

I will close this PR to avoid future confusion.