Closed qlan3 closed 1 year ago
Hi @qlan3 - thank for putting in a PR. I followed the bsuite implementation and it does also generate the observation before applying the transition. Have a look here: https://github.com/deepmind/bsuite/blob/master/bsuite/environments/memory_chain.py
I believe this is correct. Have you checked if your changes pass the tests?
You mean the workflows? I need your approve to run workflows.
I recommend running the tests locally before submitting a PR. This way you can be sure that your changes didn’t damage anything. The observation function is correctly called before the transition.
I only changed one line. Looking into the test log, it is due to an assertion error which is within expectation since the bsuite generates the observation before applying the transition. See: https://github.com/RobertTLange/gymnax/actions/runs/3067044315/jobs/4965647316#step:6:233
Moreover, I didn't touch any other games, but there are still errors for some other games: https://github.com/RobertTLange/gymnax/actions/runs/3067044315/jobs/4965647316#step:6:531
The test of MemoryChain breaks because your change in the observation call does not align with the original bsuite implementation. It is correct to call the get_observation
function before executing the changes in the state
. This is simply how the MemoryChain environment is implemented (see link provided above).
The other tests fail because the original gym API has unfortunately changed to:
observation, reward, terminated, truncated, info = env.step(action)
and I haven't gotten to update the unit tests yet (or freeze the version of gym against which we test).
Cool, I will run the tests again after you update the tests.
Hey there. Is this environment still broken in main
?
Hi @DriesSmit, As I was trying to say in my previous post -- there is no bug in the observation for MemoryChain-bsuite
and hence the environment in main
is correct. The step transition is exactly as in the original bsuite
implementation. Just compare:
I will close this PR to avoid future confusion.
The generated observation in step_env function is for the current state ,but not the next state. https://github.com/RobertTLange/gymnax/blob/main/gymnax/environments/bsuite/memory_chain.py#L44