Farama-Foundation / SuperSuit

A collection of wrappers for Gymnasium and PettingZoo environments (being merged into gymnasium.wrappers and pettingzoo.wrappers
Other
441 stars 56 forks source link

fix: have access to `terminal_observation` in the infos. #233

Closed KaleabTessera closed 8 months ago

KaleabTessera commented 9 months ago

Closes https://github.com/Farama-Foundation/SuperSuit/issues/232.

This allows access to terminal_observation, which previously wasn't possible when using pettingzoo_env_to_vec_env_v1/MarkovVectorEnv.

KaleabTessera commented 9 months ago

@elliottower Do you mind having a look at this? This is quite important since if you don't have the correct terminal obs, you can't do bootstrapping correctly.

elliottower commented 9 months ago

Hey @KaleabTessera sorry about this, will take a look now. Just approved the ci workflows

elliottower commented 9 months ago

Going to get confirmation from another Farama dev who has more experience with this type of thing than me cause I’m not 100% confident in my ability to check correctness for this

elliottower commented 9 months ago

I see a few errors in the CI but I'm not sure they are anything to do with the changes you've made, testing it locally myself just to confirm and get a better idea (may be something with pettingzoo instead not sure)

elliottower commented 9 months ago

Fixed the bug on PettingZoo's end, queued up the CI to run (will take a little while as the PZ CI is running as well), I think it should all pass now

elliottower commented 9 months ago

Oh right I need to do a pettingzoo release for this to pass, this will have to wait for a few days as we are waiting on the AgileRL tutorial bugs to be fixed

elliottower commented 8 months ago

PZ release is out in case you didn't see, so this should be unblocked now. Approved the workflows just now

KaleabTessera commented 8 months ago

Thanks @elliottower ! I added this code to ensure that a seed used even when reset is called here. Not sure if it is necessary, what do you think?

A similar thing is done in stable baselines' vec env

elliottower commented 8 months ago

That sounds reasonable to me, going to get input from another dev to see if it makes sense to them

ffelten commented 8 months ago

Thanks @elliottower ! I added this code to ensure that a seed used even when reset is called here. Not sure if it is necessary, what do you think?

A similar thing is done in stable baselines' vec env

Hi,

I don't think it makes sense in this case since the underlying env (the parallel PZ env) is only a single entity.

elliottower commented 8 months ago

Jet (@jjshoots) also said he thought this shouldn't be implemented, here's a screenshot of what he said.

Discord_6EwUEsGCck

KaleabTessera commented 8 months ago

So I removed the manual seeding. I agree it should be in another PR and maybe it is not even useful.

I still think there is a possible issue for seeding in this scenario :

I think the vec env should create a new seed deterministically - similar to how Jax handles random numbers. I think that is why stable baselines vec env ensure that a new seed is created or used.

ffelten commented 8 months ago

So I removed the manual seeding. I agree it should be in another PR and maybe it is not even useful.

I still think there is a possible issue for seeding in this scenario :

  • You pass a seed into the environment when calling the first env.reset(.
  • Then the env loops itself, and manually calls env.reset here. Although you have passed an initial seed for controlling the determinism, the env doesn't use a seed when calling reset from the step, meaning the underlying behaviour is not deterministic and relies on how base env handles none seeds.

I think the vec env should create a new seed deterministically - similar to how Jax handles random numbers. I think that is why stable baselines vec env ensure that a new seed is created or used.

Normally the environment should handle this: the first reset() is made with a seed. Then an internal attribute keeps track of this seed so even if there are subsequent calls to reset() with seed being None, they are still seeded by the first call.

jjshoots commented 8 months ago

I agree to what @ffelten is saying. That was what I was getting at in the first place but Florian put better words to it.

KaleabTessera commented 8 months ago

So I removed the manual seeding. I agree it should be in another PR and maybe it is not even useful. I still think there is a possible issue for seeding in this scenario :

  • You pass a seed into the environment when calling the first env.reset(.
  • Then the env loops itself, and manually calls env.reset here. Although you have passed an initial seed for controlling the determinism, the env doesn't use a seed when calling reset from the step, meaning the underlying behaviour is not deterministic and relies on how base env handles none seeds.

I think the vec env should create a new seed deterministically - similar to how Jax handles random numbers. I think that is why stable baselines vec env ensure that a new seed is created or used.

Normally the environment should handle this: the first reset() is made with a seed. Then an internal attribute keeps track of this seed so even if there are subsequent calls to reset() with seed being None, they are still seeded by the first call.

This makes sense, thanks @ffelten @jjshoots ! I double-checked the base env I was using and this was the case :+1: So likely this is not an issue if base env handles seeding reasonably.

elliottower commented 8 months ago

Thanks for the input guys. Looks like there's still a pytest failure which I'm not 100% sure why is the case

KaleabTessera commented 8 months ago

Looks like the error is thrown by pettingzoo's api_test function here. knights_archers_zombies_v10 is returning an out of range obs (obs >1 from the logs), when running the py3.8 tests (py 3.11 pass).

The tests passed locally when I ran them. Screenshot from 2023-11-27 17-25-03

My package versions, p3.10 + :

cffi==1.16.0
cfgv==3.4.0
cloudpickle==3.0.0
distlib==0.3.7
exceptiongroup==1.1.3
Farama-Notifications==0.0.4
filelock==3.13.0
gymnasium==0.29.1
identify==2.5.31
iniconfig==2.0.0
nodeenv==1.8.0
numpy==1.26.1
packaging==23.2
pettingzoo==1.24.2
platformdirs==3.11.0
pluggy==1.3.0
pre-commit==3.5.0
pycparser==2.21
pygame==2.5.2
pymunk==6.6.0
pytest==7.4.3
PyYAML==6.0.1
tinyscaler==1.2.7
tomli==2.0.1
typing_extensions==4.8.0
virtualenv==20.24.6

I also downgraded my numpy to 1.24.4 and tried py3.8, the same version in the tests, and the tests still passed. I assume there is some stochastic behaviour in knights_archers_zombies_v10 env or sticky actions wrapper that is not getting seeded deterministically.

I don't think it is related to this PR, since the test_sticky_actions test doesn't use the vec env in the test.

elliottower commented 8 months ago

Looks like the error is thrown by pettingzoo's api_test function here. knights_archers_zombies_v10 is returning an out of range obs (obs >1 from the logs), when running the py3.8 tests (py 3.11 pass).

The tests passed locally when I ran them. Screenshot from 2023-11-27 17-25-03

My package versions, p3.10 + :

cffi==1.16.0
cfgv==3.4.0
cloudpickle==3.0.0
distlib==0.3.7
exceptiongroup==1.1.3
Farama-Notifications==0.0.4
filelock==3.13.0
gymnasium==0.29.1
identify==2.5.31
iniconfig==2.0.0
nodeenv==1.8.0
numpy==1.26.1
packaging==23.2
pettingzoo==1.24.2
platformdirs==3.11.0
pluggy==1.3.0
pre-commit==3.5.0
pycparser==2.21
pygame==2.5.2
pymunk==6.6.0
pytest==7.4.3
PyYAML==6.0.1
tinyscaler==1.2.7
tomli==2.0.1
typing_extensions==4.8.0
virtualenv==20.24.6

I also downgraded my numpy to 1.24.4 and tried py3.8, the same version in the tests, and the tests still passed. I assume there is some stochastic behaviour in knights_archers_zombies_v10 env or sticky actions wrapper that is not getting seeded deterministically.

I don't think it is related to this PR, since the test_sticky_actions test doesn't use the vec env in the test.

Thanks for looking into this, I'll try re-running the tests to see if it works. Weird that it passed on one python version but not the other as well.

elliottower commented 8 months ago

Looks like it passed when I re-ran it, going to re-run on python 3.9 to see but yeah it's unrelated to this PR so not a big deal