Closed KaleabTessera closed 8 months ago
@elliottower Do you mind having a look at this? This is quite important since if you don't have the correct terminal obs, you can't do bootstrapping correctly.
Hey @KaleabTessera sorry about this, will take a look now. Just approved the ci workflows
Going to get confirmation from another Farama dev who has more experience with this type of thing than me cause I’m not 100% confident in my ability to check correctness for this
I see a few errors in the CI but I'm not sure they are anything to do with the changes you've made, testing it locally myself just to confirm and get a better idea (may be something with pettingzoo instead not sure)
Fixed the bug on PettingZoo's end, queued up the CI to run (will take a little while as the PZ CI is running as well), I think it should all pass now
Oh right I need to do a pettingzoo release for this to pass, this will have to wait for a few days as we are waiting on the AgileRL tutorial bugs to be fixed
PZ release is out in case you didn't see, so this should be unblocked now. Approved the workflows just now
Thanks @elliottower ! I added this code to ensure that a seed used even when reset is called here. Not sure if it is necessary, what do you think?
A similar thing is done in stable baselines' vec env
That sounds reasonable to me, going to get input from another dev to see if it makes sense to them
Thanks @elliottower ! I added this code to ensure that a seed used even when reset is called here. Not sure if it is necessary, what do you think?
A similar thing is done in stable baselines' vec env
Hi,
I don't think it makes sense in this case since the underlying env (the parallel PZ env) is only a single entity.
Jet (@jjshoots) also said he thought this shouldn't be implemented, here's a screenshot of what he said.
So I removed the manual seeding. I agree it should be in another PR and maybe it is not even useful.
I still think there is a possible issue for seeding in this scenario :
env.reset(
. env.reset
here. Although you have passed an initial seed for controlling the determinism, the env doesn't use a seed when calling reset
from the step, meaning the underlying behaviour is not deterministic and relies on how base env handles none seeds. I think the vec env should create a new seed deterministically - similar to how Jax handles random numbers. I think that is why stable baselines vec env ensure that a new seed is created or used.
So I removed the manual seeding. I agree it should be in another PR and maybe it is not even useful.
I still think there is a possible issue for seeding in this scenario :
- You pass a seed into the environment when calling the first
env.reset(
.- Then the env loops itself, and manually calls
env.reset
here. Although you have passed an initial seed for controlling the determinism, the env doesn't use a seed when callingreset
from the step, meaning the underlying behaviour is not deterministic and relies on how base env handles none seeds.I think the vec env should create a new seed deterministically - similar to how Jax handles random numbers. I think that is why stable baselines vec env ensure that a new seed is created or used.
Normally the environment should handle this: the first reset()
is made with a seed. Then an internal attribute keeps track of this seed so even if there are subsequent calls to reset()
with seed being None
, they are still seeded by the first call.
I agree to what @ffelten is saying. That was what I was getting at in the first place but Florian put better words to it.
So I removed the manual seeding. I agree it should be in another PR and maybe it is not even useful. I still think there is a possible issue for seeding in this scenario :
- You pass a seed into the environment when calling the first
env.reset(
.- Then the env loops itself, and manually calls
env.reset
here. Although you have passed an initial seed for controlling the determinism, the env doesn't use a seed when callingreset
from the step, meaning the underlying behaviour is not deterministic and relies on how base env handles none seeds.I think the vec env should create a new seed deterministically - similar to how Jax handles random numbers. I think that is why stable baselines vec env ensure that a new seed is created or used.
Normally the environment should handle this: the first
reset()
is made with a seed. Then an internal attribute keeps track of this seed so even if there are subsequent calls toreset()
with seed beingNone
, they are still seeded by the first call.
This makes sense, thanks @ffelten @jjshoots ! I double-checked the base env I was using and this was the case :+1: So likely this is not an issue if base env handles seeding reasonably.
Thanks for the input guys. Looks like there's still a pytest failure which I'm not 100% sure why is the case
Looks like the error is thrown by pettingzoo's api_test
function here. knights_archers_zombies_v10
is returning an out of range obs (obs >1 from the logs), when running the py3.8 tests (py 3.11 pass).
The tests passed locally when I ran them.
My package versions, p3.10 + :
cffi==1.16.0
cfgv==3.4.0
cloudpickle==3.0.0
distlib==0.3.7
exceptiongroup==1.1.3
Farama-Notifications==0.0.4
filelock==3.13.0
gymnasium==0.29.1
identify==2.5.31
iniconfig==2.0.0
nodeenv==1.8.0
numpy==1.26.1
packaging==23.2
pettingzoo==1.24.2
platformdirs==3.11.0
pluggy==1.3.0
pre-commit==3.5.0
pycparser==2.21
pygame==2.5.2
pymunk==6.6.0
pytest==7.4.3
PyYAML==6.0.1
tinyscaler==1.2.7
tomli==2.0.1
typing_extensions==4.8.0
virtualenv==20.24.6
I also downgraded my numpy to 1.24.4
and tried py3.8, the same version in the tests, and the tests still passed. I assume there is some stochastic behaviour in knights_archers_zombies_v10
env or sticky actions wrapper that is not getting seeded deterministically.
I don't think it is related to this PR, since the test_sticky_actions
test doesn't use the vec env in the test.
Looks like the error is thrown by pettingzoo's
api_test
function here.knights_archers_zombies_v10
is returning an out of range obs (obs >1 from the logs), when running the py3.8 tests (py 3.11 pass).The tests passed locally when I ran them.
My package versions, p3.10 + :
cffi==1.16.0 cfgv==3.4.0 cloudpickle==3.0.0 distlib==0.3.7 exceptiongroup==1.1.3 Farama-Notifications==0.0.4 filelock==3.13.0 gymnasium==0.29.1 identify==2.5.31 iniconfig==2.0.0 nodeenv==1.8.0 numpy==1.26.1 packaging==23.2 pettingzoo==1.24.2 platformdirs==3.11.0 pluggy==1.3.0 pre-commit==3.5.0 pycparser==2.21 pygame==2.5.2 pymunk==6.6.0 pytest==7.4.3 PyYAML==6.0.1 tinyscaler==1.2.7 tomli==2.0.1 typing_extensions==4.8.0 virtualenv==20.24.6
I also downgraded my numpy to
1.24.4
and tried py3.8, the same version in the tests, and the tests still passed. I assume there is some stochastic behaviour inknights_archers_zombies_v10
env or sticky actions wrapper that is not getting seeded deterministically.I don't think it is related to this PR, since the
test_sticky_actions
test doesn't use the vec env in the test.
Thanks for looking into this, I'll try re-running the tests to see if it works. Weird that it passed on one python version but not the other as well.
Looks like it passed when I re-ran it, going to re-run on python 3.9 to see but yeah it's unrelated to this PR so not a big deal
Closes https://github.com/Farama-Foundation/SuperSuit/issues/232.
This allows access to
terminal_observation
, which previously wasn't possible when using pettingzoo_env_to_vec_env_v1/MarkovVectorEnv.