Open ConstantinRuhdorfer opened 1 year ago
Hi, I can confirm that simply changing line 80 in `multi_step` in `overcookedgym/overcooked.py` from:
return (ego_obs, alt_obs), (reward, reward), done, {}#info
to this
return (ego_obs, alt_obs), (reward, reward), [done], {}#info
fixes the issue and still works with `OnPolicyAgent` and `PPO`. I will open a PR; could you comment on whether this has any other implications? Thanks
PR is here #14
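For anyone following along, here is a rough sketch of what the change does to the return value. It is not the actual `multi_step` code: `multi_step_sketch`, `finished_indices`, and the placeholder observations are hypothetical names made up for illustration, and the consumer only assumes that the training loop iterates over one done flag per environment.

```python
def multi_step_sketch(done, wrap):
    """Toy stand-in for the return statement on line 80 (not the real method)."""
    ego_obs, alt_obs, reward = "ego", "alt", 0.0  # placeholder values
    # wrap=True mirrors the proposed change: done -> [done]
    done_out = [done] if wrap else done
    return (ego_obs, alt_obs), (reward, reward), done_out, {}


def finished_indices(dones):
    """Hypothetical consumer that expects one done flag per environment."""
    return [idx for idx, flag in enumerate(dones) if flag]


# With the wrapped form, the consumer can iterate over the flags.
_, _, dones, _ = multi_step_sketch(done=True, wrap=True)
finished = finished_indices(dones)
```

With `wrap=True` the third element is a one-element list, so code that loops over `dones` sees exactly one environment.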
Hi,
I adapted the simple example to use `OffPolicyAgent`, just to test, but I keep getting an error. This seems to be due to the fact that SB3 expects multiple `dones` from `env.step` in `stable_baselines3/common/off_policy_algorithm.py:544`:
`new_obs, rewards, dones, infos = env.step(actions)`
whereas Overcooked only returns a single `done` in `overcookedgym/overcooked.py:80`.
Are off-policy algorithms not supported? Is there a good way of fixing this, e.g. by changing line 80 from
`return (ego_obs, alt_obs), (reward, reward), done, {}#info`
to
`return (ego_obs, alt_obs), (reward, reward), [done], {}#info`
?
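To make the suspected cause concrete, here is a minimal sketch of the failure mode. It assumes the off-policy collection code loops over `dones` with one entry per environment; `store_transitions` is a hypothetical stand-in, not the SB3 function.

```python
def store_transitions(dones):
    """Hypothetical loop in the style of a vectorized off-policy collector."""
    return [f"env {idx} finished" for idx, done in enumerate(dones) if done]


# A single boolean, as returned by line 80 today, cannot be iterated over.
try:
    store_transitions(True)
    error_message = None
except TypeError as err:
    error_message = str(err)
```

If that assumption holds, any scalar `done` would fail in the same way, which would explain why wrapping it in a list helps.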
Thank you!
Cheers, Constantin