Closed grig-guz closed 2 years ago
@grig-guz You can try version 1.4.1.
For now you can add
    self._agent_ids = [
        PLAYER_STR_FORMAT.format(index=index)
        for index in range(self._num_players)
    ]
to the end of MeltingPotEnv.__init__() in examples/rllib/multiagent_wrapper.py.
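For anyone who wants to see what that workaround produces without installing Ray, here is a minimal sketch. FakeMeltingPotEnv is a hypothetical stand-in for the real wrapper, and the value of PLAYER_STR_FORMAT is an assumption (the 'player_0' key quoted later in this thread suggests it).

```python
# Assumed value; the real constant lives in the meltingpot example code.
PLAYER_STR_FORMAT = "player_{index}"


class FakeMeltingPotEnv:
    """Hypothetical stand-in for MeltingPotEnv, with no Ray dependency."""

    def __init__(self, num_players):
        self._num_players = num_players
        # The workaround: one agent id per player, built at the end of __init__.
        self._agent_ids = [
            PLAYER_STR_FORMAT.format(index=index)
            for index in range(self._num_players)
        ]


env = FakeMeltingPotEnv(num_players=3)
print(env._agent_ids)
```

With three players this gives one id per player index, starting at index 0.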
@Muff2n that worked thanks!
Could you please turn this into a pull request so that it is fixed for everyone?
This is added as part of PR 25 (if you are happy to have one PR address two issues). That change only gets the code to run, though; it does not address the 'specify the Ray version' request.
It would be preferable to have a single PR per change, please :)
@Muff2n
I tried both your proposed solution and the code that you submitted in the pull request, and I got the following error:
KeyError: 'player_0'
Do you know why this could happen?
I think that the issue is associated with the new Ray version that was published today.
I don’t have my computer close by to try the code with the previous Ray version. However, I will check this hypothesis ASAP.
@Muff2n Hello, I managed to get the code working by running pip install ray==1.11.0.
However, I wonder how this would affect your PR.
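In case it helps others hitting this, a small sketch that checks whether the installed Ray matches the version reported working here. The helper names are hypothetical, and it assumes plain X.Y.Z version strings; it only uses the standard library's importlib.metadata.

```python
import re
from importlib import metadata

# Version reported working in this thread.
KNOWN_GOOD_RAY = (1, 11, 0)


def parse_version(text):
    """Return (major, minor, patch) for strings starting with X.Y.Z, else None."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_known_good(installed):
    """True only when the version equals the one reported working."""
    return parse_version(installed) == KNOWN_GOOD_RAY


def installed_ray_version():
    """Return the installed Ray version string, or None if Ray is absent."""
    try:
        return metadata.version("ray")
    except metadata.PackageNotFoundError:
        return None


version = installed_ray_version()
if version is None:
    print("ray is not installed; try: pip install ray==1.11.0")
elif not is_known_good(version):
    print(f"ray {version} found; this thread only reports 1.11.0 as working")
else:
    print("ray 1.11.0 found")
```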
Thank you for the heads up. It is a problem with ray==1.12.0. Taking a look at rllib, it seems to have some bugs in it.
While I can give the new-style _agent_ids with:
    self._agent_ids = set(
        PLAYER_STR_FORMAT.format(index=index)
        for index in range(self._num_players)
    )
    super().__init__()
There are other issues.
For example in ray/rllib/utils/pre_checks/env.py lines 333-341:
    def _check_reward(reward, base_env=False, agent_ids=None):
        if base_env:
            for _, multi_agent_dict in reward.items():
                for agent_id, rew in multi_agent_dict.items():
                    if not (
                        np.isreal(rew) and not isinstance(rew, bool) and np.isscalar(rew)
                    ):
                        error = (
                            "Your step function must return rewards that are"
                            f" integer or float. reward: {rew}. Instead it was a "
                            f"{type(reward)}"
                        )
                        raise ValueError(error)
Here they test that each returned reward is real, not boolean, and scalar. That check trips for me, even though meltingpot is passing float rewards. The error message says that floats are acceptable (which I think they should be), and it quotes the type of the reward dictionary rather than the type of the reward being tested.
Therefore I think it best to stick with ray==1.11.0 for now. I can look more closely and submit the PR next week.
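To illustrate the point about the error message, here is a rough sketch of a per-agent reward check that reports the type of the offending reward instead of the type of the dictionary. This is not rllib's code: check_rewards is a hypothetical name, and it swaps the NumPy tests for the standard library's numbers.Real, so it may accept or reject some values differently from the original.

```python
import numbers


def check_rewards(multi_agent_rewards):
    """Raise ValueError if any per-agent reward is not a real, non-bool number."""
    for agent_id, rew in multi_agent_rewards.items():
        # bool is a subclass of int, so it has to be excluded explicitly.
        is_valid = isinstance(rew, numbers.Real) and not isinstance(rew, bool)
        if not is_valid:
            raise ValueError(
                f"Reward for agent {agent_id!r} must be an integer or float. "
                f"reward: {rew!r}. Instead it was a {type(rew)}"  # type of rew, not of the dict
            )


# Plain floats and ints pass without raising.
check_rewards({"player_0": 1.5, "player_1": 0})

# A boolean reward is rejected, and the message names the reward's own type.
try:
    check_rewards({"player_0": True})
except ValueError as err:
    print(err)
```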
Hi, can you please specify the Ray version under which the rllib example code runs? I am currently getting this error with Ray 1.11.0: