chandar-lab / RLHive

MIT License

Update compatibility with gym environment #344

Open delara38 opened 1 year ago

delara38 commented 1 year ago

Hi,

the gym environments now return a 5-tuple (observation, reward, terminated, truncated, info) from step() instead of the previous 4-tuple (observation, reward, done, info); however, RLHive still expects the old 4-tuple at each transition and needs to be amended.

I believe that all that is needed is to change the step function in gym_env.py from

    def step(self, action):
        observation, reward, done, info = self._env.step(action)
        self._turn = (self._turn + 1) % self._num_players
        return observation, reward, done, self._turn, info

to the following (assuming that RLHive's step will continue to return only a single done boolean):

    def step(self, action):
        observation, reward, terminate, truncate, info = self._env.step(action)
        done = terminate or truncate
        self._turn = (self._turn + 1) % self._num_players
        return observation, reward, done, self._turn, info
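If both old and new gym versions need to be supported at once, one option is to normalize the step() result by its length before unpacking. The sketch below assumes that a 5-tuple always means (observation, reward, terminated, truncated, info) and a 4-tuple always means (observation, reward, done, info). `unpack_step` and `GymEnvSketch` are hypothetical names for illustration, not part of RLHive.

```python
from typing import Any, Tuple


def unpack_step(result: Tuple[Any, ...]) -> Tuple[Any, Any, bool, Any]:
    """Normalize a gym step() result to (observation, reward, done, info).

    Assumption: a 5-tuple is the new (obs, reward, terminated, truncated, info)
    format and a 4-tuple is the old (obs, reward, done, info) format.
    """
    if len(result) == 5:
        observation, reward, terminated, truncated, info = result
        done = bool(terminated or truncated)  # collapse both end conditions
    elif len(result) == 4:
        observation, reward, done, info = result
        done = bool(done)
    else:
        raise ValueError(f"Unexpected step() result of length {len(result)}")
    return observation, reward, done, info


class GymEnvSketch:
    """Hypothetical stand-in for the wrapper class in gym_env.py."""

    def __init__(self, env: Any, num_players: int = 1) -> None:
        self._env = env
        self._num_players = num_players
        self._turn = 0

    def step(self, action: Any):
        # Works with either the 4-tuple or the 5-tuple step() API.
        observation, reward, done, info = unpack_step(self._env.step(action))
        self._turn = (self._turn + 1) % self._num_players
        return observation, reward, done, self._turn, info
```

Dispatching on tuple length is a heuristic: it keeps one code path for both APIs, but it discards the distinction between termination and truncation.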
dapatil211 commented 1 year ago

Hi @delara38,

Yes you are right. We have made these changes in the dev branch of the repo, and in fact are planning on making the return type of the environment a bit more structured with dataclasses. These changes will be integrated into the main branch and the next release in the next 2-3 weeks. For now, if you need the termination/truncation change, please use the dev branch.
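To illustrate why a structured return type that keeps termination and truncation separate can be useful, here is a minimal sketch. `StepResult` and `td_target` are hypothetical names invented for this example; this is not the actual dataclass used in the dev branch. The point it shows is that a one-step TD target should stop bootstrapping only on true termination, while a truncated episode (for example, one cut off by a time limit) still has future value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class StepResult:
    """Hypothetical structured return type for one environment transition."""

    observation: Any
    reward: float
    terminated: bool  # episode reached a terminal state
    truncated: bool  # episode was cut off, e.g. by a time limit
    turn: int = 0
    info: Dict[str, Any] = field(default_factory=dict)

    @property
    def done(self) -> bool:
        # Keep the old single-flag view for code that only needs "episode over".
        return self.terminated or self.truncated


def td_target(result: StepResult, next_value: float, gamma: float = 0.99) -> float:
    # Bootstrap from the next state's value unless the state is truly terminal.
    bootstrap = 0.0 if result.terminated else gamma * next_value
    return result.reward + bootstrap
```

With a single collapsed `done` flag, the truncated case would be treated like a terminal one and its bootstrap term would be dropped.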

delara38 commented 1 year ago

Great, thanks!