danijar / dreamerv3

Mastering Diverse Domains through World Models
https://danijar.com/dreamerv3
MIT License

Need some clarifications about details in Atari env #118

Closed by swsychen 5 months ago

swsychen commented 5 months ago

Hi, in embodied/envs/atari.py, I don't understand the intention behind some parts of the code:

  1. In def _reset(self):

         for i, dst in enumerate(self.buffers):
           if i > 0:
             np.copyto(self.buffers[0], dst)

What is the purpose of the np.copyto call? Is it some sort of reset of the image buffers?

  2. In def _obs(self, reward, is_first=False, is_last=False, is_terminal=False):

         if self.aggregate == 'max':
           image = np.amax(self.buffers, 0)
         elif self.aggregate == 'mean':
           image = np.mean(self.buffers, 0).astype(np.uint8)

Why do we need to take np.amax or np.mean across the image buffers? Are there potential issues if we just use self.buffers[0]?

Could you clarify a bit regarding these points? Thank you very much.

danijar commented 5 months ago

This is standard Atari preprocessing. Some games render half their objects during even frames and the other half during odd frames, giving the illusion of a full image to the human player while reducing computational cost for the old Atari 2600 console. Taking mean or max over the last two frames ensures the agent sees a complete image. It should still work without this because Dreamer has memory, but the results would not be as comparable to the prior literature.
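To make the flicker argument concrete, here is a minimal sketch of the aggregation step. It is not the code from embodied/envs/atari.py: the function name aggregate_frames and the tiny 1x4 grayscale frames are hypothetical and chosen only for illustration. The sketch assumes a game that draws one object on even frames and another on odd frames.

```python
import numpy as np


def aggregate_frames(buffers, mode="max"):
    """Combine the most recent frames into a single observation.

    Hypothetical helper for illustration; it mirrors the np.amax / np.mean
    calls quoted in the question but is not the repository's implementation.
    """
    stacked = np.stack(buffers)  # shape: (num_frames, height, width)
    if mode == "max":
        # Pixel-wise maximum keeps every object at full brightness.
        return np.amax(stacked, axis=0)
    if mode == "mean":
        # Pixel-wise mean also keeps every object, but dimmer.
        return np.mean(stacked, axis=0).astype(np.uint8)
    raise ValueError(f"unknown aggregate mode: {mode}")


# Made-up frames: the game renders one object per frame, alternating.
even_frame = np.array([[255, 0, 0, 0]], dtype=np.uint8)  # object A only
odd_frame = np.array([[0, 0, 255, 0]], dtype=np.uint8)   # object B only

max_image = aggregate_frames([even_frame, odd_frame], "max")
mean_image = aggregate_frames([even_frame, odd_frame], "mean")

print(max_image)   # both objects visible at full intensity
print(mean_image)  # both objects visible at roughly half intensity
```

Using only the latest frame (the equivalent of self.buffers[0]) would show object A or object B, never both, so the agent would have to rely on its memory to piece the scene together.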