Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

Zero-filling tensors to reset state buffer #47

Closed ThisIsIsaac closed 5 years ago

ThisIsIsaac commented 5 years ago

In env.py, you clear state_buffer by enqueuing empty zero-filled tensors:

for _ in range(self.window):
    self.state_buffer.append(torch.zeros(84, 84, device=self.device))

I see how this would work if we are dealing with an environment where the state is a screen and each value in the tensor is a pixel intensity. However, can zero-filling a tensor to indicate the non-existence of a state be generalized to other environments? For example, if each value in the tensor means "waiting time of a customer", would this approach also work?
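For context, a minimal sketch of what that reset does to the stacked state, using a plain `deque` with scalar stand-ins for the `torch.zeros(84, 84)` frames (an illustration, not the repo's actual code):

```python
from collections import deque

WINDOW = 4  # number of stacked frames, as in self.window

# Stand-in for the zero-filled 84x84 tensors: scalar "frames"
state_buffer = deque(maxlen=WINDOW)
for _ in range(WINDOW):
    state_buffer.append(0)  # torch.zeros(84, 84, ...) in env.py

# After the reset, the stacked state is entirely "black" frames
assert list(state_buffer) == [0, 0, 0, 0]

# The first real observation displaces one null frame, so early
# states are a mix of padding and real frames
state_buffer.append(7)
assert list(state_buffer) == [0, 0, 0, 7]
```

Because the deque has a fixed `maxlen`, each appended frame silently evicts the oldest one, so the zero padding disappears after `WINDOW` real observations.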

Kaixhin commented 5 years ago

This may or may not work: you'd have to devise an appropriate "null" value for your domain (maybe -1 makes more sense than 0 for wait time, I don't know). In the end you may just have to try a few options and see what works best empirically.
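One way to follow that advice is to parameterize the null value instead of hardcoding zeros. The sketch below uses a hypothetical `fill_value` argument and a plain-Python buffer in place of the torch tensors; none of these names exist in the repo:

```python
from collections import deque

class StateBuffer:
    """Hypothetical frame/state history buffer with a configurable
    "null" value for resets (a sketch, not part of Rainbow)."""

    def __init__(self, window, state_size, fill_value=0.0):
        self.window = window
        self.state_size = state_size
        self.fill_value = fill_value  # e.g. -1.0 for wait-time states
        self.buffer = deque(maxlen=window)
        self.reset()

    def _null_state(self):
        # Stand-in for torch.full((84, 84), fill_value): a flat list
        # filled with the domain-specific null value
        return [self.fill_value] * self.state_size

    def reset(self):
        # Same loop as env.py, but padding with fill_value instead of 0
        for _ in range(self.window):
            self.buffer.append(self._null_state())

    def append(self, state):
        self.buffer.append(state)

# Wait-time domain where -1 marks "no observation yet", so a genuine
# zero wait time stays distinguishable from padding
buf = StateBuffer(window=4, state_size=10, fill_value=-1.0)
assert all(v == -1.0 for v in buf.buffer[0])
```

With torch, the equivalent of `_null_state` would be `torch.full((84, 84), fill_value, device=self.device)`. Whether -1, 0, or some other sentinel works best is, as noted above, an empirical question.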