NVlabs / cule

CuLE: A CUDA port of the Atari Learning Environment (ALE)
BSD 3-Clause "New" or "Revised" License

Fix unnecessary memory allocation in DQN example #22

Closed: AlexanderDzhoganov closed this 3 years ago

AlexanderDzhoganov commented 4 years ago

Using PyTorch 1.5 on CUDA 10.2, this call in memory.py

self.states_view.set_(self.observations.storage(),
                      storage_offset=0,
                      size=torch.Size([self.num_ales, num_steps, self.history, width, height]),
                      stride=(stepsize, imagesize, imagesize, width, 1))

allocates memory and immediately releases it. This causes a CUDA out-of-memory error whenever the GPU cannot hold double the number of samples that memory_capacity specifies.
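
For context (an illustrative calculation, not part of the original report): a strided view with offset zero must span sum((size_i - 1) * stride_i) + 1 storage elements. With toy dimensions, and assuming the buffer holds one contiguous block of num_steps frames per ALE, the sizes and strides above reach past the buffer by exactly history - 1 frames:

# Toy dimensions and an assumed buffer layout, for illustration only.
num_ales, num_steps, history = 2, 16, 4
height = width = 84
imagesize = width * height        # elements per frame
stepsize = num_steps * imagesize  # elements per ALE block (assumption)

size = (num_ales, num_steps, history, width, height)
stride = (stepsize, imagesize, imagesize, width, 1)

# Minimum storage the view must span: sum((size_i - 1) * stride_i) + 1.
needed = sum((s - 1) * st for s, st in zip(size, stride)) + 1
available = num_ales * num_steps * imagesize
print(needed - available == (history - 1) * imagesize)  # True: overrun of history - 1 frames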

I changed the call to use torch.as_strided instead, which performs no allocation.

Also, the stricter bounds checking in as_strided uncovered a latent error: the second dimension of states_view should be num_steps - (self.history - 1) rather than num_steps, since each state spans history consecutive frames and only num_steps - (self.history - 1) complete states fit in the buffer.

The same fix applies to frame_view and reward_view.
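
For concreteness, here is a minimal sketch of the corrected states_view construction. It uses the same field names as the snippet above, but toy dimensions and an assumed flat frame buffer rather than the actual replay-memory code:

import torch

num_ales, num_steps, history = 2, 16, 4  # toy values, for illustration
height = width = 84
imagesize = width * height
stepsize = num_steps * imagesize

# Assumed layout: one contiguous block of num_steps frames per ALE.
observations = torch.zeros(num_ales * num_steps * imagesize, dtype=torch.uint8)

# as_strided returns a view over the existing storage, so nothing is
# allocated, and it raises an error if the view would reach past the end.
states_view = observations.as_strided(
    size=(num_ales, num_steps - (history - 1), history, width, height),
    stride=(stepsize, imagesize, imagesize, width, 1),
    storage_offset=0,
)

With the corrected second dimension the view fits the storage exactly; passing num_steps instead makes as_strided raise the out-of-bounds error that set_ was hiding behind an allocation.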

sdalton1 commented 3 years ago

Good catch, thanks for the fix.