berkeleydeeprlcourse / homework

Assignments for CS294-112.
MIT License
1.57k stars 1.04k forks source link

ReplayBuffer - a subtle bug around head pointer boundary #52

Open gordicaleksa opened 3 years ago

gordicaleksa commented 3 years ago

Hi!

Looking at the _encode_observation function it seems you have a subtle bug in there.

Namely, you're only handling the start_idx edge case where the buffer is still not full and start_idx is negative.

But even in the case where the buffer is full but start_idx crosses the buffer's head pointer boundary you'll be stacking fresh experience with super old experience (especially in a 1M slot buffer).

The bigger the buffer the less probable this event would be, and even if it happened, since it's a low-frequency event it won't affect the Q-learning function but I still thought flagging it.

Since I'm implementing my own version of DQN here is a snippet of how I handle the start index:

    def _handle_start_index_edge_cases(self, start_index, end_index):
      # Edge case 1:
      if not self._buffer_full() and start_index < 0:
          start_index = 0

      # Edge case 2:
      # Handle the case where start index crosses the buffer head pointer - the data before and after the head pointer
      # belongs to completely different episodes
      if self._buffer_full():
          if 0 < (self.current_free_slot_index - start_index) % self.max_buffer_size < self.num_previous_frames_to_fetch:
              start_index = self.current_free_slot_index

where my num_previous_frames_to_fetch is your frame_history_len.