Hi!

Looking at the `_encode_observation` function, it seems you have a subtle bug in there. Namely, you're only handling the `start_idx` edge case where the buffer is not yet full and `start_idx` is negative. But in the case where the buffer is full and `start_idx` crosses the buffer's head-pointer boundary, you'll be stacking fresh experience with very old experience (especially in a 1M-slot buffer). The bigger the buffer, the less probable this event is, and even when it happens it's a low-frequency event that won't meaningfully affect the learned Q-function, but I still thought it was worth flagging.
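To make the failure mode concrete, here's a tiny illustration (the buffer size, head position, and variable names are made up for this example):

```python
# A full circular buffer with 8 slots where the head pointer (next slot to be
# overwritten) sits at index 3: slots 0-2 hold the newest frames, while slots
# 3-7 hold the oldest frames still in the buffer.
buffer_size = 8
head = 3
frame_history_len = 4

# Sampling index 4 as the most recent frame of the stack gives this window:
end_idx = 4 + 1
start_idx = end_idx - frame_history_len              # 1
window = [i % buffer_size for i in range(start_idx, end_idx)]
print(window)                                        # [1, 2, 3, 4]

# Slots 1-2 were written very recently, slots 3-4 are about to be overwritten,
# so the stacked observation mixes frames from far-apart points in training.
# The check in the snippet below catches exactly this case:
print((head - start_idx) % buffer_size)              # 2, strictly between 0 and frame_history_len
```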
Since I'm implementing my own version of DQN, here is a snippet of how I handle the start index:
```python
def _handle_start_index_edge_cases(self, start_index, end_index):
    # Edge case 1: the buffer is not yet full and the window would start
    # before slot 0, so clamp it to the first stored frame.
    if not self._buffer_full() and start_index < 0:
        start_index = 0

    # Edge case 2: the buffer is full and the window crosses the head pointer.
    # The data before and after the head pointer belongs to completely
    # different episodes, so restart the window at the head pointer.
    if self._buffer_full():
        if 0 < (self.current_free_slot_index - start_index) % self.max_buffer_size < self.num_previous_frames_to_fetch:
            start_index = self.current_free_slot_index

    return start_index
```
Here, my `num_previous_frames_to_fetch` corresponds to your `frame_history_len`.
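For completeness, here is a rough sketch of how that helper slots into frame stacking on my side; names such as `self.frames` (the raw frame storage), the zero-padding choice, and the channel-last layout are simplified for the example and won't match your `_encode_observation` internals exactly:

```python
import numpy as np

def _fetch_stacked_observation(self, sample_index):
    # sample_index is the slot of the most recent frame in the stack.
    end_index = sample_index + 1                    # exclusive
    start_index = end_index - self.num_previous_frames_to_fetch
    start_index = self._handle_start_index_edge_cases(start_index, end_index)

    # Frames lost to the adjustment (episode start, or a window that crossed
    # the head pointer) are replaced with zero frames.
    available = (end_index - start_index) % self.max_buffer_size
    missing = self.num_previous_frames_to_fetch - available

    frames = [np.zeros_like(self.frames[0]) for _ in range(missing)]
    for offset in range(available):
        frames.append(self.frames[(start_index + offset) % self.max_buffer_size])
    # Assumes channel-last frames; stack them along the channel axis.
    return np.concatenate(frames, axis=-1)
```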