RZ-Q closed 11 months ago
Our algorithm is based on DQN, which samples whole episodes uniformly at random from the experience replay buffer. Therefore, each sequence sampled from the replay buffer has the same length as the episode it came from. The equations in our paper should make this concept clearer.
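To illustrate why the sampled length equals the episode length, here is a minimal sketch of episode-level sampling. It is not the code from our repository: the class name `EpisodeReplayBuffer`, its methods, and the transition tuple layout are hypothetical and chosen only for this example.

```python
import random
from collections import deque


class EpisodeReplayBuffer:
    """Hypothetical buffer that stores and samples whole episodes."""

    def __init__(self, capacity):
        # Oldest episodes are dropped once capacity is reached.
        self.episodes = deque(maxlen=capacity)

    def add_episode(self, transitions):
        # One entry per episode; each entry keeps all of its transitions.
        self.episodes.append(list(transitions))

    def sample(self, batch_size, rng=random):
        # Uniform sampling over episodes (with replacement), not over steps,
        # so every sampled sequence is as long as its source episode.
        return [rng.choice(self.episodes) for _ in range(batch_size)]


# Usage: episodes of different lengths stay intact when sampled.
buffer = EpisodeReplayBuffer(capacity=100)
for length in (3, 5, 2):
    # Assumed transition layout: (state, action, reward, next_state, done)
    episode = [(t, 0, 0.0, t + 1, t == length - 1) for t in range(length)]
    buffer.add_episode(episode)

batch = buffer.sample(batch_size=4)
print([len(ep) for ep in batch])  # each value is 3, 5, or 2
```

Because the unit of sampling is the episode, there is no separate sequence-length parameter in this sketch; the length comes from whichever episode is drawn.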
I am confused about the block-wise operation: why is the length the same as the episode's length, and why is this block-wise operation used at all?