Hi there, I have observed regular "spikes" every 10 episodes when plotting the training curves (wait time) of MPLight and DQN. I think this pattern is also visible in some scenarios in your paper. I just want to check whether you have any idea if this is inherent to the algorithm/implementation, or whether it can be reduced by changing some parameters or the workflow. Thanks a lot!