Pi-Star-Lab / RESCO

Reinforcement Learning Benchmarks for Traffic Signal Control (RESCO)

"Spikes" during training MPLight and DQN #19

Open RealZYZhang opened 1 year ago


Hi there, I have observed regular "spikes" when plotting the training curve (wait time) of MPLight and DQN every 10 episodes. I think the same pattern is visible in some scenarios in your paper. Do you have any idea whether it is inherent to the algorithm/implementation, or whether it can be reduced by changing some parameters or the workflow? Thanks a lot!

[image: training curve plot attached to the issue]
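One way to check whether the spikes really follow a fixed cycle is to average the wait time by position within the cycle. The snippet below is only a minimal sketch, not RESCO code: the helper names `mean_by_phase` and `spike_phase` are hypothetical, the period of 10 episodes is an assumption taken from the description above, and the data is synthetic rather than real training output.

```python
from statistics import mean


def mean_by_phase(wait_times, period=10):
    """Average per-episode wait time at each position within a cycle.

    `period` is an assumed cycle length (10 episodes here), not a value
    read from RESCO itself.
    """
    if len(wait_times) < period:
        raise ValueError("need at least one full cycle of episodes")
    buckets = [[] for _ in range(period)]
    for episode, value in enumerate(wait_times):
        buckets[episode % period].append(value)
    return [mean(bucket) for bucket in buckets]


def spike_phase(wait_times, period=10):
    """Return the cycle position whose average wait time is highest."""
    phase_means = mean_by_phase(wait_times, period)
    return max(range(period), key=lambda p: phase_means[p])


# Synthetic curve (illustration only): a slowly improving wait time
# with an artificial spike injected every 10th episode.
wait_times = [
    100.0 - 0.5 * ep + (40.0 if ep % 10 == 0 else 0.0)
    for ep in range(100)
]

phase_means = mean_by_phase(wait_times)
print("mean wait time per cycle position:", phase_means)
print("cycle position with the highest mean:", spike_phase(wait_times))
```

If one cycle position stands out clearly on real logs, the spikes are probably tied to something that happens on a fixed schedule in the training loop rather than to random variance.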