Mean waiting time of an episode suddenly drops really low

I am training a variant of the system developed in this project, with 2 separate traffic lights, each controlled by its own agent with its own state and reward. I set N_EPOCHS to 100 and increased the number of episodes to 300. After about 60 episodes, the mean waiting time of both agents drops drastically, from roughly the -2000 to -3000 range to about -120000, which is really weird. It has also stopped improving, and I don't see any sign of convergence. What are some possible causes for this drop in performance? I noticed that vehicles started teleporting (because of waiting too long) exactly after the 61st episode, which seems suspicious.

rl_train
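
To check whether the drop really lines up with the teleporting, here is a minimal sketch, assuming the per-episode values and per-episode teleport counts have been logged into plain Python lists. The function names, the `window` and `factor` parameters, and the sample numbers are hypothetical and not part of the project's code; the values are assumed to be negative, as in the plot.

```python
from statistics import median


def first_collapse(values, window=10, factor=5.0):
    """Return the 1-based episode where the metric first collapses, or None.

    A collapse is flagged when an episode's (negative) value is more than
    `factor` times worse than the median of the previous `window` episodes.
    """
    for i in range(window, len(values)):
        baseline = median(values[i - window:i])
        # Values are negative, so "worse" means "more negative".
        if baseline < 0 and values[i] < factor * baseline:
            return i + 1
    return None


def first_teleport_episode(teleports_per_episode):
    """Return the 1-based episode with the first teleport, or None."""
    for i, count in enumerate(teleports_per_episode):
        if count > 0:
            return i + 1
    return None


# Hypothetical data shaped like the run described above.
values = [-2500.0] * 60 + [-120000.0] * 5
teleports = [0] * 60 + [12] * 5

collapse_at = first_collapse(values)
teleport_at = first_teleport_episode(teleports)
print(collapse_at, teleport_at)
```

If both indices match on the real logs, the teleports and the drop most likely share a cause (for example, a gridlock that the agents never clear). In TraCI, `traci.simulation.getStartingTeleportNumber()` reports the teleports that started in the last simulation step, so summing it over the steps of an episode is one way to build the teleport list.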

AndreaVidali / Deep-QLearning-Agent-for-Traffic-Signal-Control
