AndreaVidali / Deep-QLearning-Agent-for-Traffic-Signal-Control

A framework where a deep Q-Learning Reinforcement Learning agent tries to choose the correct traffic light phase at an intersection to maximize traffic efficiency.
MIT License

I feel confused about this cumulative_wait_store #6

Closed ynuwm closed 4 years ago

ynuwm commented 5 years ago

In SimRunner.py this function:

def _save_stats(self, tot_neg_reward):    
    self._reward_store.append(tot_neg_reward)  # how much negative reward in this episode
    self._cumulative_wait_store.append(self._sum_intersection_queue)  # total number of seconds waited by cars in this episode
    self._avg_intersection_queue_store.append(self._sum_intersection_queue / self._max_steps)

In my understanding, the cumulative delay should collect the wait time rather than self._sum_intersection_queue. So if I want to compute the average wait time, should I use this:

self.average_wait_time = self._sum_intersection_queue / total_num_vehicle  (in this episode) 

right?

AndreaVidali commented 5 years ago

I feel you. This is a little bit tricky, and I would like to modify the code to improve its readability, because it certainly looks confusing.

I will try to clarify this point by analyzing the code step by step.

In the function

def _save_stats(self, tot_neg_reward):    
    self._reward_store.append(tot_neg_reward)  # how much negative reward in this episode
    self._cumulative_wait_store.append(self._sum_intersection_queue)  # total number of seconds waited by cars in this episode
    self._avg_intersection_queue_store.append(self._sum_intersection_queue / self._max_steps)

that you reported, self._cumulative_wait_store is updated. This is a list whose length equals the number of episodes. Each element stores the total number of seconds waited by all cars in one episode. This amount is held by the variable

self._sum_intersection_queue

which is the confusing part. But let's see where the value of this variable changes. On lines 95:100, inside the _simulate function, we have

while steps_todo > 0:
    traci.simulationStep()  # simulate 1 step in sumo
    self._replay()  # training
    steps_todo -= 1
    intersection_queue = self._get_stats()
    self._sum_intersection_queue += intersection_queue

which is the heart of the simulation, since one iteration of this while loop corresponds to one simulation step, as you can see from traci.simulationStep(). Within this loop, the last instruction adds intersection_queue to the variable we are investigating, so the value accumulates. One line up, we obtain intersection_queue from the function self._get_stats(), which retrieves how many cars have a speed of < 0.5 m/s (that is, they are in a stopped state). In other words, it returns how many cars are in the queue at that step.

Here is the key. The value we retrieve, which is how many cars are in the queue in a step, is also equal to the number of seconds that all cars together waited during that step.

Let's make a quick example. Suppose that in step 354 the number of cars in the queue is 63. Now you are asking yourself: how many seconds in total did all cars wait during this step?

Since a step in SUMO is 1 second long, it is easy to see that during 1 step a single stopped car waits exactly 1 second. Nothing more, nothing less. Therefore, if 63 cars are waiting in a particular step, together they wait 63 seconds during that step.
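The equivalence can be checked with a small sketch. Everything below is hypothetical toy data, not code from SimRunner.py: each car is a row of per-step flags, and the step length is assumed to be the 1 second described above. Summing the queue length over steps and summing the waiting seconds over cars should give the same number.

```python
# Toy data (hypothetical): one row per car, one flag per 1-second step.
# True means the car was stopped (queued) during that step.
stopped = [
    [True, True, False, False],   # car A waits 2 s
    [False, True, True, True],    # car B waits 3 s
    [False, False, False, True],  # car C waits 1 s
]

STEP_LENGTH_S = 1  # assumption: 1-second simulation step

# Per-step view: queue length at each step, accumulated as in the while loop.
queue_per_step = [sum(step) for step in zip(*stopped)]
sum_intersection_queue = sum(queue_per_step)

# Per-car view: seconds each car spent stopped.
wait_per_car = [sum(car) * STEP_LENGTH_S for car in stopped]
total_wait_seconds = sum(wait_per_car)

print(queue_per_step)          # [1, 2, 1, 2]
print(sum_intersection_queue)  # 6
print(total_wait_seconds)      # 6
```

The two totals match only because each step lasts exactly 1 second; with a different step length, the per-step queue count would have to be multiplied by that length.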

So, coming back to the code, 63 will be the value of intersection_queue, which is accumulated in self._sum_intersection_queue. Doing this at every step, at the end of the episode we obtain both of these values:

  1. The total number of seconds waited by all cars in the episode (which can be used as it is, or divided by the total number of cars in the episode to get the average wait per car, as you stated)
  2. The total number of queued cars summed over every step, which I used to calculate the average number of queued cars per step by simply dividing it by the total number of steps in the episode (self._max_steps).
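Both quantities in the list above come from the same accumulator and differ only in the divisor. A minimal sketch, consistent with the division by self._max_steps in _save_stats; the function name, the total_cars argument, and the numbers are hypothetical and do not come from the repository:

```python
def episode_metrics(sum_intersection_queue, max_steps, total_cars):
    """Return (average wait per car in seconds, average queue length per step).

    Hypothetical helper: total_cars is assumed to be known for the episode.
    """
    avg_wait_per_car = sum_intersection_queue / total_cars
    avg_queue_per_step = sum_intersection_queue / max_steps
    return avg_wait_per_car, avg_queue_per_step


# Made-up episode: 5400 steps, 1000 cars, 27000 accumulated queue-seconds.
avg_wait, avg_queue = episode_metrics(27000, max_steps=5400, total_cars=1000)
print(avg_wait)   # 27.0
print(avg_queue)  # 5.0
```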

The point is that both of these values are stored in the same variable, self._sum_intersection_queue. I completely understand that this can be confusing if you don't dive deep into the code, and that is why I would like to update this part of the code.

I hope this clears up the confusion. Let me know.

zzchuman commented 4 years ago

Hello, I have a question about the negative rewards. In your code, the reward is defined as old - new, which means that if the old wait is bigger than the new one, the reward is positive. We expect the reward to get bigger and bigger, so shouldn't it be positive? With tot_neg_reward += reward, the sum is 1s - 2s + 2s - 3s ..., right? So it amounts to the earliest wait time minus the latest wait time, right?

ynuwm commented 4 years ago

@zzcNEU At first I had the same questions as you, but later I figured it out. The reward is the difference between the number of vehicles waiting at the intersection at the beginning of an action and at the end of it, i.e. reward = Num(Time_action_start) - Num(Time_action_end). The reward is updated after every action duration (10 s in this repository) has ended, so if the action is good, the reward is positive, and if the action is bad, the reward is negative. At the beginning the model is not well trained, and actions are chosen randomly according to the parameter eps (in SimRunner.py), so the reward is always negative. As the model is trained and gets better, the reward keeps getting bigger, that is, its absolute value gets smaller.
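A small sketch of the old - new reward on made-up numbers may help. The waiting counts are hypothetical, and this thread does not show whether the repository sums every reward or only the negative ones; the name tot_neg_reward suggests the latter, which is treated here as an assumption. If every reward is summed, the total telescopes to the first value minus the last; if only negative rewards are summed, it does not.

```python
# Hypothetical number of waiting vehicles observed at each action boundary.
waiting = [10, 14, 9, 12, 5]

# reward = value at action start - value at action end
rewards = [old - new for old, new in zip(waiting, waiting[1:])]

# Summing every reward telescopes to first - last.
total_all = sum(rewards)

# Summing only negative rewards (assumed meaning of tot_neg_reward)
# counts just the actions that made the queue worse.
total_neg = sum(r for r in rewards if r < 0)

print(rewards)    # [-4, 5, -3, 7]
print(total_all)  # 5
print(total_neg)  # -7
```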

zzchuman commented 4 years ago

Thanks for your reply, I understand now. I have another question: have you ever tried to tune the parameters in this code? I find it hard to tune well. When I set gamma to 0.9, the performance decreases.


ynuwm commented 4 years ago

Yeah, I also found that setting the gamma parameter to 0.9 gives the best result. The other parameters seem hard to tune. My suggestion is to increase the traffic size beyond 1000; the model may then not perform as well as a traditional method such as fixed-time traffic signal control. That is another topic that needs more attention and research.

zzchuman commented 4 years ago

Thank you for your reply!!!
