Closed ynuwm closed 4 years ago
I feel you. This part is a little tricky, and I would like to modify the code to improve its readability, because it is indeed confusing.
I will try to clarify this point by analyzing the code step by step.
In the function
def _save_stats(self, tot_neg_reward):
    self._reward_store.append(tot_neg_reward)  # how much negative reward in this episode
    self._cumulative_wait_store.append(self._sum_intersection_queue)  # total number of seconds waited by cars in this episode
    self._avg_intersection_queue_store.append(self._sum_intersection_queue / self._max_steps)
that you reported, the list self._cumulative_wait_store is updated. This list has one element per episode, and each element stores the total number of seconds waited by all cars in that episode. The amount comes from the variable self._sum_intersection_queue, which is the confusing part. So let's see where the value of this variable is changed.
On lines 95:100, inside the _simulate function, we have
while steps_todo > 0:
    traci.simulationStep()  # simulate 1 step in sumo
    self._replay()  # training
    steps_todo -= 1
    intersection_queue = self._get_stats()
    self._sum_intersection_queue += intersection_queue
which is the heart of the simulation, since one iteration of this while loop corresponds to one simulation step, as you can see from traci.simulationStep().
Within this loop, the last instruction adds the variable intersection_queue to the variable we are investigating, so the value is accumulated. One line up, we obtain the value of intersection_queue from the function self._get_stats(), which retrieves how many cars have a speed below 0.5 m/s (that is, they are in a stopped state). In other words, it returns how many cars are in the queue in that step.
Here is the key. The value we retrieve, which is the number of cars in the queue in a step, is also equal to the number of seconds that all cars together waited during that step.
Let's make a quick example. Suppose that in step 354 the number of cars in the queue is 63. Now you are asking yourself: how many seconds in total did the cars wait during this step?
Since a step in SUMO is 1 second long, it is easy to see that during 1 step a single stopped car waits exactly 1 second. Nothing more, nothing less. Therefore, if 63 cars are waiting in a particular step, together they wait 63 seconds during that step.
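To make the equivalence concrete, here is a minimal sketch that does not use SUMO at all. The function name cumulative_wait_seconds and the sample queue lengths are made up for illustration; the only thing taken from the explanation above is that each step lasts 1 second and each stopped car contributes 1 second of waiting per step.

```python
def cumulative_wait_seconds(queue_per_step, step_length_s=1.0):
    """Total seconds waited by all cars, given the queue length at each step.

    Every stopped car waits `step_length_s` seconds during one step, so the
    total is the sum of the queue lengths scaled by the step duration.
    """
    return sum(n_stopped * step_length_s for n_stopped in queue_per_step)


# Hypothetical queue lengths for five consecutive 1-second steps.
queue_per_step = [0, 2, 5, 63, 40]

# With 1-second steps, the accumulated queue length IS the waited time.
total_wait = cumulative_wait_seconds(queue_per_step)

# Dividing the same sum by the number of steps gives the average queue length.
avg_queue = sum(queue_per_step) / len(queue_per_step)

print(total_wait, avg_queue)
```

If the step length were different from 1 second, the two quantities would no longer coincide numerically, which is why the step_length_s parameter is kept explicit in this sketch.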
So, coming back to the code, 63 will be the value of intersection_queue, which is accumulated in self._sum_intersection_queue. Doing this at every step, at the end of the episode we obtain both of these values: the sum of the queue lengths over all steps, and the total number of seconds waited by all cars.
The point is that both values are stored in the same variable, self._sum_intersection_queue. I completely understand that this can be confusing if you don't dive deep into the code, and that is why I would like to update this part of the code.
I hope this is less confusing now. Let me know.
Hello, I have a question about the negative rewards. In your code, the reward is defined as old - new. This means that if the old wait is bigger than the new one, the reward is positive. We expect the reward to grow larger and larger during training, so shouldn't it be positive? With tot_neg_reward += reward, the sum is 1s - 2s + 2s - 3s ..., right? So it equals the earliest wait time minus the latest wait time, right?
@zzcNEU At first I had the same questions as you, but later I figured it out. The reward is the difference between the number of vehicles waiting at the intersection at the beginning of an action and at the end of that action, i.e. reward = Num(Time_action_start) - Num(Time_action_end). The reward is updated after every action duration (10s in this repository) has ended, so if the action is good the reward is updated with a positive value, and if the action is bad it is updated with a negative value. At the beginning the model is not well trained, and actions are chosen randomly according to the parameter eps (in SimRunner.py), so the reward tends to be negative. As the model is trained and improves, the reward keeps getting bigger, that is, its absolute value gets smaller.
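The reward described above can be sketched as follows. This is an illustration under assumptions, not the repository's code: the function names and the sample counts are hypothetical, and accumulating only the negative rewards into tot_neg_reward is inferred from the variable name rather than confirmed by this thread.

```python
def action_reward(waiting_at_start, waiting_at_end):
    """Reward of one action: positive if fewer vehicles wait after it."""
    return waiting_at_start - waiting_at_end


def episode_rewards(waiting_counts):
    """Rewards for consecutive actions, given the waiting count at each action boundary."""
    return [action_reward(old, new)
            for old, new in zip(waiting_counts, waiting_counts[1:])]


# Hypothetical number of waiting vehicles at five consecutive action boundaries.
counts = [10, 14, 9, 9, 12]
rewards = episode_rewards(counts)

# Assumption: only negative rewards are accumulated into the episode total.
tot_neg_reward = sum(r for r in rewards if r < 0)

# If ALL rewards were summed, the series would telescope to first minus last.
telescoped = sum(rewards)

print(rewards, tot_neg_reward, telescoped)
```

Under that assumption, the accumulated value does not reduce to "earliest minus latest", because the positive terms that would cancel the negative ones are left out of the sum.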
Thanks for your reply, I see what you mean. I have another question: have you ever tried to tune the parameters of this code? I find it hard to tune well. When I set gamma to 0.9, the performance of the code decreases.
Yeah, I also found that setting the gamma parameter to 0.9 gives the best result. The other parameters seem hard to tune. My suggestion is to increase the traffic size beyond 1000; in that case the model may not perform as well as a traditional method such as fixed-time traffic signal control. That is another topic that needs more attention and more research.
Thank you for your reply!!!
In SimRunner.py, this function:
In my understanding, the cumulative delay should collect the wait time instead of self._sum_intersection_queue, so if I want to compute the average wait time, should I use this:
right?