AndreaVidali / Deep-QLearning-Agent-for-Traffic-Signal-Control

A framework where a deep Q-Learning Reinforcement Learning agent tries to choose the correct traffic light phase at an intersection to maximize traffic efficiency.
MIT License

Does not use separate target network #4

Closed ThisIsIsaac closed 5 years ago

ThisIsIsaac commented 5 years ago

Question

Is there a reason why you did not implement a separate target network? Unless you tried it and deliberately decided against it, it is one of the two biggest improvements the DQN paper proposed. I didn't see any explanation in your paper either. I'd love to hear what you think. Thanks!

Why a separate target network is important

Below is from the DQN paper published in Nature

[Screenshot: excerpt from the Nature DQN paper discussing the target network]

And the table it refers to:

[Screenshot: the ablation table from the Nature DQN paper]
ThisIsIsaac commented 5 years ago

I have implemented a separate target network. Comparing 100 epochs of training on both the original version and the version with a separate target network, the latter shows a 30% increase in performance.
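For anyone who wants to reproduce this change, below is a minimal sketch of the standard target-network pattern: keep a frozen copy of the online network, bootstrap TD targets from the frozen copy, and sync it periodically. This is not the code from this thread; the linear network, discount factor, and sync interval are illustrative assumptions.

```python
import numpy as np

# Hypothetical minimal Q-network: a single linear layer (state -> Q-values).
# The actual repo uses a multi-layer feedforward network in Keras/TensorFlow.
class LinearQNet:
    def __init__(self, n_states, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=(n_states, n_actions))

    def predict(self, state):
        return state @ self.w

online = LinearQNet(n_states=80, n_actions=4)
target = LinearQNet(n_states=80, n_actions=4)
target.w = online.w.copy()   # start with identical weights

GAMMA = 0.99        # discount factor (assumed value, not the repo's setting)
SYNC_EVERY = 100    # hypothetical hard-update interval, in training steps

def td_target(reward, next_state, done):
    # Key point of the DQN paper: bootstrap from the *frozen* target
    # network, not the online network being updated.
    if done:
        return reward
    return reward + GAMMA * np.max(target.predict(next_state))

for step in range(1, 501):
    # ... environment interaction and a gradient step on `online` go here ...
    if step % SYNC_EVERY == 0:
        target.w = online.w.copy()  # periodic hard update of the target net
```

The alternative to the hard update shown here is a "soft" (Polyak) update, `target.w = tau * online.w + (1 - tau) * target.w`, applied every step with a small `tau`.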

AndreaVidali commented 5 years ago

I have implemented a target network in my private work, following my thesis. Yes, I know that a TN gives a performance boost, but the intent of this repo is to give a starting point to people who want to work on this topic. Therefore I tried to simplify the overall work as much as possible (it still achieves good performance even without a TN) and to let others experiment and expand the work as they like.

ThisIsIsaac commented 5 years ago

You should rename the repo or explicitly mention that it is not a complete implementation of DQN. It is very misleading, because nowhere in the code, the README, or your thesis is there any mention of your intent to implement DQN only partially.

AndreaVidali commented 5 years ago

As you stated, this is not an implementation of the system proposed in the DQN paper. Here I have implemented my own very simple version of a deep Q-learning system, which is not directly inspired by DQN, and that is why you don't see any mention of DQN. In this work I have included only the parts that I found to be a good trade-off between performance and understandability. I would also point out that DQN uses a convolutional NN, whereas this work uses a feedforward NN.

You are free, and encouraged, to implement a vanilla DQN system applied to traffic control, but that is not what this repo is about. This repo is just meant to give a practical starting point to anyone who wants to dive into this topic using SUMO, especially because when I started working on it I found that there weren't any good resources online.

way-thu commented 10 months ago

@ThisIsIsaac sorry to bother you, but could you share the code in which you implemented a separate target network? I want to use it to get better learning.

way-thu commented 10 months ago

@ThisIsIsaac thanks a lot!