CityBrainChallenge / KDDCup2021-CityBrainChallenge-starter-kit


Question About the Training Process in DQN example. #31

Open Kuangqi927 opened 3 years ago

Kuangqi927 commented 3 years ago
  1. I am curious about the meaning of "done". Does it indicate that no more flow will pass through the intersection, or that the delay of the intersection exceeds the threshold (= 1.6)?

  2. I am also curious about the Bellman update equation. The Bellman update needs to know when an episode ends (the dones flag, as in the gym module) in order to update the Q-network. However, this information seems to be missing from the shared code.

Could you offer some advice about this problem?

john9636 commented 3 years ago

In my opinion, the end information is not indispensable. It depends on the design of your reward and Q value.

zhyliu00 commented 3 years ago

Thanks for your comment. 'dones' indicates whether the time stamp has exceeded 'max_time_epoch' in 'simulator.cfg'. The 'threshold' argument in 'evaluate.py' (which equals 1.6 by default) determines when the evaluation terminates. You can modify both arguments.
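
For readers following the Bellman update question, here is a minimal sketch of how a done flag usually enters the one-step Q-learning target, r + gamma * max Q(s', a') * (1 - done). It is not the starter kit's code: the function name `td_targets`, the sample values, and the discount factor are all hypothetical, and plain Python lists stand in for whatever tensors the DQN example uses.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.95):
    """One-step Q-learning targets for a batch of transitions.

    rewards:       list of floats, one per transition
    next_q_values: list of lists, Q(s', a') for every action a'
    dones:         list of bools, True if the episode ended at s'
    gamma:         discount factor (hypothetical default)
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        # When done is True there is no next state to bootstrap from,
        # so the target is the immediate reward alone.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


# Hypothetical batch of two transitions; the second one ends the episode.
rewards = [1.0, -2.0]
next_q_values = [[0.5, 2.0], [3.0, 1.0]]
dones = [False, True]
print(td_targets(rewards, next_q_values, dones, gamma=0.5))  # [2.0, -2.0]
```

Since 'dones' here only marks a time limit rather than a true terminal state, always bootstrapping (that is, passing dones as all False) is also a defensible design choice, which is consistent with the point above that the end information is not indispensable.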