Question About the Training Process in DQN example.

CityBrainChallenge / KDDCup2021-CityBrainChallenge-starter-kit

Issue #31 (open), opened by Kuangqi927 3 years ago

Kuangqi927 commented 3 years ago

I am curious about the meaning of "done". Does it indicate that no more flow will pass through the intersection, or that the delay at the intersection has exceeded the threshold (1.6)?
I am also curious about the Bellman update. The Bellman update equation requires the end-of-episode signal (`dones` in Gym) to update the Q-network, but this signal seems to be missing from the shared code.
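For reference, this is roughly how a done flag usually enters the one-step DQN target. It is a minimal sketch in plain Python, not the starter kit's code; the function name `td_targets` and its arguments are hypothetical.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step DQN targets: y = r + gamma * max_a' Q(s', a') * (1 - done).

    rewards:       list of floats, one per transition
    next_q_values: list of lists, Q(s', a') for every action (e.g. from a target network)
    dones:         list of bools, True where s' ends the episode
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        # A terminal next state has no future value, so bootstrapping is cut off.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


targets = td_targets(
    rewards=[1.0, 2.0],
    next_q_values=[[0.5, 3.0], [4.0, 1.0]],
    dones=[False, True],
    gamma=0.9,
)
print(targets)  # first ≈ 1 + 0.9 * 3, second = 2 (no bootstrap)
```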

Could you offer some advice about this problem?

john9636 commented 3 years ago

In my opinion, the end-of-episode signal is not strictly necessary. Whether you need it depends on how you design your reward and Q-value.
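One way to read this: dropping the done flag treats signal control as a continuing (infinite-horizon) task. The two targets below are the standard forms, with $d \in \{0, 1\}$ the done flag; which one fits depends on the reward design, as noted above.

$$
y = r + \gamma\,(1 - d)\,\max_{a'} Q(s', a')
\qquad \text{vs.} \qquad
y = r + \gamma \max_{a'} Q(s', a')
$$

If rewards are bounded, $|r| \le R_{\max}$, and $0 \le \gamma < 1$, the fixed point of the second target satisfies $|Q(s, a)| \le \dfrac{R_{\max}}{1 - \gamma}$, so the Q-values stay bounded even without the done flag.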

zhyliu00 commented 3 years ago

Thanks for your comment. 'dones' indicates whether the timestamp has exceeded 'max_time_epoch' in 'simulator.cfg'. The 'threshold' argument in 'evaluate.py' (1.6 by default) decides when evaluation terminates. You can modify both arguments.
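Since 'dones' here only marks the time limit, it may help to separate a time-limit truncation from a true terminal state, in the spirit of Gymnasium's `terminated` / `truncated` split. A minimal sketch, assuming the episode ends only because of 'max_time_epoch'; `episode_flags` and `td_target` are hypothetical helpers, not part of the starter kit.

```python
def episode_flags(current_time, max_time_epoch):
    """Return (terminated, truncated) for an episode that ends only on a time limit."""
    # Assumption: traffic keeps flowing, so there is no true terminal state.
    terminated = False
    truncated = current_time >= max_time_epoch
    return terminated, truncated


def td_target(reward, max_next_q, terminated, gamma=0.99):
    """One-step target that cuts off bootstrapping only at true terminal states."""
    # A time-limit truncation still bootstraps: the state after the cutoff
    # would have had future value if the simulation had continued.
    return reward if terminated else reward + gamma * max_next_q


terminated, truncated = episode_flags(current_time=3600, max_time_epoch=3600)
print(terminated, truncated)  # False True
print(td_target(1.0, 2.0, terminated, gamma=0.5))  # 2.0
```

With this split, reaching 'max_time_epoch' ends the rollout but does not zero out the bootstrapped value in the Bellman update.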