Open Kuangqi927 opened 3 years ago
In my opinion, the end-of-episode information is not strictly necessary. Whether you need it depends on how you design your reward and Q value.
Thanks for your comment. 'dones' indicates whether the current time step exceeds the 'max_time_epoch' set in 'simulator.cfg'. The 'threshold' argument in 'evaluate.py' (1.6 by default) determines when the evaluation terminates. You can modify both arguments.
I am curious about the meaning of "done". Does it indicate that no more flow will pass through the intersection, or that the delay at the intersection exceeds the threshold (=1.6)?
I am also curious about the Bellman update equation. According to the Bellman update equation, information about the end of an episode (the dones returned by an environment in the gym module) is required for updating the Q-network. However, it seems to be missing from the shared code.
Could you offer some advice about this problem?
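For reference, here is a minimal sketch of how the done flag usually enters the one-step Q-learning target, y = r + gamma * (1 - done) * max_a' Q(s', a'). This is not taken from the shared code: the function name `td_targets`, the array shapes, and the value of `gamma` are all assumptions made for illustration.

```python
import numpy as np

def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step Q-learning targets (hypothetical helper, not from the repo).

    rewards:       shape (batch,)
    next_q_values: shape (batch, n_actions), Q(s', a') for every action
    dones:         shape (batch,), True where s' ends the episode
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)

    # Greedy value of the next state.
    max_next_q = next_q_values.max(axis=1)

    # For terminal transitions the bootstrap term is masked out,
    # so the target reduces to the immediate reward.
    return rewards + gamma * (1.0 - dones) * max_next_q

# Toy batch: the first transition continues, the second one is terminal.
rewards = [1.0, 2.0]
next_q = [[0.5, 1.5], [3.0, 2.0]]
dones = [False, True]
targets = td_targets(rewards, next_q, dones, gamma=0.9)
print(targets)
```

If an episode ends only because a time limit was reached, rather than because the task itself terminated, it is common to keep bootstrapping from the next state, which is equivalent to always passing `dones` as all False. That may be why the flag can be left out, depending on how the reward and Q value are designed.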