A framework where a deep Q-Learning Reinforcement Learning agent tries to choose the correct traffic light phase at an intersection to maximize traffic efficiency.
In def _collect_waiting_times in training_simulation.py there is a call to getAccumulatedWaitingTime(car_id):
wait_time = traci.vehicle.getAccumulatedWaitingTime(car_id)
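For context, a minimal sketch of what a _collect_waiting_times-style helper does, i.e. summing the accumulated waiting time of every vehicle on the incoming roads. The Vehicle stub, the road IDs, and the collect_waiting_times name are illustrative assumptions standing in for the real TraCI calls, not the repo's actual code:

```python
class Vehicle:
    """Minimal stand-in for the TraCI vehicle queries (getRoadID,
    getAccumulatedWaitingTime) used by the real function."""
    def __init__(self, road_id, acc_wait):
        self.road_id = road_id
        self.acc_wait = acc_wait  # seconds of waiting within the memory window

def collect_waiting_times(vehicles, incoming_roads):
    """Sum the accumulated waiting time [s] of every vehicle that is
    currently on one of the intersection's incoming roads."""
    return sum(v.acc_wait for v in vehicles if v.road_id in incoming_roads)

cars = [Vehicle("E2TL", 12.0), Vehicle("N2TL", 5.0), Vehicle("TL2W", 30.0)]
# The vehicle on the outgoing road "TL2W" is excluded from the total.
print(collect_waiting_times(cars, {"N2TL", "S2TL", "E2TL", "W2TL"}))  # 17.0
```

The reward is then typically derived from how this total changes between two consecutive actions.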
I found that it returns the accumulated waiting time [s] within the previous time interval, whose default length is 100 s (the length is configurable via the --waiting-time-memory option passed to the main application).
You use this function to compute the reward. My question: when I call choose_action and simulate, the step counter advances by 10 or 14 steps each time, and the reward comes from the interval between two consecutive actions. Since getAccumulatedWaitingTime returns the waiting time accumulated over the previous 100 steps, doesn't the reward then include waiting time not only from the interval between the last two actions but also from earlier actions?

So, can we end up counting more waiting time, e.g. the waiting time accumulated over 10 (1st action) + 10 (2nd action) + 14 (3rd action) = 34 steps? And what if the current step count exceeds 100, say 102 steps — will the 2 oldest steps (from the first action) then be dropped from the accumulated value?
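To make the two scenarios concrete, here is a toy model of the sliding 100-step window. It assumes one waiting sample per 1 s simulation step for a vehicle that is stopped the whole time; this is not SUMO's internal implementation, only a sketch of the documented --waiting-time-memory behavior:

```python
from collections import deque

def accumulated_waiting(total_steps, memory=100):
    """Toy model of getAccumulatedWaitingTime for a vehicle that waits
    every step: keep only the last `memory` one-second samples
    (--waiting-time-memory, default 100) and sum them."""
    window = deque((1 for _ in range(total_steps)), maxlen=memory)
    return sum(window)

# After the 3rd action (10 + 10 + 14 = 34 steps), the whole history still
# fits inside the 100-step window, so all 34 waiting seconds are reported:
print(accumulated_waiting(34))   # 34

# At 102 steps, the 2 oldest seconds (accumulated during the first action)
# have already fallen out of the window:
print(accumulated_waiting(102))  # 100
```

So yes: within the window the value spans several action intervals, and once the step count exceeds the memory length, the oldest samples are silently dropped.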