AndreaVidali / Deep-QLearning-Agent-for-Traffic-Signal-Control

A framework where a deep Q-Learning Reinforcement Learning agent tries to choose the correct traffic light phase at an intersection to maximize traffic efficiency.

Reward #37

Open dongspam0209 opened 9 months ago

dongspam0209 commented 9 months ago

In `def _collect_waiting_times` in `training_simulation.py` there is a call to `getAccumulatedWaitingTime(car_id)`:

wait_time = traci.vehicle.getAccumulatedWaitingTime(car_id)

I found that it returns the accumulated waiting time [s] within the previous time interval of default length 100 s (the length is configurable via the `--waiting-time-memory` option given to the main application).
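For context, here is a minimal sketch of how such a collection step could look with TraCI; the incoming-road IDs and the `waiting_times` dictionary are my own assumptions to make the example self-contained, not necessarily the repository's exact code:

```python
import traci

# Hypothetical incoming-road IDs; adjust to the actual network.
INCOMING_ROADS = ["N2TL", "S2TL", "E2TL", "W2TL"]

def collect_waiting_times(waiting_times):
    """Sum the per-car accumulated waiting times on the incoming roads.

    `waiting_times` maps car_id -> last accumulated waiting time [s] and is
    updated in place, so cars that have cleared the intersection stop
    contributing to the total.
    """
    for car_id in traci.vehicle.getIDList():
        wait_time = traci.vehicle.getAccumulatedWaitingTime(car_id)
        if traci.vehicle.getRoadID(car_id) in INCOMING_ROADS:
            waiting_times[car_id] = wait_time
        else:
            waiting_times.pop(car_id, None)  # car has left the intersection
    return sum(waiting_times.values())
```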

You use this function to compute the reward. My question: each time the agent chooses an action and the simulation runs, the step count advances by roughly 10 or 14 steps, and the reward is computed from the interval between two actions. Since `getAccumulatedWaitingTime` returns the waiting time accumulated over the previous 100-step window, does the value we read cover only the interval between the last two actions, or does it also include waiting time from earlier actions?

In other words, could we be reading more waiting time than one interval's worth, e.g. 10 steps (1st action) + 10 steps (2nd action) + 14 steps (3rd action) = 34 steps? And what happens once the current step count exceeds the 100-step window, e.g. at 102 steps: will the first 2 steps (from the first action) be dropped from the accumulated value?
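To make the concern concrete, here is a toy, SUMO-free illustration (my own numbers, not the repository's code) of a single car that waits continuously, assuming the accumulated waiting time remembers at most the last 100 s:

```python
MEMORY = 100  # default --waiting-time-memory in seconds (assumed)

def accumulated_wait(current_step, waiting_since=0):
    """Waiting time a continuously waiting car would report at `current_step`,
    remembering at most the last MEMORY seconds."""
    return min(current_step - waiting_since, MEMORY)

steps = 0
old_wait = 0
for interval in [10, 10, 14]:  # the three action intervals from the question
    steps += interval
    new_wait = accumulated_wait(steps)
    print(f"step {steps:3d}: accumulated = {new_wait:3d}, "
          f"gained this interval = {new_wait - old_wait}")
    old_wait = new_wait

# At step 34 the accumulated value is 34 s, i.e. it indeed spans all three
# intervals; only the difference between two consecutive reads isolates the
# latest interval. Once the step count passes 100 (e.g. 102), the value
# saturates at 100 s, so the oldest 2 s fall out of the window.
```

So, if the reward is formed as the difference between two consecutive totals (as the code appears to do), the overlap with earlier intervals would cancel out until the 100 s memory starts dropping old waiting time.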