Why normalize the state in IDQN by dividing by 28?

Pi-Star-Lab / RESCO

Reinforcement Learning Benchmarks for Traffic Signal Control (RESCO)


Issue #4

Closed: lisong2019 closed this issue 2 years ago.

lisong2019 commented 2 years ago

In the state.py file, why is the state value divided by 28 for normalization? For example, in norm_DQN: lane_obs.append(signal.full_observation[lane]['total_wait'] / 28).

lisong2019 commented 2 years ago

The same question applies to the normalization in the waiting-time reward function: why use 224? rewards[signal_id] = np.clip(-total_wait/224, -4, 4).astype(np.float32)

jault commented 2 years ago

For the state normalization, it's the maximum detection distance divided by (passenger car length + minimum gap length), i.e. roughly the maximum number of vehicles that fit within the detection range. That value should actually be ~27, not 28. The purpose is just to keep the numbers small, though.
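A minimal sketch of that arithmetic follows. The thread does not give the actual numbers, so the values here are assumptions: a 200 m maximum detection distance plus SUMO's default passenger-car length of 5 m and minGap of 2.5 m. Under those assumptions the ratio comes out to about 26.67, which matches the "~27" above. The constant and function names are hypothetical and are not RESCO's API.

```python
# Hypothetical sketch of the state-normalization constant.
# ASSUMED values (not stated in the thread): 200 m detection range,
# SUMO default passenger-car length (5 m) and minGap (2.5 m).
MAX_DETECTION_DISTANCE = 200.0  # metres, assumed
CAR_LENGTH = 5.0                # metres, SUMO default for passenger cars
MIN_GAP = 2.5                   # metres, SUMO default minGap


def max_vehicles_per_lane(distance=MAX_DETECTION_DISTANCE,
                          length=CAR_LENGTH,
                          min_gap=MIN_GAP):
    """Maximum number of vehicles that fit bumper-to-bumper in the detection range."""
    return distance / (length + min_gap)


def normalize_lane_value(value, norm=28.0):
    """Scale a per-lane quantity down by the normalization constant to keep it small."""
    return value / norm


print(round(max_vehicles_per_lane(), 2))  # 26.67, i.e. ~27
print(normalize_lane_value(14))           # 0.5
```

Since the divisor only rescales the inputs, any constant of the same order (27, 28, or a value derived from the actual detector range) serves the same purpose.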

The reward normalization takes the same 28 and assumes 8 seconds of waiting time for each vehicle, giving 28 × 8 = 224. The idea is the same as for the state normalization. Tuning these constants might give better results.
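The reward line from the question can be sketched with the two factors made explicit, so they are easy to tune. This is an illustrative sketch only: the names VEHICLES_PER_LANE, WAIT_PER_VEHICLE and wait_reward are hypothetical, and only the constants 28, 8, 224 and the clip range of ±4 come from the thread.

```python
import numpy as np

# Hypothetical names; the values 28, 8 and the +/-4 clip come from the thread.
VEHICLES_PER_LANE = 28                              # state-normalization constant
WAIT_PER_VEHICLE = 8.0                              # assumed seconds of waiting per vehicle
REWARD_NORM = VEHICLES_PER_LANE * WAIT_PER_VEHICLE  # 224.0


def wait_reward(total_wait, norm=REWARD_NORM, clip=4.0):
    """Negative total waiting time, scaled by `norm` and clipped to [-clip, clip]."""
    return np.clip(-total_wait / norm, -clip, clip).astype(np.float32)


print(wait_reward(224.0))   # -1.0
print(wait_reward(5000.0))  # -4.0 (saturated by the clip)
```

With these values the reward saturates at -4 once the total wait exceeds 4 × 224 = 896 seconds, so changing WAIT_PER_VEHICLE also shifts where that saturation kicks in.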