Closed zhangxinchen123 closed 4 years ago
You are absolutely right, it is a normalization to the [0, 1] interval, through a linear mapping. Maybe the word remap is a poor choice here.
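A linear mapping to [0, 1] of that kind can be sketched as follows. This is a minimal illustration; the function name and signature here are assumed for the example and are not copied from highway_env's actual utils.py:

```python
def remap(v, old_range, new_range):
    """Linearly map value v from old_range to new_range.

    Hypothetical sketch of a linear remap, not the actual highway_env API.
    """
    old_lo, old_hi = old_range
    new_lo, new_hi = new_range
    return new_lo + (v - old_lo) * (new_hi - new_lo) / (old_hi - old_lo)


# Normalizing a raw reward in [-30, 100] to the [0, 1] interval:
print(remap(-30, (-30, 100), (0, 1)))  # 0.0
print(remap(35, (-30, 100), (0, 1)))   # 0.5
print(remap(100, (-30, 100), (0, 1)))  # 1.0
```

With `new_range=(0, 1)` this reduces to the usual min-max normalization `(v - lo) / (hi - lo)`.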
Thanks, but I don't understand why normalization is needed. My guess is that it helps the reward function converge. I noticed that when I use the baseline.json and highway_env to train the agent, each episode score is positive; maybe a reward function with some negative values would train the agent better. This is not necessarily right, it's just my guess. Thank you!
In theory, if you scale the reward function or add an offset to it, you do not change the optimal policy. In practice, deep reinforcement learning algorithms train better if the rewards are normalized into [-1, 1] or [0, 1] rather than, say, [-30, 100].
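The invariance claim can be checked with a toy one-step example: applying a positive affine transform `r' = a*r + b` (with `a > 0`) to the rewards leaves the greedy action choice unchanged, since the argmax is preserved. This is only an illustrative sketch, not code from highway_env:

```python
# Toy one-step example: rewards for three hypothetical actions.
rewards = {"left": -30.0, "idle": 10.0, "right": 100.0}

# Positive affine transform of every reward: r' = a * r + b, with a > 0.
a, b = 0.5, 3.0
scaled = {action: a * r + b for action, r in rewards.items()}

# The greedy (reward-maximizing) action is the same before and after.
best = max(rewards, key=rewards.get)
best_scaled = max(scaled, key=scaled.get)
print(best, best_scaled)  # right right
```

The same argument extends to discounted returns, which is why rescaling rewards into [0, 1] is safe for the optimal policy while still making training numerically better behaved.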
Thanks for your help! I wish you success in your studies!
Hi, I looked at the file utils.py at the path highway_env/envs/common/utils.py and noticed that it defines a function named "remap". In highway_env.py, the "remap" function is used to calculate the reward, but I can't understand the reward function at line 81 of highway_env.py. I don't understand why the reward function takes this form; maybe it's a normalization? I'm not sure my idea is right, can you tell me? Thanks for your reply!