Closed zhangxinchen123 closed 4 years ago
You are absolutely right, it is a normalization to the [0, 1] interval, through a linear mapping. Maybe the word remap is a poor choice here.
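A linear mapping to [0, 1] of that kind can be sketched as follows. This is a minimal illustration; the function name and signature here are assumed for the example and are not copied from highway_env's actual utils.py:

```python
def remap(v, old_range, new_range):
    """Linearly map value v from old_range to new_range.

    Hypothetical sketch of a linear remap, not the actual highway_env API.
    """
    old_lo, old_hi = old_range
    new_lo, new_hi = new_range
    return new_lo + (v - old_lo) * (new_hi - new_lo) / (old_hi - old_lo)


# Normalizing a raw reward in [-30, 100] to the [0, 1] interval:
print(remap(-30, (-30, 100), (0, 1)))  # 0.0
print(remap(35, (-30, 100), (0, 1)))   # 0.5
print(remap(100, (-30, 100), (0, 1)))  # 1.0
```

With `new_range=(0, 1)` this reduces to the usual min-max normalization `(v - lo) / (hi - lo)`.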
Thanks, but I don't understand why normalization is needed. My guess is that it helps the reward function converge. I noticed that when I use the baseline.json and highway_env to train the agent, each episode score is positive; maybe a reward function with some negative values would train the agent better. This is not necessarily right, it's just my guess. Thank you!
In theory, if you scale the reward function or add an offset to it, you do not change the optimal policy. In practice, deep reinforcement learning algorithms train better if the rewards are normalized into [-1, 1] or [0, 1] rather than, say, [-30, 100].
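The invariance claim can be checked with a toy one-step example: applying a positive affine transform `r' = a*r + b` (with `a > 0`) to the rewards leaves the greedy action choice unchanged, since the argmax is preserved. This is only an illustrative sketch, not code from highway_env:

```python
# Toy one-step example: rewards for three hypothetical actions.
rewards = {"left": -30.0, "idle": 10.0, "right": 100.0}

# Positive affine transform of every reward: r' = a * r + b, with a > 0.
a, b = 0.5, 3.0
scaled = {action: a * r + b for action, r in rewards.items()}

# The greedy (reward-maximizing) action is the same before and after.
best = max(rewards, key=rewards.get)
best_scaled = max(scaled, key=scaled.get)
print(best, best_scaled)  # right right
```

The same argument extends to discounted returns, which is why rescaling rewards into [0, 1] is safe for the optimal policy while still making training numerically better behaved.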
Thanks for your help! I wish you success in your studies!
Hi, I looked at the file utils.py at the path highway_env/envs/common/utils.py and noticed that it defines a function named "remap". In highway_env.py, the "remap" function is used to calculate the reward, but I can't understand the reward function at line 81 of highway_env.py. I don't understand why the reward function takes this form; maybe it's a normalization? I'm not sure my idea is right, can you tell me? Thanks for your reply!