PacktPublishing / Hands-On-Intelligent-Agents-with-OpenAI-Gym

Code for Hands On Intelligent Agents with OpenAI Gym book to get started and learn to build deep reinforcement learning agents using PyTorch
https://www.packtpub.com/big-data-and-business-intelligence/hands-intelligent-agents-openai-gym
MIT License

Questions about how to assign the reward in the Carla environment #19

Closed fangchuan closed 5 years ago

fangchuan commented 5 years ago

Hi, recently I have been focused on training my agent in Carla, and my DQN-based agent seems to be doing not badly. But I still cannot understand why you calculate the reward this way:

https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/master/ch8/environment/carla_gym/envs/carla_env.py

def calculate_reward(self, current_measurement):
    """
    Calculate the reward based on the effect of the action taken using the previous and the current measurements
    :param current_measurement: The measurement obtained from the Carla engine after executing the current action
    :return: The scalar reward
    """
    reward = 0.0

    cur_dist = current_measurement["distance_to_goal"]

    prev_dist = self.prev_measurement["distance_to_goal"]

    if self.config["verbose"]:
        print("Cur dist {}, prev dist {}".format(cur_dist, prev_dist))

    # Distance travelled toward the goal in m
    reward += np.clip(prev_dist - cur_dist, -10.0, 10.0)

    # Change in speed (km/hr)
    reward += 0.05 * (current_measurement["forward_speed"] - self.prev_measurement["forward_speed"])

    # New collision damage
    reward -= .00002 * (
        current_measurement["collision_vehicles"] + current_measurement["collision_pedestrians"] +
        current_measurement["collision_other"] - self.prev_measurement["collision_vehicles"] -
        self.prev_measurement["collision_pedestrians"] - self.prev_measurement["collision_other"])

    # New sidewalk intersection
    reward -= 2 * (
        current_measurement["intersection_offroad"] - self.prev_measurement["intersection_offroad"])

    # New opposite lane intersection
    reward -= 2 * (
        current_measurement["intersection_otherlane"] - self.prev_measurement["intersection_otherlane"])

    return reward

Is this really a well-considered way to calculate the reward? For example, does it take into account traffic lights, speed limits, and things like that? And what is the meaning of each coefficient? I want to formulate a comprehensive way to calculate the reward, but I don't have any good ideas. I'm looking forward to your reply. @praveen-palanisamy

praveen-palanisamy commented 5 years ago

Hi @fangchuan ,

That reward function tries to motivate the agent to drive towards the goal at as high a speed as possible without colliding, driving in the wrong lane, or driving on the sidewalk. The coefficients were tuned to produce a scalar reward value that is good enough for the agent to learn to drive well.
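Written out (using my own shorthand for the measurement fields), the per-step reward computed by the code above is:

    r_t = clip(d_{t-1} - d_t, -10, 10)     # progress towards the goal, in m
          + 0.05 * (v_t - v_{t-1})         # change in forward_speed
          - 2e-5 * (c_t - c_{t-1})         # new collision damage (vehicles + pedestrians + other)
          - 2 * (o_t - o_{t-1})            # new intersection_offroad
          - 2 * (l_t - l_{t-1})            # new intersection_otherlane

where d is distance_to_goal, v is forward_speed, c is the accumulated collision damage and o, l are the off-road and opposite-lane intersection measurements. So progress towards the goal (clipped to ±10 per step) dominates, the speed term is a small shaping bonus, and the remaining terms only penalise damage or lane/sidewalk violations that are new since the previous step.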

The reward calculation doesn't take traffic lights into account because usable traffic-light support doesn't exist in the stable version of Carla. Speed limits weren't imposed, but the maximum speed of the vehicle is limited by the physics (handled by the simulator), so this didn't have a huge negative impact.

The reward values are calculated in a very similar way to the Carla paper, for benchmarking and comparison purposes. If you are interested in designing a more comprehensive reward function, you may want to start off with a basic version (like the one used in this codebase) and then add more terms one by one, tuning the coefficients until you achieve a reasonable reward/penalty, as in the sketch below.
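For example, a hypothetical speed-limit penalty could be added alongside the existing terms as sketched here. `speed_limit` and the 0.1 coefficient are illustrative assumptions, not values from this codebase, and the limit should be expressed in the same units as `forward_speed`:

    import numpy as np

    def calculate_reward_extended(prev_m, cur_m, speed_limit=30.0):
        """Sketch: the original reward terms plus one extra, tunable term."""
        reward = 0.0

        # Progress towards the goal (same as the original term)
        reward += np.clip(prev_m["distance_to_goal"] - cur_m["distance_to_goal"],
                          -10.0, 10.0)

        # Change in speed (same as the original term)
        reward += 0.05 * (cur_m["forward_speed"] - prev_m["forward_speed"])

        # New collision damage (same as the original term)
        reward -= 0.00002 * sum(
            cur_m[k] - prev_m[k]
            for k in ("collision_vehicles", "collision_pedestrians", "collision_other"))

        # New off-road / opposite-lane intersection (same as the original terms)
        reward -= 2 * (cur_m["intersection_offroad"] - prev_m["intersection_offroad"])
        reward -= 2 * (cur_m["intersection_otherlane"] - prev_m["intersection_otherlane"])

        # NEW (hypothetical) term: penalise only the amount by which the agent
        # exceeds the speed limit, so driving fast up to the limit is still rewarded.
        overspeed = max(0.0, cur_m["forward_speed"] - speed_limit)
        reward -= 0.1 * overspeed

        return reward

    # Quick check with made-up measurements: the agent moved 1.5 m towards the
    # goal, sped up from 25 to 34, and is 4 over the limit of 30.
    prev = {"distance_to_goal": 120.0, "forward_speed": 25.0,
            "collision_vehicles": 0.0, "collision_pedestrians": 0.0,
            "collision_other": 0.0, "intersection_offroad": 0.0,
            "intersection_otherlane": 0.0}
    cur = dict(prev, distance_to_goal=118.5, forward_speed=34.0)
    print(calculate_reward_extended(prev, cur))  # ~1.55 (1.5 + 0.45 - 0.4)

A reasonable way to tune the new coefficient is to keep it small enough that the progress term still dominates, then adjust it after observing how often the agent exceeds the limit during evaluation episodes.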

Some additional factors to consider:

fangchuan commented 5 years ago

Thank you! I'm trying to tune the coefficients of the reward equation.