Closed: fangchuan closed this issue 5 years ago.
As the agent in the CARLA preset in Coach does not have a destination goal (unlike in the CARLA paper), we built it to learn to drive as much as possible. The reward was defined so that the agent is encouraged to drive as fast as it can without colliding, crossing into other lanes, or going off road. To stabilize the drive, we also discourage unnecessary steering, hence the negative impact of steering on the reward.
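A reward with the shape described above could be sketched roughly as follows. The coefficients and argument names here are illustrative assumptions for the sketch, not the exact values used in Coach's carla_environment.py:

```python
def shaped_reward(forward_speed, intersection_otherlane,
                  intersection_offroad, is_collision, steer):
    """Illustrative driving reward: reward speed, penalize lane/off-road
    intersection, collisions, and unnecessary steering.
    All coefficients are made-up placeholders."""
    speed_reward = forward_speed                  # encourage driving fast
    return (speed_reward
            - 5.0 * intersection_otherlane        # penalize crossing into other lanes
            - 5.0 * intersection_offroad          # penalize leaving the road
            - 100.0 * float(is_collision)         # strong penalty for collisions
            - 10.0 * abs(steer))                  # discourage jerky steering
```

Because the steering penalty is applied every step, the agent only steers when the expected speed/penalty gain outweighs it, which is what stabilizes the drive.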
@galnov, I'm grateful for your reply. Yes, I have tried the preset CARLA_DDPG, and it trained for about 1M steps until the agent converged. Then I wanted to make it work for a task with a fixed destination, so I revised carla_environment.py in several ways:
- the task is a curved trajectory between start_position and end_position in Town01
- observation_space = Tuple([image_space, measurement_space])
- measurement_space = [higher_command, forward_speed, distance_to_goal, is_collision]
- distance_reward = delta_distance = previous_distance_to_goal - current_distance_to_goal; distance_reward = np.clip(distance_reward, -10, 10)
- reward = distance_reward + speed_reward - ...
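The distance-progress term listed above can be written out as a small function (a minimal sketch using the names from the comment; the clipping bounds follow the np.clip(..., -10, 10) mentioned):

```python
import numpy as np

def distance_progress_reward(previous_distance_to_goal, current_distance_to_goal):
    # Positive when the agent moved closer to the goal this step,
    # negative when it moved away.
    delta_distance = previous_distance_to_goal - current_distance_to_goal
    # Clip so a single respawn/teleport step cannot dominate the return.
    return float(np.clip(delta_distance, -10.0, 10.0))
```

Note that without the clip, one episode-reset step with a huge distance jump could outweigh many steps of genuine progress.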
I also modified the InputEmbedderLayer used in CARLA_DDPG, as shown in the actor network architecture in the PNG; the critic network has the same InputEmbedderLayer. Now, my problem is that the agent does not seem to converge, and it has learned something unexpected. The agent has trained for about 800,000 steps, but it still hasn't learned how to turn right. Could you help me figure out what's wrong with my solution? Please, I'm almost crazy...
Is it caused by the choice of measurement data? I mean, only higher_command, forward_speed, and distance_to_goal seem insufficient for an approximator (DNN) to output a reference trajectory. The higher_command also isn't compatible with the MDP assumption. How should I organize the measurement data? Should I add the agent's current location to the measurements?
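One common way to make the state closer to Markovian is to append the ego pose and a goal-relative offset to the measurement vector, so the policy does not have to infer its position from history. A hypothetical sketch (field names and layout are assumptions, not Coach's actual measurement_space):

```python
import numpy as np

def build_measurements(higher_command, forward_speed, distance_to_goal,
                       location_xy, goal_xy):
    """Hypothetical augmented measurement vector: adds the ego location
    and the goal-relative offset (dx, dy) to the original fields."""
    dx = goal_xy[0] - location_xy[0]
    dy = goal_xy[1] - location_xy[1]
    return np.array([higher_command, forward_speed, distance_to_goal,
                     location_xy[0], location_xy[1], dx, dy],
                    dtype=np.float32)
```

The goal-relative offset in particular gives the network a direct signal for which way to turn, rather than only a scalar distance.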
I suggest you take a look at the Conditional Imitation Learning agent for an example on how to train an agent with high level commands. It implements this paper.
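The core idea of the conditional architecture referenced above (Codevilla et al., "End-to-End Driving via Conditional Imitation Learning") is to use the high-level command as a switch that selects among separate output branches, rather than feeding it in as just another observation. A minimal sketch of the branching mechanism, not Coach's actual implementation:

```python
import numpy as np

def conditional_policy(features, command, branches):
    """Select the control head indexed by the high-level command
    (e.g. 0=follow lane, 1=turn left, 2=turn right, 3=go straight).
    `branches` is a list of per-command weight matrices; purely
    illustrative stand-ins for trained branch networks."""
    W = branches[command]
    return W @ features  # each command gets its own control head
```

Because each command has its own head, "turn right" gradients do not get averaged away by the far more frequent "follow lane" samples, which is a plausible reason an agent never learns to turn.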
Well, I really appreciate your suggestion. @galnov
Hi, I am currently working on my graduation project in CARLA, and I have noticed that the reward function for CARLA in Coach is totally different from the formula introduced in "CARLA: An Open Urban Driving Simulator". In the implementation of carla_environment.py, I saw the reward calculated this way:
`self.reward = speed_reward - ...`
Honestly, I trained my agent based on the reward formula from the CARLA paper, and it seemed to need many episodes to reach good performance; sometimes it couldn't even converge, although I used a similar network with the DDPG algorithm. Could you explain why you chose this reward formula? I would really appreciate it. @galnov @galleibo-intel @shadiendrawis @itaicaspi
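For context, the reward in the CARLA paper (Dosovitskiy et al., 2017) is, as I recall, a weighted sum of per-step differences: r_t = 1000·Δd + 0.05·Δv − 0.00002·Δc − 2·Δs − 2·Δo, where d is distance to goal (km), v is speed (km/h), c is collision damage, and s/o are sidewalk and off-road intersections. A direct sketch of that formula, with the measurements passed as dicts for readability:

```python
def carla_paper_reward(prev, cur):
    """Per-step reward following the CARLA paper's formula.
    `prev`/`cur` are dicts with keys:
      d: distance to goal (km), v: speed (km/h), c: collision damage,
      s: sidewalk intersection, o: off-road intersection."""
    return (1000.0 * (prev['d'] - cur['d'])       # progress toward the goal
            + 0.05 * (cur['v'] - prev['v'])       # reward acceleration
            - 0.00002 * (cur['c'] - prev['c'])    # penalize new collision damage
            - 2.0 * (cur['s'] - prev['s'])        # penalize sidewalk intersection
            - 2.0 * (cur['o'] - prev['o']))       # penalize going off road
```

Because most terms are differences, the shaping signal is sparse and small per step, which may partly explain the slow convergence you observed compared with Coach's denser speed-based reward.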