Hey @ryanwang522 ,
Thanks for reporting this! The units were changed from centimeters to meters in Carla, which is why the location you report doesn't seem to make sense. I have updated the code to use the new units.
The scenarios.json file provides an easy way to use/create new scenarios. Lane_keep_Town1 and Lane_keep_Town2 are examples showing how to define scenarios for the Carla driving environment. The "Straight_Poses_Town2" and other pose definitions make it easy to compose new scenarios (like Lane_keep_Town2). For example, if you want to create a curvy-road driving Gym environment, you can create a scenario definition by selecting the start_pos_id and end_pos_id from Curve_Poses_Town2, as in the sketch below.
Please refer to the Create new CARLA Scenarios / Gym Environments wiki page for an example.
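As a rough sketch, a new curvy-road scenario could be composed like this (the field names such as start_pos_id, end_pos_id, max_steps, and num_vehicles are assumptions modeled on the existing Lane_keep_Town2 entry; check scenarios.json in the repo for the actual schema):

```python
import json

# Hypothetical "Curvy_Drive_Town2" scenario composed from the Curve_Poses_Town2
# pose definitions. Field names are assumptions modeled on the Lane_keep_Town2
# entry; check scenarios.json in the repo for the real schema.
curvy_drive_town2 = {
    "city": "Town02",
    "start_pos_id": 42,   # a pose id picked from Curve_Poses_Town2
    "end_pos_id": 47,     # another pose id from Curve_Poses_Town2
    "max_steps": 2000,
    "num_vehicles": 0,
    "num_pedestrians": 0,
    "weather_distribution": [0],
}

# Add the new scenario alongside the existing ones.
with open("scenarios.json") as f:
    scenarios = json.load(f)

scenarios["Curvy_Drive_Town2"] = curvy_drive_town2

with open("scenarios.json", "w") as f:
    json.dump(scenarios, f, indent=2)
```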
Hi @praveen-palanisamy , thanks for the quick reply!
Yes!
So I'm just wondering if the pre-trained model can do the job? (I'll try it tomorrow!) Btw, was the pre-trained model trained on a continuous action space?
And I ran into some questions when I tried to use the environment for RL:
Okay, I've looked into the purpose. The reason is to improve learning efficiency with reward normalization, right?
When training with a2c_agent with rendering, the rendered scene looks like the correct front-camera image above only in the first episode. In all of the following episodes, the rendered scene changes to a different scene (sorry, I don't have a related screenshot right now). Is this phenomenon expected?
I've tried to train my own model using carla_env, but sometimes even when the car is involved in a collision, the episode doesn't end. I thought it would end according to the code below.
What is py_measurement["next_command"] for?
Hi buddy, I don't think our questions are exactly the same, but for your problems:
@ryanwang522 The trained A2C/A3C agent model used the continuous action space.
Clipping the reward to lie in [-1, 1] provides a way to normalize the rewards. This is helpful/necessary for some RL algorithms, especially those that use the policy gradient, in order not to take too-big/too-small policy update steps. While the scale of the rewards and their distribution affects learning performance, whether reward clipping is necessary or not depends on the problem domain (the learning environment). There isn't a great deal of research on this in the RL field, but the "Learning values across many orders of magnitude" paper by van Hasselt et al. (see the figure from that paper) discusses some of the effects and proposes Preserving Outputs Precisely while Adaptively Rescaling Targets (POP-ART) as a way to mitigate some of the shortcomings of reward clipping.
See if this is related to #21. As @fangchuan points out, no issues were observed with the Agent code/training.
The environment will, by default, terminate the current episode and start a new one if the Agent is involved in a collision. But in some cases (like @fangchuan's response above?), you may want to not terminate on collisions. You can choose the behavior based on this config: https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/8a334e0d11e12654ddf1418f54738e8338137c9e/ch8/environment/carla_gym/envs/carla_env.py#L79
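For example (a sketch; the exact config key name, e.g. "early_terminate_on_collision", and whether CarlaEnv takes a config dict should be verified against the linked carla_env.py):

```python
# Sketch: toggle collision-based episode termination via the env config.
# The key name "early_terminate_on_collision" and the CarlaEnv(config) signature
# are assumptions based on the linked carla_env.py; verify against your checkout.
from environment.carla_gym.envs.carla_env import CarlaEnv, ENV_CONFIG

config = ENV_CONFIG.copy()
config["early_terminate_on_collision"] = False  # keep the episode running after collisions

env = CarlaEnv(config)
obs = env.reset()
```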
The py_measurement["next_command"] value is the higher-level path planner's guidance, which can take one of the following values:
https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/8a334e0d11e12654ddf1418f54738e8338137c9e/ch8/environment/carla_gym/envs/carla_env.py#L45-L52
You could use it to train your agent if you want.
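For instance, one (hypothetical) way to feed it to the agent is to one-hot encode the command and append it to the features given to the policy network. The command list below is illustrative; see the linked lines for the actual set and ordering used by the environment:

```python
import numpy as np

# Illustrative list of planner commands; see the linked lines in carla_env.py
# for the actual values used by the environment.
COMMANDS = ["REACH_GOAL", "GO_STRAIGHT", "TURN_RIGHT", "TURN_LEFT", "LANE_FOLLOW"]

def command_one_hot(next_command):
    """One-hot encode the planner's next_command so it can be concatenated
    with the image features / measurements fed to the policy network."""
    vec = np.zeros(len(COMMANDS), dtype=np.float32)
    if next_command in COMMANDS:
        vec[COMMANDS.index(next_command)] = 1.0
    return vec

# e.g. augment the per-step features with the planner guidance:
# features = np.concatenate([measurement_features,
#                            command_one_hot(py_measurement["next_command"])])
```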
Hope that answers your questions! If you have other follow-up questions, please consider opening a new issue specific to the new set of questions and close the ones that have been answered. This will help keep things organized.
Hi @praveen-palanisamy , thanks for the great work!
I tried to imshow the observation in carla_env.py, and it produced a weird image like the one below.
Then I noticed that https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/b5395ba23982a90145c34677992972f60f957091/ch8/environment/carla_gym/envs/carla_env.py#L264-L268 is different from the official client_example.py in 0.8.2 at line 267, so I modified it to camera2.set_position(0.30, 0, 1.30) and the result of imshow becomes as expected. Could the weird observation be due to the incorrect camera position?
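For reference, the corrected camera setup would look roughly like this (a sketch; the camera name, resolution, and settings object are placeholders, and only the set_position(0.30, 0, 1.30) call, in meters, is the actual change):

```python
# CARLA 0.8.x Python client API; positions are now specified in meters
# (they used to be in centimeters), matching client_example.py in 0.8.2.
from carla.sensor import Camera
from carla.settings import CarlaSettings

settings = CarlaSettings()

camera2 = Camera("CameraRGB")          # placeholder name
camera2.set_image_size(800, 600)       # placeholder resolution
camera2.set_position(0.30, 0, 1.30)    # x, y, z in meters
settings.add_sensor(camera2)
```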
Also, could you please explain what the Straight_Poses_Town2 (and other) entries in scenario.json are for? At the end of training the a2c_agent, do we expect the agent to drive safely (no collisions / lane crossing) around the town? Then what is the Lane_Keep_Town2 scenario for?