PacktPublishing / Hands-On-Intelligent-Agents-with-OpenAI-Gym

Code for the Hands-On Intelligent Agents with OpenAI Gym book: get started and learn to build deep reinforcement learning agents using PyTorch
https://www.packtpub.com/big-data-and-business-intelligence/hands-intelligent-agents-openai-gym

Weird RGB camera position #20

Closed ryanwang522 closed 5 years ago

ryanwang522 commented 5 years ago

Hi @praveen-palanisamy , thanks for the great work!

  1. When I imshow the observation in carla_env.py:

    def _read_observation(self):
        ...
        # Display the current sensor observation converted to RGB
        cv2.imshow("obs", to_rgb_array(observation))
        cv2.waitKey()

It produced a weird image like the one below.

[screenshot: obs_screenshot_06 03 2019]

Then I noticed that https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/b5395ba23982a90145c34677992972f60f957091/ch8/environment/carla_gym/envs/carla_env.py#L264-L268 differs from the official client_example.py in CARLA 0.8.2 at line 267.

So I modified it to camera2.set_position(0.30, 0, 1.30), and the imshow result is now as expected.

[screenshot: obs (corrected front-camera view)]

Could the reason for the weird observation be the incorrect camera position?
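For reference, here is a minimal sketch of the camera setup after my change, following the CARLA 0.8.2 client_example.py (the image size and the surrounding CarlaSettings plumbing are just illustrative):

    # Sketch based on the CARLA 0.8.2 Python client; positions are in meters.
    from carla.sensor import Camera
    from carla.settings import CarlaSettings

    settings = CarlaSettings()
    camera2 = Camera("CameraRGB")
    camera2.set_image_size(640, 480)       # illustrative resolution
    camera2.set_position(0.30, 0, 1.30)    # x (forward), y, z (up) on the car, in meters
    settings.add_sensor(camera2)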

  2. Could you please explain what Straight_Poses_Town2 (and the other pose definitions) in scenarios.json are for?

  3. At the end of training the a2c_agent, do we expect the agent to drive safely (no collisions / lane crossings) around the town? And what is the Lane_Keep_Town2 scenario for?

praveen-palanisamy commented 5 years ago

Hey @ryanwang522 ,

  1. Thanks for reporting this! The units were changed from centimeters to meters in CARLA, which is why the location, as you report, doesn't seem to make sense. I have updated the code to use the new units.

  2. The scenarios.json file provides an easy way to use existing scenarios or create new ones. Lane_keep_Town1 & Lane_keep_Town2 are examples showing how to define scenarios for the CARLA driving environment. "Straight_Poses_Town2" and the other pose definitions make it easy to compose new scenarios (like Lane_keep_Town2). For example, if you want to create a curvy-road driving Gym environment, you can create a scenario definition by selecting the start_pos_id and end_pos_id from Curve_Poses_Town2.

Please refer to the Create new CARLA Scenarios / Gym Environments wiki page for an example.
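For instance, a new scenario entry could look something like the sketch below (written here as a Python dict mirroring a scenarios.json entry; the key names are illustrative, so match them against the existing entries in the file):

    # Hypothetical scenario definition; key names are illustrative only.
    CURVE_DRIVE_TOWN2 = {
        "city": "Town02",
        "num_vehicles": 0,
        "num_pedestrians": 0,
        "weather_distribution": [0],
        # Pick a start/end pose pair from Curve_Poses_Town2
        "start_pos_id": 1,
        "end_pos_id": 2,
        "max_steps": 2000,
    }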

  3. Yes! Hope that answers your questions. Feel free to follow up if you need more information, or close the issue if it's all good.
ryanwang522 commented 5 years ago

Hi @praveen-palanisamy , thanks for the quick reply!

Yes!

So I'm just wondering if the pre-trained model can do the job? (I'll try it tomorrow!) Btw, was the pre-trained model trained on a continuous action space?

And a few questions came up when I tried to use the environment for RL:

  1. What's the purpose of clipping the reward to [-1, 1]?

    Okay, I've looked into the purpose myself. The reason is to improve learning efficiency via reward normalization, right?

https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/8a334e0d11e12654ddf1418f54738e8338137c9e/ch8/a2c_agent.py#L195-L196
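For context, here is a paraphrased sketch of what I understand the linked lines to do (the variable names here are mine, not necessarily the ones used in a2c_agent.py):

    import numpy as np

    def clip_reward(reward, low=-1.0, high=1.0):
        # Clip the raw environment reward into [low, high] before it is
        # used to compute the returns, as the linked a2c_agent.py lines do.
        return float(np.clip(reward, low, high))

    # e.g. a large negative collision penalty is squashed to -1.0
    assert clip_reward(-250.0) == -1.0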

  2. When training a2c_agent with rendering, the rendered scene looks like the correct front-camera image above in the first episode. However, in all of the following episodes the rendered scene changes to a different scene (sorry, I don't have a related screenshot right now). Is this expected?

  3. I've tried to train my own model using carla_env, but it seems that sometimes, even when the car collides with something, the episode doesn't end. I thought it would end according to the code below (see my paraphrased sketch after this list).

https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/8a334e0d11e12654ddf1418f54738e8338137c9e/ch8/environment/carla_gym/envs/carla_env.py#L362-L365

https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/8a334e0d11e12654ddf1418f54738e8338137c9e/ch8/environment/carla_gym/envs/carla_env.py#L537-L542

  4. What is py_measurement["next_command"] for?
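Here is how I read the termination logic in the two linked carla_env.py snippets (paraphrased; the field and config names are my reading of that file, so treat them as assumptions):

    def collided_done(py_measurements):
        # Roughly what the linked helper does: the episode counts as a
        # collision-done when any collision was registered by CARLA.
        m = py_measurements
        return (m["collision_vehicles"] > 0
                or m["collision_pedestrians"] > 0
                or m["collision_other"] > 0)

    def is_done(config, scenario, num_steps, py_measurements):
        # Episode ends on reaching the goal, exceeding the step budget, or
        # (if enabled in the env config) colliding with something.
        return (py_measurements["next_command"] == "REACH_GOAL"
                or num_steps > scenario["max_steps"]
                or (config["early_terminate_on_collision"]
                    and collided_done(py_measurements)))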
fangchuan commented 5 years ago

Hi, buddy. I don't think our questions are exactly the same, but regarding your problems:

  1. Clipping the reward is not strictly necessary. Especially in autonomous-driving settings, we usually use a shaped reward function that covers distance, speed, collision information, and so forth.
  2. I don't remember my last run of a2c_agent.py because I use my own agent based on TensorFlow, but I don't think I ran into the problem you describe. If you have made changes to the original code released by the author, you should check your code.
  3. Yes, sometimes the agent does not end the episode as soon as a collision happens, so there are other constraints like max_steps, next_command (received from the higher-level planner), and a lowest-total-reward threshold. I guess this way the agent can learn more about the collision spot instead of ending the episode immediately.
praveen-palanisamy commented 5 years ago

@ryanwang522 The trained A2C/A3C agent model used the continuous action space.

  1. Clipping the reward to lie in [-1, 1] is a way to normalize the rewards. This is helpful/necessary for some RL algorithms, especially those that use the policy gradient, in order not to take policy-update steps that are too big or too small. While the scale of the rewards and their distribution affect learning performance, whether reward clipping is necessary depends on the problem domain (learning environment). There isn't much research on this in the RL field, but the paper "Learning values across many orders of magnitude" (van Hasselt et al.) discusses some of the effects and proposes Preserving Outputs Precisely while Adaptively Rescaling Targets (POP-ART) as a way to mitigate some of the shortcomings of reward clipping.

  2. See if this is related to #21. As @fangchuan points out, no issues were observed with the agent code/training.

  3. By default, the environment will terminate the current episode and start a new one if the agent is involved in a collision. But in some cases (like @fangchuan's response above?) you may not want to terminate on collisions. You can choose the behavior with this config: https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/8a334e0d11e12654ddf1418f54738e8338137c9e/ch8/environment/carla_gym/envs/carla_env.py#L79

  4. py_measurement["next_command"] is the higher-level path planner's guidance, which can take one of the following values: https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/8a334e0d11e12654ddf1418f54738e8338137c9e/ch8/environment/carla_gym/envs/carla_env.py#L45-L52

You could use it to train your agent if you want.
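For example, one way to feed it to the agent is to one-hot encode the planner command and concatenate it to the observation. A minimal sketch (the command list below is an assumption; check the values defined at the link above):

    import numpy as np

    # Assumed command set; verify against the dict defined in carla_env.py#L45-L52.
    COMMANDS = ["REACH_GOAL", "GO_STRAIGHT", "TURN_RIGHT", "TURN_LEFT", "LANE_FOLLOW"]

    def encode_next_command(py_measurement):
        # One-hot encode the planner's guidance so it can be concatenated
        # to the agent's observation/state vector.
        one_hot = np.zeros(len(COMMANDS), dtype=np.float32)
        one_hot[COMMANDS.index(py_measurement["next_command"])] = 1.0
        return one_hot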

Hope that answers your questions! If you have other follow-up questions, please consider opening a new issue specific to the new set of questions and closing the ones that have been answered. This will help keep things organized.