eleurent / rl-agents

Implementations of Reinforcement Learning and Planning algorithms
MIT License

Hello, I am a beginner in reinforcement learning!! #29

Closed: zhangxinchen123 closed this issue 4 years ago

zhangxinchen123 commented 4 years ago
Hello, first I'm sorry to bother you again. When I run the command:

python3 experiments.py evaluate configs/HighwayEnv/env_easy.json configs/HighwayEnv/agents/DQNAgent/1_step.json --train --episodes=10

the pygame environment named "highway-env" only renders the first 2 episodes, episodes 3 to 7 run very quickly, and when the eighth episode begins the "highway-env" window is rendered again. Why does this happen? Another question: if I want to modify the reward function to use the distance between the ego vehicle and the nearest vehicle in the same lane, how can I modify the reward? Thanks for your help! I'm sorry if these questions seem very basic, I am a beginner in reinforcement learning. Thanks a lot!!

eleurent commented 4 years ago

Hi @zhangxinchen123, sure no problem, there are no stupid questions.

* First question: why do we only see episodes 0, 1 and 8, while episodes 2 to 7 run very quickly?

This is because rendering the environment scene takes a significant amount of time compared to just running the simulation. To make training faster, we disable rendering in most episodes and only render once in a while to follow the progress of the policy. The rule that decides which episodes are rendered is chosen here, and the default option (None) is capped_cubic_video_schedule: we render episodes whose numbers follow a cubic progression (0, 1, 8, 27, 64...), capped at 1000, 2000, 3000... With the option --no-display, no episodes are rendered. When testing the trained policy with --test, every episode is rendered.
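For reference, gym's capped_cubic_video_schedule is essentially the function below (reproduced from memory, so treat it as an approximation of the real source rather than an exact copy):

def capped_cubic_video_schedule(episode_id):
    # Render every perfect cube (0, 1, 8, 27, 64, ...) during the first 1000 episodes,
    # then only every 1000th episode afterwards.
    if episode_id < 1000:
        return int(round(episode_id ** (1. / 3))) ** 3 == episode_id
    return episode_id % 1000 == 0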

* Second question: change the reward function to use the distance to the nearest vehicle

The reward function of the highway-v0 environment is defined here.

If you want to modify it, you need to clone the highway-env repository locally and uninstall the current version with pip uninstall highway-env. Then, you can change the code of the reward as you like, and simply add your local highway-env directory to the Python path. You can get the nearest vehicle and compute its distance to the ego vehicle with:

front_vehicle, _ = self.road.neighbour_vehicles(self.vehicle)
distance = self.vehicle.lane_distance_to(front_vehicle)
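A distance-based term could then be added along the lines of the sketch below (DISTANCE_REWARD and the 50 m saturation are made-up names and values for illustration; the actual structure of HighwayEnv._reward in your version of highway-env may differ):

def _reward(self, action):
    # Sketch of a reward that favours keeping a large gap to the front vehicle.
    front_vehicle, _ = self.road.neighbour_vehicles(self.vehicle)
    reward = self.COLLISION_REWARD * self.vehicle.crashed
    if front_vehicle is not None:
        distance = self.vehicle.lane_distance_to(front_vehicle)
        # DISTANCE_REWARD is a hypothetical coefficient you would add to the class;
        # the term grows with the gap to the front vehicle, saturating at 50 m.
        reward += self.DISTANCE_REWARD * min(distance, 50) / 50
    return reward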
zhangxinchen123 commented 4 years ago

> Hi @zhangxinchen123, sure no problem, there are no stupid questions.
>
> * First question: why do we only see episodes 0, 1 and 8, while episodes 2 to 7 run very quickly?
>
> This is because rendering the environment scene takes a significant amount of time compared to just running the simulation. To make training faster, we disable rendering in most episodes and only render once in a while to follow the progress of the policy. The rule that decides which episodes are rendered is chosen here, and the default option (None) is capped_cubic_video_schedule: we render episodes whose numbers follow a cubic progression (0, 1, 8, 27, 64...), capped at 1000, 2000, 3000... With the option --no-display, no episodes are rendered. When testing the trained policy with --test, every episode is rendered.
>
> * Second question: change the reward function to use the distance to the nearest vehicle
>
> The reward function of the highway-v0 environment is defined here.
>
> If you want to modify it, you need to clone the highway-env repository locally and uninstall the current version with pip uninstall highway-env. Then, you can change the code of the reward as you like, and simply add your local highway-env directory to the Python path. You can get the nearest vehicle and compute its distance to the ego vehicle with:
>
>     front_vehicle, _ = self.road.neighbour_vehicles(self.vehicle)
>     distance = self.vehicle.lane_distance_to(front_vehicle)

Thanks for your reply! So can I modify the file at "/highway/envs/highway_env.py"? In that file I see that the HighwayEnv class includes reward terms such as COLLISION_REWARD and HIGH_VELOCITY_REWARD; can I add a reward term that uses the distance to the nearest vehicle there? Another question: I looked at the files abstract.py and observation.py under "/highway/envs/common/", and I see that the action and observation spaces are defined there, but highway_env.py does not import observation.py, so what is the connection between highway_env.py, abstract.py and observation.py? Thanks for your help!

eleurent commented 4 years ago

> Can I add a reward term that uses the distance to the nearest vehicle there?

There is no configurable reward depending on the distance to the nearest vehicle right now, but you can add one yourself.

> What is the connection between highway_env.py, abstract.py and observation.py?

* `highway_env.py` describes the highway-v0 environment. It inherits from...

* `abstract.py`, which defines AbstractEnv, a blueprint class that implements functions common to every environment (roundabout, parking, etc.). For instance, this class generates the observation when env.step() is called. To that end, it imports...

* `observation.py`, where some types of observations are defined.

From the HighwayEnv class, you can choose and configure the observation through the "observation" field in the configuration. You can also do this at environment creation:

env = gym.make("highway-v0")
env.configure({"observation": {"type": "Kinematics", <other observation parameters ...>}})
env.reset()
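To make this relationship concrete, here is a minimal, self-contained toy sketch (not the actual highway-env code; classes and functions are heavily simplified) of how abstract.py delegates the observation to observation.py, which is why highway_env.py never needs to import observation.py directly:

class KinematicsObservation:
    # observation.py's role: define the available observation types
    def observe(self):
        return "kinematic features of nearby vehicles"

def observation_factory(config):
    # observation.py's role: map the configured "type" to an observation class
    return {"Kinematics": KinematicsObservation}[config["type"]]()

class AbstractEnv:
    # abstract.py's role: machinery shared by all environments
    def __init__(self, config):
        self.observation = observation_factory(config["observation"])

    def step(self, action):
        # the observation is generated here, in the parent class
        return self.observation.observe()

class HighwayEnv(AbstractEnv):
    # highway_env.py's role: only the scene, behaviours and reward are specific
    pass

env = HighwayEnv({"observation": {"type": "Kinematics"}})
print(env.step(action=None))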
zhangxinchen123 commented 4 years ago

> > Can I add a reward term that uses the distance to the nearest vehicle there?
>
> There is no configurable reward depending on the distance to the nearest vehicle right now, but you can add one yourself.
>
> > What is the connection between highway_env.py, abstract.py and observation.py?
>
> * `highway_env.py` describes the highway-v0 environment. It inherits from...
>
> * `abstract.py`, which defines AbstractEnv, a blueprint class that implements functions common to every environment (roundabout, parking, etc.). For instance, this class generates the observation when env.step() is called. To that end, it imports...
>
> * `observation.py`, where some types of observations are defined.
>
> From the HighwayEnv class, you can choose and configure the observation through the "observation" field in the configuration. You can also do this at environment creation:
>
>     env = gym.make("highway-v0")
>     env.configure({"observation": {"type": "Kinematics", <other observation parameters ...>}})
>     env.reset()

OK, thanks for the help! I have a question: when I run the command with baseline.json to train the agent in env_medium, I set the number of episodes to 10000, but around episode 1000 the score is about 30 and it is unstable. Does the dueling DQN algorithm in the baseline converge? Thanks for your help!

eleurent commented 4 years ago

First, I think that 10000 episodes is a lot, since the exploration (epsilon-greedy) is scheduled to converge to 0% random actions after about 1K episodes. I do not remember which average performance I obtained in my own experiments, but I think that I still had about 10% of episodes that resulted in crashes. I think it can probably be decreased by playing with the observation and network architecture (for instance, using an ego-attention architecture with relative coordinates rather than the baseline), but I haven't tried it yet.
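For context, an epsilon-greedy schedule of this kind can be sketched as follows (the decay shape, the name tau and the values are illustrative only, not the exact rl-agents configuration):

import math

def epsilon(step, initial=1.0, final=0.0, tau=6000):
    # Illustrative exponential decay from fully random actions (epsilon = 1.0)
    # towards 0% random actions; tau controls how quickly exploration vanishes.
    return final + (initial - final) * math.exp(-step / tau)

for step in (0, 2000, 6000, 20000):
    print(step, round(epsilon(step), 3))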

zhangxinchen123 commented 4 years ago

> First, I think that 10000 episodes is a lot, since the exploration (epsilon-greedy) is scheduled to converge to 0% random actions after about 1K episodes. I do not remember which average performance I obtained in my own experiments, but I think that I still had about 10% of episodes that resulted in crashes. I think it can probably be decreased by playing with the observation and network architecture (for instance, using an ego-attention architecture with relative coordinates rather than the baseline), but I haven't tried it yet.

OK, thanks for your reply! I want to try it, but I don't know what problems I will face. Thank you!