Farama-Foundation / HighwayEnv

A minimalist environment for decision-making in autonomous driving
https://highway-env.farama.org/
MIT License

Hello! #42

Closed zhangxinchen123 closed 4 years ago

zhangxinchen123 commented 4 years ago

Hi, I have a question: if I want to use DQN and DDQN to train the agent in highway-env, which json file should I choose? I notice there is a file named "no_dueling.json" in /scripts/configs/HighwayEnv/agents/DQNAgent, but in models.py there is no model type matching "no_dueling.json". What should I do? Thanks for your help!

eleurent commented 4 years ago

The DuelingNetwork architecture is defined in https://github.com/eleurent/rl-agents/blob/master/rl_agents/agents/common/models.py. It is selected (or not) based on the "model" field of the configuration dictionary, here (DQN agent): https://github.com/eleurent/rl-agents/blob/master/rl_agents/agents/deep_q_network/pytorch.py#L17 and there (model factory): https://github.com/eleurent/rl-agents/blob/master/rl_agents/agents/common/models.py#L415

By default it is enabled in the baseline config https://github.com/eleurent/rl-agents/blob/master/scripts/configs/HighwayEnv/agents/DQNAgent/baseline.json#L4, and disabled in the no_dueling config https://github.com/eleurent/rl-agents/blob/master/scripts/configs/HighwayEnv/agents/DQNAgent/no_dueling.json#L4. Be careful: the other fields (gamma, batch size, etc.) also differ between the two files; if you wish to compare only the architecture, you should set them to the same values.
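For reference, here is a minimal sketch of how that "model" field drives the choice of architecture (written as a Python dict mirroring the JSON configs linked above; only the relevant field is shown, and the surrounding structure is an assumption to be checked against your version of rl-agents):

```python
# Hedged sketch: the model factory in rl_agents/agents/common/models.py
# instantiates the network whose name is given by config["model"]["type"].
dueling_config = {
    "model": {"type": "DuelingNetwork"},  # as in baseline.json
    # gamma, batch size, exploration, ... also live in this dictionary
}
```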

zhangxinchen123 commented 4 years ago

> The DuelingNetwork architecture is defined in https://github.com/eleurent/rl-agents/blob/master/rl_agents/agents/common/models.py. It is selected (or not) based on the "model" field of the configuration dictionary, here (DQN agent): https://github.com/eleurent/rl-agents/blob/master/rl_agents/agents/deep_q_network/pytorch.py#L17 and there (model factory): https://github.com/eleurent/rl-agents/blob/master/rl_agents/agents/common/models.py#L415
>
> By default it is enabled in the baseline config https://github.com/eleurent/rl-agents/blob/master/scripts/configs/HighwayEnv/agents/DQNAgent/baseline.json#L4, and disabled in the no_dueling config https://github.com/eleurent/rl-agents/blob/master/scripts/configs/HighwayEnv/agents/DQNAgent/no_dueling.json#L4. Be careful: the other fields (gamma, batch size, etc.) also differ between the two files; if you wish to compare only the architecture, you should set them to the same values.

Another question: in highway_env/envs/common/observation.py, the features of the KinematicObservation class are ['presence', 'x', 'y', 'vx', 'vy']. I understand that 'presence' indicates whether the vehicle exists (1 if it exists, 0 otherwise), 'x' and 'y' are the relative longitudinal and lateral distances, and 'vx' and 'vy' are the relative longitudinal and lateral velocities. Is this right? Thanks a lot!

eleurent commented 4 years ago

That's right. Don't forget that x, y, vx, vy are also normalized using the feature_range config. You can also set the configuration "absolute": true, in which case x, y, vx, vy are absolute positions and velocities instead of relative ones.
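For example, something along these lines (a minimal sketch assuming the gym API of that time; newer releases use gymnasium and may require env.unwrapped.configure):

```python
import gym
import highway_env  # noqa: F401  (registers the highway environments)

env = gym.make("highway-v0")
# Switch the Kinematics observation to absolute coordinates instead of
# coordinates relative to the ego vehicle.
env.configure({
    "observation": {
        "type": "Kinematics",
        "absolute": True,
    }
})
obs = env.reset()  # reset again so the new observation configuration is applied
```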

zhangxinchen123 commented 4 years ago

> That's right. Don't forget that x, y, vx, vy are also normalized using the feature_range config. You can also set the configuration "absolute": true, in which case x, y, vx, vy are absolute positions and velocities instead of relative ones.

Thanks, but I see that in the no_dueling file the type is "FCNetwork", and in models.py there is no model type matching "FCNetwork". What should I do?

eleurent commented 4 years ago

Ah! I see, that's because I renamed the model to MultiLayerPerceptron without changing the config, sorry about that.
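If it helps, the fix would simply be to update that field in your copy of no_dueling.json; roughly (shown here as a Python dict for illustration, with all other fields omitted):

```python
# Hedged sketch of the corrected "model" entry for no_dueling.json:
# the type previously named "FCNetwork" is now "MultiLayerPerceptron".
no_dueling_config = {
    "model": {"type": "MultiLayerPerceptron"},
}
```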

zhangxinchen123 commented 4 years ago

> Ah! I see, that's because I renamed the model to MultiLayerPerceptron without changing the config, sorry about that.

I'm sorry, I'm a beginner so I have a lot of questions. My question is: what is the meaning of the file named no_dueling? In it there is `"exploration": { "method": "EpsilonGreedy", "tau": 300000, "temperature": 1.0, "final_temperature": 0.1 }`; what do "tau", "temperature" and "final_temperature" mean? And does this file correspond to DQN? If I want to use DDQN to train, do I need to modify models.py? Thanks for your help!

eleurent commented 4 years ago

> My question is: what is the meaning of the file named no_dueling?

In this file, a Multi-Layer Perceptron model is chosen instead of the Dueling architecture used by default in baseline.json.

> What do "tau", "temperature" and "final_temperature" mean?

They are used to schedule the epsilon-greedy exploration: epsilon starts at "temperature" and converges to "final_temperature" with an exponential decay of time constant "tau" (unit: number of steps).
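In other words, something like this (a sketch of the schedule as described above; the exact implementation lives in rl-agents' exploration module):

```python
import numpy as np

def epsilon_schedule(step, temperature=1.0, final_temperature=0.1, tau=300000):
    """Exponential decay: epsilon starts at `temperature` and converges to
    `final_temperature` with time constant `tau` (in environment steps)."""
    return final_temperature + (temperature - final_temperature) * np.exp(-step / tau)

# epsilon after 0, tau and 5 * tau steps: 1.0, ~0.43, ~0.11
print([round(epsilon_schedule(s), 2) for s in (0, 300_000, 1_500_000)])
```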

> If I want to use DDQN to train, do I need to modify models.py?

No, if you want to use DDQN you just have to run an experiment with the baseline.json config, which already selects the Dueling architecture in its "model" field.

zhangxinchen123 commented 4 years ago

> > My question is: what is the meaning of the file named no_dueling?
>
> In this file, a Multi-Layer Perceptron model is chosen instead of the Dueling architecture used by default in baseline.json.
>
> > What do "tau", "temperature" and "final_temperature" mean?
>
> They are used to schedule the epsilon-greedy exploration: epsilon starts at "temperature" and converges to "final_temperature" with an exponential decay of time constant "tau" (unit: number of steps).
>
> > If I want to use DDQN to train, do I need to modify models.py?
>
> No, if you want to use DDQN you just have to run an experiment with the baseline.json config, which already selects the Dueling architecture in its "model" field.

Thanks a lot! So if I use Dueling DQN to train, can I use baseline.json? Can I run baseline.json to train the agent with both DDQN and Dueling DQN? Another question: I calculated the success rate over 10,000 episodes, with vehicles_count = 20 and duration = 20, in the env-medium environment; about 4,000 episodes were successful. I set gamma to 0.95 and kept the other parameters in baseline.json unchanged, and I define a success as an episode in which the ego vehicle does not crash. Is that right? Also, the average reward during training is rising consistently, so I guess that if I train for more than 10,000 episodes the success rate will improve. Is this right? Thanks for your help!!!

eleurent commented 4 years ago

Yes, baseline.json has both Dueling and Double DQN (Double DQN is always activated and not configurable). I'm not sure I understood your second question.

zhangxinchen123 commented 4 years ago

> Yes, baseline.json has both Dueling and Double DQN (Double DQN is always activated and not configurable). I'm not sure I understood your second question.

Thanks. My question is: if I want to use DDQN to train, what command should I run? I notice that in baseline.json the "type" is "DuelingNetwork"; how do the commands differ if I want to train with DQN versus Dueling DQN? I'm very sorry, I want to explain the second question above: I used Dueling DQN to train the agent in the environment named "env_medium", with "vehicles_count" and "duration" both set to 20, and I wanted to calculate the success rate during training. I trained for a total of 10,000 episodes, defining a successful episode as one in which the ego vehicle did not collide, and about 4,000 of the 10,000 episodes were successful. Is this result correct? Will the success rate increase if I train for more episodes? Thanks for your help!

zhangxinchen123 commented 4 years ago

> Yes, baseline.json has both Dueling and Double DQN (Double DQN is always activated and not configurable). I'm not sure I understood your second question.

Hi, here are some more details about the experiment described above: the success rate stays at about 40 percent, the terminal tells me "[root:WARNING] NaN or Inf found in input tensor.", and then the success rate and average reward begin to decrease. How can I solve this problem? Is there any way to increase the success rate? Thanks!!

eleurent commented 4 years ago

> I define a successful episode as one in which the ego vehicle did not collide, and about 4,000 of the 10,000 episodes were successful. Is this result correct?

I don't know, you tell me ;) I haven't run this experiment specifically.

> Will the success rate increase if I train for more episodes?

Maybe, maybe not. Make sure to check the schedule of exploration (random actions vs. estimated optimal action): if your agent stops exploring too early (I think the default configuration gives about 1k episodes of exploration), it will be stuck in its current suboptimal behaviour.

> The terminal tells me "[root:WARNING] NaN or Inf found in input tensor."

That is very strange, I haven't run into this issue. It would be great if you could reproduce and investigate this bug. Apparently, the observation generated by the environment contains NaN or Inf. What type of observation are you using?

zhangxinchen123 commented 4 years ago

> > I define a successful episode as one in which the ego vehicle did not collide, and about 4,000 of the 10,000 episodes were successful. Is this result correct?
>
> I don't know, you tell me ;) I haven't run this experiment specifically.
>
> > Will the success rate increase if I train for more episodes?
>
> Maybe, maybe not. Make sure to check the schedule of exploration (random actions vs. estimated optimal action): if your agent stops exploring too early (I think the default configuration gives about 1k episodes of exploration), it will be stuck in its current suboptimal behaviour.
>
> > The terminal tells me "[root:WARNING] NaN or Inf found in input tensor."
>
> That is very strange, I haven't run into this issue. It would be great if you could reproduce and investigate this bug. Apparently, the observation generated by the environment contains NaN or Inf. What type of observation are you using?

Hi, I use the KinematicObservation observation. The terminal tells me "[root:WARNING] NaN or Inf found in input tensor.", and I think the reason may be that I changed the reward function. I have a question about the reward function: if the variance of the rewards is too large, can it cause this "NaN or Inf found in input tensor" warning?

My reward function is: -10 if the ego vehicle crashes, -0.1 for a lane change, 3 for the velocity reward, and a distance reward of -(100 - distance) * 0.03, where distance is measured from the ego vehicle to the front vehicle in the same lane. I also changed the normalization so that the reward at every step is reward = (reward - average(reward)) / std(reward), and I noticed that the episode reward fluctuates a lot, maybe from -20 to 30 or 40. I think this is bad for model convergence; the distance reward is probably the main source of the fluctuations in the total reward, and normalizing the reward into [0, 1] would be better. I also noticed that if I change the duration and train the agent, the terminal sometimes shows the same "[root:WARNING] NaN or Inf found in input tensor" warning. I will continue to study this issue in the future!

Another question: if I want to compare the two models, DDQN and Dueling DQN, which commands should I run, in turn? I'm very sorry to bother you again!

eleurent commented 4 years ago

This NaN or Inf in the input tensor might come from your reward = (reward - average(reward)) / std(reward) formula when the std is either zero or NaN.

I cannot reproduce this issue myself, so you should use a debugger and add a breakpoint (or a print statement) whenever this exception is raised in order to check the content of the observation and reward tensors.
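For instance, a minimal sketch of such a check (the helper names are mine, not part of the library; old gym step/reset API assumed), combined with a guard on the std that avoids the division issue mentioned above:

```python
import gym
import numpy as np
import highway_env  # noqa: F401  (registers the highway environments)

def check_finite(name, value):
    # Fail loudly as soon as a NaN/Inf appears, so the offending state
    # can be inspected with a debugger or a print statement.
    arr = np.asarray(value, dtype=float)
    if not np.isfinite(arr).all():
        raise ValueError(f"NaN or Inf found in {name}: {arr}")

def normalize_reward(reward, history, eps=1e-8):
    # Running standardization with a guard against a zero (or undefined) std.
    history.append(reward)
    return (reward - np.mean(history)) / max(np.std(history), eps)

env = gym.make("highway-v0")
obs, done, rewards = env.reset(), False, []
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())  # random policy, for illustration
    check_finite("observation", obs)
    check_finite("reward", normalize_reward(reward, rewards))
```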

zhangxinchen123 commented 4 years ago

> This NaN or Inf in the input tensor might come from your reward = (reward - average(reward)) / std(reward) formula when the std is either zero or NaN.
>
> I cannot reproduce this issue myself, so you should use a debugger and add a breakpoint (or a print statement) whenever this exception is raised in order to check the content of the observation and reward tensors.

Hi, I'm very sorry to bother you again. The problem I found is probably that the distance reward makes the total reward unstable. Another question: if I want to compare the two models, DDQN and Dueling DQN, what commands should I run to train agents with DDQN and with Dueling DQN? Thanks!

eleurent commented 4 years ago

* For Dueling Double DQN:
  `python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/baseline.json --train --episodes=1000`

* For Double DQN:
  `python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/no_dueling.json --train --episodes=1000`

But beware that this is not a fair comparison, since other parameters are changed in the no_dueling.json config. In order to compare the two algorithms, you should copy every parameter from baseline.json to no_dueling.json except for the "model" parameter.

zhangxinchen123 commented 4 years ago

> * For Dueling Double DQN:
>   `python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/baseline.json --train --episodes=1000`
>
> * For Double DQN:
>   `python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/no_dueling.json --train --episodes=1000`
>
> But beware that this is not a fair comparison, since other parameters are changed in the no_dueling.json config. In order to compare the two algorithms, you should copy every parameter from baseline.json to no_dueling.json except for the "model" parameter.

Thanks! If I only want to use plain DQN to train, can I use no_dueling.json? Another question: if I want to test the model I trained before, which command should I run in the terminal? Thanks!!

eleurent commented 4 years ago

@zhangxinchen123 Sorry for the delay, I lost track of this issue. The no_dueling config does not exist anymore; if you want plain DQN you should now use the dqn.json config. If you want to test the model you just trained, you should run the same command and replace the --train option with --test. You should see a message saying that the latest saved model has been loaded.
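For example, based on the training command quoted above, the corresponding test run would presumably look like `python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dqn.json --test --episodes=10` (the config path and episode count here are only illustrative).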

zhangxinchen123 commented 4 years ago

> @zhangxinchen123 Sorry for the delay, I lost track of this issue. The no_dueling config does not exist anymore; if you want plain DQN you should now use the dqn.json config. If you want to test the model you just trained, you should run the same command and replace the --train option with --test. You should see a message saying that the latest saved model has been loaded.

Thanks for your reply!! I have a question about the lateral and longitudinal trajectory planning of the ego vehicle when it changes lane: does it follow any rules? Thank you!!!

eleurent commented 4 years ago

The ego vehicle's lateral position and longitudinal velocity are controlled by state-feedback controllers implemented in steering_control() and velocity_control(). The resulting trajectory is obtained by integrating the kinematics model with these control laws. A lane change trajectory simply corresponds to a change of setpoint for the lateral position.
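For intuition, here is a rough sketch of that kind of cascaded state-feedback law (the gains and exact structure are illustrative, not the library's implementation; see steering_control() and velocity_control() in highway_env for the real one):

```python
import numpy as np

def velocity_control(target_speed, speed, kp=1.0):
    # Proportional control of the acceleration towards a longitudinal speed setpoint.
    return kp * (target_speed - speed)

def steering_control(target_lane_y, y, heading, speed, k_lat=0.5, k_heading=1.0):
    # Cascade: lateral position error -> desired heading -> steering command.
    lateral_error = target_lane_y - y
    heading_ref = np.arcsin(np.clip(k_lat * lateral_error / max(speed, 1e-3), -1.0, 1.0))
    return k_heading * (heading_ref - heading)

# A lane change is simply a new lateral setpoint: switch target_lane_y from the
# current lane centre to the adjacent lane centre, and the controller above
# steers the kinematic bicycle model towards it.
```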

zhangxinchen123 commented 4 years ago

> The ego vehicle's lateral position and longitudinal velocity are controlled by state-feedback controllers implemented in steering_control() and velocity_control(). The resulting trajectory is obtained by integrating the kinematics model with these control laws. A lane change trajectory simply corresponds to a change of setpoint for the lateral position.

Thanks for your reply!!

EnormousAdversity commented 4 years ago

I have a question to ask you: the observation in the highway environment is a 5 × 5 matrix, which I understand. But in merge and roundabout the observations are three 10 × 3 matrices, etc., and I have a little doubt about this; I don't know whether you can answer it. Thank you in advance for your answers.

eleurent commented 4 years ago

@EnormousAdversity different types of observations are defined here. The environments use different types as a default configuration, defined in their default_config(self) -> dict methods.

You can override them depending on your needs, by using:

```python
env.configure({
    "observation": {
        "type": <your desired observation type>,
        <arg>: <value>  # observation parameters
    }
})
```
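For instance (a hedged sketch with the old gym API; "Kinematics" and "vehicles_count" follow the KinematicObservation parameters discussed earlier in this thread):

```python
import gym
import highway_env  # noqa: F401  (registers the highway environments)

env = gym.make("roundabout-v0")
env.configure({
    "observation": {
        "type": "Kinematics",
        "vehicles_count": 5,  # number of observed vehicles (rows of the matrix)
    }
})
obs = env.reset()
print(obs.shape)  # expected to be (5, 5) with the default 5 features
```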