araffin / learning-to-drive-in-5-minutes

Implementation of a reinforcement learning approach to make a car learn to drive smoothly in minutes
https://towardsdatascience.com/learning-to-drive-smoothly-in-minutes-450a7cdb35f4
MIT License
287 stars 85 forks

self.obs doesn't change #34

Closed eliork closed 3 years ago

eliork commented 3 years ago

https://github.com/araffin/learning-to-drive-in-5-minutes/blob/ccb27e66d593d6036fc1076dcec80f74a3f5e239/algos/custom_ppo2.py#L157

Hi, I am trying to recreate your approach in a custom environment I have built. I am using your custom version of PPO2, but I noticed that after this line, self.obs doesn't change. Is that expected behaviour?

Thanks, appreciate your work

araffin commented 3 years ago

after this line, self.obs doesn't change. Is that expected behaviour?

Why should it change? With a VecEnv, the reset is automatic (cf. the docs).
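
The auto-reset looks roughly like this (a minimal sketch, not this repo's training loop; the env id is only an illustration):

import gym
from stable_baselines.common.vec_env import DummyVecEnv

# Wrap a single env in a VecEnv (the env id here is just an example)
env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
obs = env.reset()  # the only manual reset you ever do
for _ in range(1000):
    actions = [env.action_space.sample()]  # one action per sub-env
    obs, rewards, dones, infos = env.step(actions)
    if dones[0]:
        # No env.reset() needed: obs already holds the first observation
        # of the new episode (recent stable-baselines versions keep the
        # terminal one in infos[0]["terminal_observation"])
        pass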

eliork commented 3 years ago

Shouldn't self.obs hold the current observation? Isn't that the expected output of self.env.step(clipped_actions)?

araffin commented 3 years ago

Shouldn't self.obs hold the current observation?

self.obs holds the new observation after stepping in the env.
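
If it helps, the line you link to presumably follows the usual PPO2 runner pattern, self.obs[:], rewards, self.dones, infos = self.env.step(clipped_actions), where the slice assignment overwrites the existing buffer in place. A self-contained illustration of that mechanism (array values are made up):

import numpy as np

# One pre-allocated buffer, overwritten in place each step, as in the runner
obs = np.zeros(3, dtype=np.float32)
alias = obs                # a second reference to the same array
obs[:] = [1.0, 2.0, 3.0]   # slice assignment: same object, new contents
print(alias)               # [1. 2. 3.]

So the array object never changes identity; only its contents do, once per step.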

eliork commented 3 years ago

I found the problem: I had configured the observation space wrong. I defined it as

self.observation_space = spaces.Box(low=np.finfo(np.float32).min,
                                    high=np.finfo(np.float32).max,
                                    shape=(1, self.z_size + self.n_commands * self.n_command_history),
                                    dtype=np.uint8)

instead of dtype=np.float32. Thank you for your help, closing this issue.
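
For anyone hitting the same symptom: the VecEnv typically allocates its observation buffer with the dtype of the observation space, so float observations written into a uint8 buffer are silently truncated, which is why self.obs looked frozen. A minimal sketch of that failure mode (shape and values are illustrative):

import numpy as np

# Buffer allocated with the (wrong) dtype declared by the observation space
buf = np.zeros((1, 4), dtype=np.uint8)

# Float observations in [0, 1) all truncate to 0 when stored in the buffer
buf[:] = np.array([[0.12, 0.53, 0.07, 0.9]], dtype=np.float32)
print(buf)  # [[0 0 0 0]] -- looks like the observation never changes

With dtype=np.float32 in the spaces.Box definition, the buffer keeps the real values.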