Closed eliork closed 3 years ago
Hello,
this will result in an error in this line, as NumPy can't reshape an array of size greater than 1*2048 to shape (1, 2048).
this is only for logging, so it's not a real issue.
However, the GAE computation is probably wrong. This code was an attempt at an episodic PPO (which can be applied on a robot), but it was only partially tested... Anyway, unless you want to learn end-to-end, I would recommend using AE + SAC, and if you want to use PPO, the Stable-Baselines3 version should be easier to tweak ;)
Hey, thanks for your answer.
After reading the code again I understood it's not a real issue, though it does create a runtime error from time to time and kills the training process, so to avoid that I slice the true_reward and masks lists to the size of self.n_steps, like this:
if writer is not None:
    self.episode_reward = total_episode_reward_logger(self.episode_reward,
                                                      true_reward[:self.n_steps].reshape((self.n_envs, self.n_steps)),
                                                      masks[:self.n_steps].reshape((self.n_envs, self.n_steps)),
                                                      writer, n_timesteps)
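To see why the slicing is needed, here is a minimal NumPy sketch (the buffer sizes are just for illustration): reshape fails once the rollout buffer has grown past n_envs * n_steps entries, and truncating to n_steps restores the expected shape.

```python
import numpy as np

n_envs, n_steps = 1, 2048

# Simulate a rollout buffer that overshot n_steps by a few entries
# (e.g. an episode finished a few steps past the limit).
true_reward = np.zeros(n_steps + 5)

# Reshaping the oversized buffer fails:
try:
    true_reward.reshape((n_envs, n_steps))
except ValueError as e:
    print("reshape failed:", e)

# Truncating first makes the reshape valid again:
reshaped = true_reward[:n_steps].reshape((n_envs, n_steps))
print(reshaped.shape)  # (1, 2048)
```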
I am trying to relax your approach from the continuous action space to a discrete action space, hence I am trying to use the PPO algorithm. I trained a VAE with my own set of images, with a z_size of 512. I don't understand why the GAE computation is wrong; it looks identical to the one here. Could you please elaborate a little bit? Thank you
After reading the code again I understood it's not a real issue, though it does create a runtime error from time to time and kills the training process, so to avoid that I slice the true_reward and masks lists to the size of self.n_steps like this
Yes, that's one solution, or you can comment that line out too.
I don't understand why the GAE computation is wrong, it looks identical to the one here
for step in reversed(range(self.n_steps)):
should be replaced with the true number of steps (len(mb_rewards), for instance), and the rest of the code may need to be adapted.
A quick solution would be to truncate mb_obs, mb_rewards, ... to n_steps.
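A minimal sketch of what "the true number of steps" means for the GAE loop, in the style of the PPO2 runner. The arrays, gamma/lam values, and the last-step handling are placeholders for illustration, not the repository's actual code; the key point is that the loop bound is len(mb_rewards), not self.n_steps.

```python
import numpy as np

# Placeholder rollout data: the buffer overshot the nominal n_steps,
# so the loop must iterate over the actual collected length.
gamma, lam = 0.99, 0.95
mb_rewards = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 1.0])
mb_values = np.zeros_like(mb_rewards)
mb_dones = np.zeros_like(mb_rewards)
last_value = 0.0  # value estimate for the state after the last step

true_n_steps = len(mb_rewards)  # the actual rollout length
mb_advs = np.zeros_like(mb_rewards)
last_gae_lam = 0.0
for step in reversed(range(true_n_steps)):
    if step == true_n_steps - 1:
        next_non_terminal = 1.0 - mb_dones[-1]
        next_value = last_value
    else:
        next_non_terminal = 1.0 - mb_dones[step + 1]
        next_value = mb_values[step + 1]
    # Standard GAE recursion: delta, then the discounted running sum.
    delta = mb_rewards[step] + gamma * next_value * next_non_terminal - mb_values[step]
    last_gae_lam = delta + gamma * lam * next_non_terminal * last_gae_lam
    mb_advs[step] = last_gae_lam

print(mb_advs.shape)  # (6,) — one advantage per collected step
```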
I am trying to relax your approach from the continuous action space to the discrete action space, hence I am trying to use PPO algorithm.
You should maybe take a look at DQN / QR-DQN then (cf. Stable-Baselines3 / SB3 contrib).
Thank you very much for your answers. Excuse me for being a little bit new here :-) but what does that mean? What is cf? Thanks
Sorry, I meant: take a look at https://github.com/DLR-RM/stable-baselines3 (for DQN) and https://github.com/Stable-Baselines-Team/stable-baselines3-contrib (for QR-DQN). I should have written cf. (which stands for "confer", a Latin word used to refer to something for more information ;))
Thank you very much for your help! Keep up the good work!
https://github.com/araffin/learning-to-drive-in-5-minutes/blob/ccb27e66d593d6036fc1076dcec80f74a3f5e239/algos/custom_ppo2.py#L165
Hi, I think the logic of updating mb_rewards is wrong here. For example, say we run 1 environment for 2048 steps. Since mb_rewards.append(rewards) appends all the rewards, we might end up with a list of size > 2048. There is then a check whether the length of mb_rewards is greater than 2048, and the loop breaks, but we are still left with a mb_rewards whose length is greater than 2048. This will result in an error in this line, as NumPy can't reshape an array of size greater than 1*2048 to shape (1, 2048).
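A stripped-down sketch of the collection pattern described above (hypothetical names and episode length, single environment): because each episode's rewards are appended in one go and the length check only runs afterwards, the buffer can overshoot n_steps before the loop breaks.

```python
n_steps = 2048

def run_episode(length):
    # Hypothetical stand-in for one episode's reward list.
    return [0.0] * length

mb_rewards = []
while True:
    # Each episode appends all of its rewards at once...
    mb_rewards.extend(run_episode(500))
    # ...and the length check only happens afterwards, so the buffer
    # can already be past n_steps when the loop finally breaks.
    if len(mb_rewards) >= n_steps:
        break

print(len(mb_rewards))  # 2500 > 2048: reshaping to (1, 2048) would fail
```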