NVlabs / GA3C

Hybrid CPU/GPU implementation of the A3C algorithm for deep reinforcement learning.
BSD 3-Clause "New" or "Revised" License
652 stars, 195 forks

Gae #18

Closed etienne87 closed 7 years ago

etienne87 commented 7 years ago

Gae branch for "Generalized Advantage Estimation". The advantage is either R - V (normal) or GAE (using the temporal difference in the advantage). The branch also fixes logits_p having a relu function. Finally, it adds a "Config.Zoo", a personal folder for other training configs, such as "CartPole-v0" with no convnet for fast regression testing (under 1 minute), or "KarpathyPong", which applies a smart preprocessing step for fast training (under 20 minutes).
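
To make the two advantage options concrete, here is a minimal NumPy sketch. It is not the code from the branch: the function names, the `bootstrap_value` argument (the value estimate for the state after the last step), and the default `gamma` and `lam` values are all assumptions made for illustration.

```python
import numpy as np

def normal_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """'Normal' advantage: discounted return R minus value estimate V."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    returns = np.zeros_like(rewards)
    running = bootstrap_value
    # Accumulate the discounted return backwards through the rollout.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns - values

def gae_advantages(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """GAE: exponentially weighted sum of one-step temporal-difference errors."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    # V(s_{t+1}) for every step, using the bootstrap value after the last step.
    next_values = np.append(values[1:], bootstrap_value)
    deltas = rewards + gamma * next_values - values
    advantages = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=1.0` the GAE sum telescopes back to R - V, and with `lam=0.0` it reduces to the one-step temporal-difference error, so the two options in the branch can be seen as endpoints of the same estimator.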

nczempin commented 7 years ago

Undoing the "Environment.preprocess_karpathy_pong(frame)" call is simple enough, but errors like

GA3C/ga3c/Environment.py", line 77, in get_num_actions
    return len(self.game.env._action_set)
AttributeError: 'TimeLimit' object has no attribute '_action_set'

seem to indicate that @etienne87 has made additional changes that require a modified openai-universe.

etienne87 commented 7 years ago

thanks @nczempin, will take care of this asap! I'm going to pull the current version & resolve conflicts. I need to replace return len(self.game.env._action_set) with return self.game.env.action_space.n
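
A small sketch of why that replacement helps, using hypothetical stand-in classes rather than the real gym ones: a wrapper such as TimeLimit re-exports the public action_space but not the private _action_set of the environment it wraps. The classes `Discrete`, `RawEnv`, and `Wrapper` below are illustrative assumptions, not library code.

```python
class Discrete:
    """Hypothetical stand-in for a discrete action space exposing `n`."""
    def __init__(self, n):
        self.n = n

class RawEnv:
    """Hypothetical inner environment with a private action list."""
    def __init__(self):
        self._action_set = [0, 2, 3]
        self.action_space = Discrete(len(self._action_set))

class Wrapper:
    """Hypothetical wrapper: forwards action_space, hides private attributes."""
    def __init__(self, env):
        self.env = env
        self.action_space = env.action_space

def get_num_actions(env):
    # Rely on the public action_space, which survives wrapping,
    # instead of reaching into the private _action_set.
    return env.action_space.n
```

Calling `get_num_actions(Wrapper(RawEnv()))` works in this sketch, whereas `len(Wrapper(RawEnv())._action_set)` would raise an AttributeError like the one reported above.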

nczempin commented 7 years ago

well, the conflicts are not an issue for me for now, because I can just work with the branch.

Are you saying that the karpathy_pong thing and this one line 77 are the only two things I need to change, so I basically gave up too early?

nczempin commented 7 years ago

okay, I made those two changes (rolling back the karpathy_pong thing and making the suggested change on line 77) and it is working on the branch.

ieow commented 7 years ago

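@etienne87, the log function in networkVP.py is missing the newly added args:

def log(self, x, y_r, a, adv):
    # feed_dict is assumed to be created earlier in the method (elided here);
    # the new advantages placeholder must be fed along with the other inputs.
    feed_dict.update({self.y_r: y_r, self.action_index: a, self.advantages: adv})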

tangbohu commented 6 years ago

why delete? Is the code wrong?

etienne87 commented 6 years ago

@tangbohu the code was not giving much better results...sorry about that; if you want any guidance i should be able to send you some old code.