[Closed] etienne87 closed this issue 7 years ago.
Undoing the `Environment.preprocess_karpathy_pong(frame)` call is simple enough, but errors like

```
GA3C/ga3c/Environment.py", line 77, in get_num_actions
    return len(self.game.env._action_set)
AttributeError: 'TimeLimit' object has no attribute '_action_set'
```

seem to indicate that @etienne87 has made additional changes that require a modified openai-universe.
Thanks @nczempin, will take care of this asap! I'm gonna pull the current version & resolve the conflicts. I need to replace `return len(self.game.env._action_set)` with `return self.game.env.action_space.n`.
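As a hedged sketch of why that one-line replacement works: newer gym versions wrap environments in a `TimeLimit` wrapper that holds the real env internally and does not expose Atari-private attributes like `_action_set`, while the public `action_space` is forwarded. The classes below are stand-ins that mirror the shape of gym's API, not the real library:

```python
class Discrete:
    """Stand-in for gym.spaces.Discrete."""
    def __init__(self, n):
        self.n = n

class AtariEnv:
    """Stand-in for the inner Atari env."""
    def __init__(self):
        self._action_set = [0, 1, 2, 3, 4, 5]   # private, Atari-specific
        self.action_space = Discrete(6)          # public, generic gym API

class TimeLimit:
    """Stand-in wrapper: forwards the public API, not private attributes."""
    def __init__(self, env):
        self.env = env
        self.action_space = env.action_space

env = TimeLimit(AtariEnv())

# len(env._action_set)  would raise:
#   AttributeError: 'TimeLimit' object has no attribute '_action_set'
num_actions = env.action_space.n  # the portable fix from this thread -> 6
```

Querying `action_space.n` goes through the public interface, so it survives any stack of wrappers.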
well, the conflicts are not an issue for me for now, because I can just work with the branch.
Are you saying that the karpathy_pong preprocessing and this one change on line 77 are the only two things I need to fix, and that I basically gave up too early?
okay, I made those two changes (rolling back the karpathy_pong thing and making the suggested change on line 77) and it is working on the branch.
@etienne87, in networkVP.py the `log` function is missing the newly added args:

```
def log(self, x, y_r, a, adv):
    feed_dict.update({self.y_r: y_r, self.action_index: a, self.advantages: adv})
```
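A hedged sketch of the fix: `log` must accept the newly added advantage argument `adv` and feed it alongside `y_r` and `a`. The stub class below only mimics the placeholder attributes of the real `NetworkVP` (which lives in the GA3C codebase and runs a TensorFlow summary op), so the shape of the `feed_dict` change is visible in isolation:

```python
class NetworkVPStub:
    def __init__(self):
        # stand-ins for the TF placeholders held by the real NetworkVP
        self.y_r = "y_r_ph"
        self.action_index = "action_ph"
        self.advantages = "adv_ph"

    def log(self, x, y_r, a, adv):
        feed_dict = {}  # the real code seeds this with a base feed dict
        feed_dict.update({self.y_r: y_r,
                          self.action_index: a,
                          self.advantages: adv})
        return feed_dict  # the real code would run the summary op here

net = NetworkVPStub()
fd = net.log(x=None, y_r=[1.0], a=[2], adv=[0.5])
sorted(fd)  # -> ['action_ph', 'adv_ph', 'y_r_ph']
```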
why delete? Is the code wrong?
@tangbohu the code was not giving much better results... sorry about that; if you want any guidance I should be able to send you some old code.
The Gae branch is for "Generalized Advantage Estimation":

- The advantage is either R - V (normal) or GAE (using temporal differences in the advantage).
- It fixes `logits_p` having a relu func.
- It adds a `Config.Zoo` personal folder for other training configs, like "CartPole-v0" with no convnet for fast regression testing (under 1 minute), or "KarpathyPong", which does smart preprocessing for fast training (under 20 minutes).
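A minimal sketch of the GAE advantage mentioned above, under the standard formulation: the advantage accumulates temporal-difference residuals delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), each discounted by gamma * lambda. Function and variable names here are illustrative, not taken from the branch:

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one rollout, scanning backwards in time."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # bootstrap from last_value at the end of the rollout
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

adv = gae_advantages([1.0, 0.0, 1.0], [0.5, 0.5, 0.5], last_value=0.0)
```

With `lam = 1` this telescopes back to the plain discounted-return-minus-value advantage (the "R - V (normal)" case above); smaller `lam` trades variance for bias.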