cbfinn / gps

Guided Policy Search
http://rll.berkeley.edu/gps/
Other
594 stars 239 forks source link

MDGPS #29

Closed wmontgomery4 closed 8 years ago

wmontgomery4 commented 8 years ago

Sorry for the delay, here's a preliminary pull request for the mdgps code. It isn't in a state to be merged just yet, but I wanted to let you take a look at the main outline. The standing issues right now are: 1) Some of the changes are pretty hacky (particularly to get on-policy sampling working), and should be made more sane before merging. This also includes adding a separate traj_opt file for MDGPS, and the way that I use a 'pol_info' object as a 'traj_distr' object sometimes. 2) I didn't change any of the tf policy_opt code, but did make changes to the caffe policy_opt code, so things probably aren't compatible between the two right now 3) On-policy sampling isn't working that well with just 4 conditions on the peg insertion task (we used 9 in the paper), so I need to play around with that and see if it's just a hyperparameter issue. 4) There are also some minor tweaks to the gui and some fixed TODOs from non-mdgps stuff, which I should probably just open a new branch for

I'm working on the issues above, but feel free to take a look over things in the meantime.

cbfinn commented 8 years ago

Thanks Harley! I'll take a look at it now, and go ahead and repost/comment when you make changes that are ready to be looked at.

wmontgomery4 commented 8 years ago

Okay, this is in a state to be looked at again. Couple notes:

cbfinn commented 8 years ago

Generally looks good! Left some fairly minor comments.

I'm think it would be okay to merge before refactoring with BADMM, and leave the refactor for the next PR, since this one is already rather large.

wmontgomery4 commented 8 years ago

Okay, I've addressed the comments and switched the hyperparams file over to using tensorflow. I need to tweak the hyperparams a bit, but everything should be essentially working.

One note, I changed the kl_div_i/f outputs to show the average kl_div rather than the sum, since this is what the step column corresponds to already. If you'd prefer the sum let me know and I'll change it back.

Also, I had a bizarre bug where tensorflow would seg fault unless I continued to import policy_opt_caffe.py (even though it was unused). I fixed it by changing the order of imports in policy_opt_tf.py, but I'm still not sure what the issue was (it might just have been a local issue).

cbfinn commented 8 years ago

Ok, mean kl is fine. That's a rather weird tf bug.

I'll add you as a collaborator so you can merge whenever you're ready.