MDGPS - Githubissues

wmontgomery4 commented 8 years ago

Sorry for the delay, here's a preliminary pull request for the mdgps code. It isn't in a state to be merged just yet, but I wanted to let you take a look at the main outline. The standing issues right now are: 1) Some of the changes are pretty hacky (particularly to get on-policy sampling working), and should be made more sane before merging. This also includes adding a separate traj_opt file for MDGPS, and the way that I use a 'pol_info' object as a 'traj_distr' object sometimes. 2) I didn't change any of the tf policy_opt code, but did make changes to the caffe policy_opt code, so things probably aren't compatible between the two right now 3) On-policy sampling isn't working that well with just 4 conditions on the peg insertion task (we used 9 in the paper), so I need to play around with that and see if it's just a hyperparameter issue. 4) There are also some minor tweaks to the gui and some fixed TODOs from non-mdgps stuff, which I should probably just open a new branch for

I'm working on the issues above, but feel free to take a look over things in the meantime.

cbfinn commented 8 years ago

Thanks Harley! I'll take a look at it now, and go ahead and repost/comment when you make changes that are ready to be looked at.

wmontgomery4 commented 8 years ago

Okay, this is in a state to be looked at again. Couple notes:

Works with tf as well, but it's slower and doesn't seem to work quite as well (although this could easily be a fluke, or because the hyperparams were tweaked with caffe). Tensorflow also takes up a lot of memory, so I can't run all 4 variations simultaneously, so I'm sticking with caffe for now, but it works with tensorflow and so that's not an issue.
I had to remove the default TrajOptLqrPython from the algorithm config.py to avoid circular imports (I need to import both AlgorithmBADMM and AlgorithmMDGPS in TrajOpt for some type checks since the traj opt varies slightly between them)
Still haven't refactored things with BADMM, but I agree that would be a good next step.

cbfinn commented 8 years ago

Generally looks good! Left some fairly minor comments.

I'm think it would be okay to merge before refactoring with BADMM, and leave the refactor for the next PR, since this one is already rather large.

wmontgomery4 commented 8 years ago

Okay, I've addressed the comments and switched the hyperparams file over to using tensorflow. I need to tweak the hyperparams a bit, but everything should be essentially working.

One note, I changed the kl_div_i/f outputs to show the average kl_div rather than the sum, since this is what the step column corresponds to already. If you'd prefer the sum let me know and I'll change it back.

Also, I had a bizarre bug where tensorflow would seg fault unless I continued to import policy_opt_caffe.py (even though it was unused). I fixed it by changing the order of imports in policy_opt_tf.py, but I'm still not sure what the issue was (it might just have been a local issue).

cbfinn commented 8 years ago

Ok, mean kl is fine. That's a rather weird tf bug.

I'll add you as a collaborator so you can merge whenever you're ready.

cbfinn / gps

MDGPS #29