cbfinn / gps

Guided Policy Search
http://rll.berkeley.edu/gps/
Other
594 stars 239 forks source link

Slight refactor/bugfix #43

Closed wmontgomery4 closed 7 years ago

wmontgomery4 commented 7 years ago

Since it seems like the clustering stuff isn't there yet, I removed all of that stuff, but I think this refactor/update is still worthwhile. We can add the cluster stuff to this later once it's all been worked out.

This does not switch to a single GMM overall, but continues to use a GMM per condition. I think this is actually faster/more efficient, and wanted to put off any changes to this structure for a later refactor as it will require quite a few changes.

1) It fixes the issue with the policy_prior_gmm where policy_sample_mode had no effect because keep_samples was set true. Before we were using max_samples = 20, which meant that we were effectively holding onto 4 iterations of samples (when N=5), and running in pure replace mode actually seems to improve performance as well as speed. 2) It restricts the linesearch to only using the non-log space, which seems to prevent some errors. 3) It cleans up MDGPS quite a bit, making it faster and the code more legible. 4) Fixes the error in mjc_mdgps_example.

@cbfinn feel free to take a look when you get a chance. There are a few more things I want to address though, but they could be put off for a later update: 1) I could make a base AlgorithmGPS class which handles several of the methods that are copied between BADMM/MDGPS. However, MDGPS might change when we add the clustering stuff, so I'm not sure if it's worth the code savings right now. 2) There are some TODOs/questions about the agent_mjc stuff, which I want to look into further (and I also want to switch back to only one viewer if possible). 3) The linesearch seems a bit slower than necessary. I think this could be improved by running golden section search in log-space (golden section search just needs a unimodal function, not necessarily concave).

cbfinn commented 7 years ago

Thanks Harley! I'll take a look at this on Sunday. I agree about changing back to a single mujoco viewer. The line search also seems worth speeding up if it's fairly easy to fix. Making an AlgorithmGPS seems lower priority to me at the moment, but feel free to code it if you think that it would be beneficial and make future research code development faster/easier.