Hey @cbfinn, the 'harley_refactor' PR is pretty far behind at this point, and I have some other new things that I wanted to add. This PR shouldn't really change anything functionally, and includes several things from that other pull request (which I'll close shortly):
1) Fixes issue with policy_sample_mode (no more keep_samples conflicting with it).
2) Restrict linesearch to the log space only, using bisection with a min/max_eta (we already have an explicit min_eta and implicit max_eta, and the bisection works pretty quickly without any of the stability issues from the old version).
3) Cleans up MDGPS quite a bit, making it faster and the code more legible.
4) Fixes the error in mjc_mdgps_example.
There are some larger architectural changes too, that clean up a lot of code:
1) The policy_prior/policy_prior_gmm classes do the fitting now, kind of like how dynamics_lr_prior does the fitting (this is a bit clunky still, but I like it more than having the full _update_policy_fit method)
2) The dynamics prior uses gauss_fit_joint_prior instead of doing the computation itself.
The only real functional change is that MDGPS no longer uses the classic/global step sizes. Now I'm just averaging predicted/actual_impr across the conditions and using a single step size for everything. This makes more sense for clustering experiments, and it's nice to not have to worry about another hyperparameter, which didn't seem to make much of a difference anyways (I still have a step_rule hyperparam, but it should normally be laplace, I'm just using the mc estimates for some debugging with reacher).
I haven't included any of the clustering stuff in here, nor single GMMs (not sure if everyone can use those anyways), and I'm going to base future clustering stuff on this branch.
Hey @cbfinn, the 'harley_refactor' PR is pretty far behind at this point, and I have some other new things that I wanted to add. This PR shouldn't really change anything functionally, and includes several things from that other pull request (which I'll close shortly):
1) Fixes issue with
policy_sample_mode
(no morekeep_samples
conflicting with it). 2) Restrict linesearch to the log space only, using bisection with a min/max_eta (we already have an explicit min_eta and implicit max_eta, and the bisection works pretty quickly without any of the stability issues from the old version). 3) Cleans up MDGPS quite a bit, making it faster and the code more legible. 4) Fixes the error inmjc_mdgps_example
.There are some larger architectural changes too, that clean up a lot of code: 1) The policy_prior/policy_prior_gmm classes do the fitting now, kind of like how dynamics_lr_prior does the fitting (this is a bit clunky still, but I like it more than having the full _update_policy_fit method) 2) The dynamics prior uses
gauss_fit_joint_prior
instead of doing the computation itself.The only real functional change is that MDGPS no longer uses the
classic/global
step sizes. Now I'm just averaging predicted/actual_impr across the conditions and using a single step size for everything. This makes more sense for clustering experiments, and it's nice to not have to worry about another hyperparameter, which didn't seem to make much of a difference anyways (I still have astep_rule
hyperparam, but it should normally belaplace
, I'm just using themc
estimates for some debugging with reacher).I haven't included any of the clustering stuff in here, nor single GMMs (not sure if everyone can use those anyways), and I'm going to base future clustering stuff on this branch.