cbfinn / gps

Guided Policy Search
http://rll.berkeley.edu/gps/
Other
597 stars 239 forks source link

Wrong normalization in GMM-sigma initialization? #20

Open jviereck opened 8 years ago

jviereck commented 8 years ago

Hi there,

reading the code at https://github.com/cbfinn/gps/blob/master/python/gps/utility/gmm.py#L167

            # Initialize.
            for i in range(K):
                cluster_idx = (cidx == i)[0]
                mu = np.mean(data[cluster_idx, :], axis=0)
                diff = (data[cluster_idx, :] - mu).T
                sigma = (1.0 / K) * (diff.dot(diff.T))  # <<<<<<<<<< THIS LINE
                self.mu[i, :] = mu
                self.sigma[i, :, :] = sigma + np.eye(Do) * 2e-6

and the documentation about GMM at 1 I am wondering if the sigma value should be normalized by the count of data points in the cluster i instead of the total number of clusters k. That is:

sigma = (1.0 / cluster_idx.shape[0]) * (diff.dot(diff.T))  # Note the 'cluster_idx.shape[0]'

What do you think?

Best regards,

cbfinn commented 8 years ago

Hi Julian- I believe you're correct. Can you make a PR with the change?

jviereck commented 8 years ago

Hi Chelsea,

thanks for the quick reply! Sure, more than happy to make a PR.

Is there an automated way to run a unit test suite to avoid the PR adding new bugs? Or is there any other procedure you would like me to follow in preparation of the PR?

Best,

cbfinn commented 8 years ago

We unfortunately haven't had time to add a suite of unit tests. [Though there are a few tests for the gui and tensorflow https://github.com/cbfinn/gps/tree/master/python/tests.]

Assuming you'll only be making changes to the mujoco agent and interface, the best thing to do is run each of the mjc examples, and make sure they run and achieve a similar cost at the end of training compared to before the changes. If you don't have access to the older version of mujoco, then I can send screenshots of a training run of each of the examples with the current version.

Moving forward, we can put a log of the example runs in the docs/website (or ideally, write a test to make sure the cost of the policy is in a reasonable range at the end of the example, and that it doesn't error out).

Chelsea

On Wed, May 4, 2016 at 1:01 AM, Julian Viereck notifications@github.com wrote:

Hi Chelsea,

thanks for the quick reply! Sure, more than happy to make a PR.

Is there an automated way to run a unit test suite to avoid the PR adding new bugs? Or is there any other procedure you would like me to follow in preparation of the PR?

Best,

  • Julian

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/cbfinn/gps/issues/20#issuecomment-216772119

jviereck commented 8 years ago

Hi Chelsea,

thanks for your reply and sorry for getting back to it after a so long delay.

the best thing to do is run each of the mjc examples

I don't have access to mjc and short on time to run all the examples and compare them against screenshots manually.

Therefore, are you okay with a PR for the above change without running any prior tests or how do you think should we continue here?

Best, Julian

burak-cetinkaya commented 7 years ago

Hello,

I believe cluster_idx.shape[0] is not what we want here, because cluster_idx gives us a vector with 1/N (on average) of values True and the others False with a length of N ( Where N is number of total data points), so shape[0] gives us number of total data points for all clusters. I think we should normalize with data[cluster_idx,:].shape[0] which gives us the number of data point of length (dx+du+dx) that is used for i'th cluster.

Best Regards,

Burak