alashworth opened this issue 5 years ago
Comment by bob-carpenter Friday Mar 14, 2014 at 16:36 GMT
If I understood correctly, Ben commented on the mailing list that score()
is the gradient of the log likelihood.
I have no idea how we could implement that in Stan as it stands now. Once we add functions, it should be much more practical to do this, at least for double inputs in the generated quantities block. But if we need to roll it into the model itself, it'd require higher-order autodiff, as Marcus pointed out on the list, and would be hugely complicated to implement.
From outside of Stan's modeling language, it's easy to get the gradient w.r.t. the transformed (unconstrained) parameters, but those are derivatives of the whole log probability, not just the likelihood. So it may require writing a second model just for the log likelihood.
Comment by bgoodri Friday Mar 14, 2014 at 16:41 GMT
Well, Andrew was trying to use this in a ML context, so if we can only get it working for that first, it would be okay. The generated quantities block would have to do the log-likelihood for each observation (like WAIC) anyway, because you need to sum the outer product of the score vectors for each observation.
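The "sum the outer product of the score vectors" step can be sketched generically outside Stan. This is a minimal illustration of the outer-product-of-gradients (OPG) covariance estimate, not Stan code; the function name `opg_vcov` is chosen here for illustration:

```python
import numpy as np

def opg_vcov(scores):
    """OPG estimate of the variance-covariance matrix of the MLE.

    scores: (n_obs, n_params) array whose i-th row is the gradient of
    the i-th observation's log-likelihood, evaluated at the estimate.
    Returns the inverse of the summed outer products sum_i s_i s_i^T.
    """
    info = scores.T @ scores        # sum of outer products, (n_params, n_params)
    return np.linalg.inv(info)
```

This is why the per-observation scores are needed: the sum of outer products is not recoverable from the gradient of the total log-likelihood, which is just the sum of the rows.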
Comment by andrewgelman Friday Mar 14, 2014 at 17:27 GMT
Hi, thanks for thinking of me! For WAIC and cross-validation we will want to evaluate individual terms of the log likelihood, but for MLE etc. we just need to evaluate the entire log joint density; I don't think we'll need to be doing it term by term. A
Comment by bgoodri Friday Mar 14, 2014 at 17:30 GMT
You do need the score on an observation-by-observation basis to estimate the Hessian this way. Maybe there is some other way to expose the Hessian to the generated quantities block.
Comment by ariddell Saturday Mar 05, 2016 at 19:57 GMT
This would be useful for PyStan. I'll reopen the issue. If there's a better place for this, I'd be happy to migrate it.
Comment by syclik Wednesday Nov 30, 2016 at 15:23 GMT
@ariddell, I think this is a language issue.
@bgoodri, are you envisioning something like this only happening in generated quantities?
Comment by bgoodri Wednesday Nov 30, 2016 at 16:01 GMT
No; the likelihood can be a function of partial derivatives.
Comment by syclik Wednesday Nov 30, 2016 at 16:11 GMT
Hmm... that makes it much more of a pain, but I don't think it's impossible. We'd need to add the score() function to the math library that does the nested autodiff, right? Then expose that through the language?
Comment by bgoodri Wednesday Nov 30, 2016 at 16:12 GMT
Something like that.
Issue by bgoodri Friday Mar 14, 2014 at 16:12 GMT. Originally opened as https://github.com/stan-dev/stan/issues/605
One thing that we may be able to do within weeks rather than months, and that would allow Andrew to do more of this in Stan proper, is to expose a score function (which might be somewhat useful for other purposes too), a.k.a. the gradient of the log-likelihood:
http://en.wikipedia.org/wiki/Score_%28statistics%29
Thus, the estimated variance-covariance matrix of the estimates could be obtained via the Outer Product of Gradients (OPG) method; see equation (14-18) in
http://pages.stern.nyu.edu/~wgreene/NUI2011/Greene-Chapter-14.pdf
That way Andrew could write a generated quantities block like this:
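(The code block that followed appears to have been lost in the migration. A minimal sketch of the idea, assuming a simple iid normal model `y[n] ~ normal(mu, sigma)` and hand-deriving the score since no score() function exists in the language:)

```stan
generated quantities {
  // Per-observation score w.r.t. mu, derived by hand for the normal model:
  // d/dmu normal_lpdf(y[n] | mu, sigma) = (y[n] - mu) / sigma^2.
  // A built-in score() function would compute such rows by autodiff instead.
  vector[N] score_mu;
  for (n in 1:N)
    score_mu[n] = (y[n] - mu) / square(sigma);
}
```

Summing the outer products of these per-observation score vectors (here one-dimensional, so just squaring and summing) and inverting gives the OPG covariance estimate from equation (14-18).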