
more general gamma function #70

Open alashworth opened 5 years ago

alashworth commented 5 years ago

Issue by steven-murray Friday Jul 10, 2015 at 03:07 GMT Originally opened as https://github.com/stan-dev/stan/issues/1536


I want to use the incomplete gamma function with a negative shape parameter. This is currently not supported by the relevant Boost library function.

To get around this, I have been trying to artificially increase the slope of my data (a lower-truncated generalised gamma distribution) by weighting each data point by its value (which effectively increases the slope by 1). This means that the Gamma call has a positive shape parameter and is therefore allowed, but it introduces other problems (cf. the mailing-list discussion about weighted log-probabilities).

It would be nice to have a more general incomplete gamma function (such a function is implemented, for example, in GSL and in mpmath for Python).

alashworth commented 5 years ago

Comment by syclik Friday Jul 10, 2015 at 04:01 GMT


@steven-murray, can you provide a more concrete description of the incomplete gamma function you're looking for? And is this for use in the Stan language, or just in C++?

alashworth commented 5 years ago

Comment by steven-murray Friday Jul 10, 2015 at 04:16 GMT


The generalised gamma distribution is defined here. As you can see, all three of its parameters are restricted to be positive. In particular, if $d$ here is negative, the distribution cannot be normalised towards low $x$ (the integral diverges at the origin).

Usually, the normalising factor, $\Gamma(d/p)$, is implemented only for positive values of $d/p$ as well (for example, the relevant function included in Stan -- I'm guessing via the Boost library -- does this). However, the function can still be defined for negative values.

In my case, my distribution has a negative value of $d$, but this is okay since it is truncated at a lower boundary, which ensures that it forms a proper statistical pdf. In this case, the normalising factor becomes an incomplete gamma: $\Gamma(d/p,(x_{\rm min}/a)^p)$. This is perfectly well defined for a negative $d$ parameter, but several libraries do not allow it (for pretty good practical reasons). Without this capability in the Gamma function, I cannot implement my model simply. It would be good to add at least this capability, if not the actual (truncated) generalised gamma distribution, to Stan.
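Spelling this out (a sketch, assuming the usual scale-$a$ / shape-$d$ / power-$p$ parameterisation of the linked definition), the lower-truncated density is

$$f(x \mid a, d, p, x_{\rm min}) = \frac{(p/a^{d})\, x^{d-1}\, e^{-(x/a)^{p}}}{\Gamma\!\left(d/p,\; (x_{\rm min}/a)^{p}\right)}, \qquad x \ge x_{\rm min} > 0,$$

which is a proper density even for negative $d$; the only awkward ingredient is the upper incomplete gamma $\Gamma(d/p,\cdot)$ with a negative first argument.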

I would like to use it in Stan -- I'm not sure exactly what the relationship between the Stan and C++ code is.

alashworth commented 5 years ago

Comment by steven-murray Friday Jul 10, 2015 at 04:20 GMT


Sorry about the math... apparently there is no support for LaTeX in GFM.

alashworth commented 5 years ago

Comment by bob-carpenter Friday Jul 10, 2015 at 17:20 GMT


@steven-murray We apparently have an incomplete beta hiding in our math lib, and the gradient for the incomplete gamma, so maybe it wouldn't be hard to do:

lib/stan_math_2.7.0/stan/math/prim/scal/fun/grad_reg_inc_gamma.hpp

The relation of C++ to Stan is that our functions are all implemented in C++ under the hood. And we have more people just using our C++ library.

@syclik We now have the issue of managing a feature request like this across multiple repos. It needs to go into the function signatures in repo stan-dev/stan and into the math library in repo stan-dev/math --- any suggestions?

@bgoodri The generalized gamma appears to be related to the kinds of things you were talking about in generalizing parametric families. Is this the kind of thing you'd be interested in having?

alashworth commented 5 years ago

Comment by bgoodri Friday Jul 10, 2015 at 17:51 GMT


Yes, although we might as well implement one of the generalizations of the generalized gamma distribution.


alashworth commented 5 years ago

Comment by steven-murray Monday Jul 13, 2015 at 03:11 GMT


@bob-carpenter The trick will be whether the included incomplete gamma gradient has support for negative $d$.

alashworth commented 5 years ago

Comment by steven-murray Tuesday Jul 21, 2015 at 03:04 GMT


There is a workaround to this in some cases -- for small negative d, a recurrence relation can be used. See my own answer to the question at http://stats.stackexchange.com/questions/161134/weighted-log-probabilities-in-generalised-gamma-distribution/162390#162390

Might help?
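For reference, the recurrence in question is presumably the standard one for the (non-regularised) upper incomplete gamma,

$$\Gamma(a, x) = \frac{\Gamma(a+1, x) - x^{a}\, e^{-x}}{a},$$

so one application covers $-1 < a < 0$ and two applications cover $-2 < a < -1$, reducing the problem to a positive first argument that gamma_q() and tgamma() can handle.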

alashworth commented 5 years ago

Comment by bgoodri Tuesday Jul 21, 2015 at 03:16 GMT


Perhaps. Do utilize the expose_stan_functions function in rstan 2.7.x (now on CRAN) to verify that your implementation agrees with GSL's via the RcppGSL or gsl packages.

alashworth commented 5 years ago

Comment by steven-murray Friday Jul 31, 2015 at 02:29 GMT


I've done a bit of work using my "fast" function, which only calculates the incomplete gamma function, gammainc(a,x), for a > -2. For reference, the function block is:

functions {
    /**
    * gammainc() is the upper incomplete gamma function (not regularized)
    * @param real a, shape parameter, must be >0
    * @param real x, position, >0
    * @returns the non-regularized incomplete gamma
    */
    real gammainc(real a, real x){
          return gamma_q(a,x) * tgamma(a);
    }

    /**
    * gammainc_neg() is the upper incomplete gamma function for a > -2
    * @param real a, shape parameter, must be >-2
    * @param real x, position, >0
    * @returns the non-regularized incomplete gamma
    *
    * NOTES: this routine has been written for speed for this application.
    *        More general functions for negative a can be gotten by recursion
    *        formulae etc.
    */
    real gammainc_neg(real a, real x){
        return ((gammainc(a+2,x)-pow(x,(a+1))*exp(-x))/(a+1)-exp(-x)*pow(x,a))/a;
    }
}
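As a sketch of how this connects back to the truncated generalised gamma (hypothetical, not from the thread; it assumes the earlier $(a, d, p)$ parameterisation with truncation point x_min and $-2 < d/p < 0$ so that gammainc_neg() applies, and would sit in the same functions block):

    real trunc_gen_gamma_lpdf(real y, real a, real d, real p, real x_min) {
        // log of (p / a^d) * y^(d - 1) * exp(-(y/a)^p) for y >= x_min > 0,
        // normalised by the upper incomplete gamma Gamma(d/p, (x_min/a)^p)
        return log(p) - d * log(a) + (d - 1) * log(y) - pow(y / a, p)
               - log(gammainc_neg(d / p, pow(x_min / a, p)));
    }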

Comparing to GSL's version, I get the following two plots, which I hope will be fairly self-explanatory. I chose to run over values of a and x independently, for what are (in my situation) fairly normal values. All differences are better than 1e-5, with most better than 1e-12. There are a few features where perhaps the GSL version switches methods (I'm not really sure).

[plots: difference versus GSL as a function of a, and as a function of x]

I can test a more general version (for lower a) later.

alashworth commented 5 years ago

Comment by bob-carpenter Friday Jul 31, 2015 at 03:15 GMT


Neat. Are you going to eventually need the log of that? If so, you can probably make the functions much more stable by working directly on the log scale.

Also, it'd be more efficient to rewrite the second function as follows to avoid recomputation:

    real gammainc_neg(real a, real x){
      real e_neg_x;
      real x_pow_a;
      e_neg_x <- exp(-x);
      x_pow_a <- pow(x, a);
      return ((gammainc(a + 2, x) - x * x_pow_a * e_neg_x) / (a + 1)
              - e_neg_x * x_pow_a) / a;
    }

Basically, anything that cuts down on redundant calculations or replaces division with multiplication or negation with addition is a win. Of course, you should check my algebra. But it looks like you're well down the testing path (which is awesome --- thanks for the plots).

Boost has an incomplete gamma function implementation:

http://www.boost.org/doc/libs/1_58_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html

The doc seems to be broken, but maybe it's my browser. But it does indicate different approximations are used for different parts of the domain.

Also, is there an easy way to compute derivatives? We could just chain rule it all through gamma_q and tgamma (or on the log scale if that's the real target).
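For reference, writing $\Gamma(a,x) = Q(a,x)\,\Gamma(a)$ as in gammainc() above, the chain rule gives

$$\frac{\partial}{\partial x}\,\Gamma(a,x) = -x^{a-1} e^{-x}, \qquad \frac{\partial}{\partial a}\,\Gamma(a,x) = \Gamma(a)\left[\psi(a)\, Q(a,x) + \frac{\partial Q(a,x)}{\partial a}\right],$$

where $\psi$ is the digamma function; the $\partial Q/\partial a$ term is presumably what grad_reg_inc_gamma.hpp supplies (up to sign and regularisation conventions).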


alashworth commented 5 years ago

Comment by steven-murray Friday Jul 31, 2015 at 10:32 GMT


Okay, so I've re-implemented in a more general manner, so that it works for any negative a. I also took your suggestions into account, so here is the code:

    real gammainc(real a, real x){
          return gamma_q(a,x) * tgamma(a);
    }

    real gammainc_neg_gen(real a, real x){
      int n;
      real ap1;
      real ssum;

      if (a >= 0) return gammainc(a, x);

      ap1 <- a + 1;

      // Get n = ceil(-a); written with spaces so "n < -a" is not read as an assignment
      n <- 0;
      while (n < -a) {
        n <- n + 1;
      }

      // Get summed part
      {
        vector[n] sums;
        for (i in 0:(n - 1)) sums[i + 1] <- pow(x, i) / tgamma(ap1 + i);
        ssum <- sum(sums);
      }
      return tgamma(a) * (gamma_q(a + n, x) - pow(x, a) * exp(-x) * ssum);
    }

I'm not sure this is as fast as it can be -- I had to use a while loop to get an integer ceil(-a). Any suggestions there would be good.

In terms of accuracy, cf. the following two plots (similar to the other ones, but over a much larger range of a):

[plots: difference versus GSL as a function of a, and as a function of x]

The only real problem seems to be with large negative a and large x. Of course, because of the sum, the efficiency also drops for larger negative a.

For my application I do need the log of this. However, I can't see any real way to simplify things with logs, because it's a sum. Any ideas here would be good as well.

The derivative can be found, but it's in terms of digammas and Meijer G-functions (how do you even do the derivative of, say, gamma_q in Stan, since you don't have the Meijer G-function?).

alashworth commented 5 years ago

Comment by bob-carpenter Friday Jul 31, 2015 at 20:37 GMT


I'm afraid there's no good way to do integer ceilings. We've debated it in the past, but haven't added it as a built-in because it breaks differentiability of models in general (though not in every use case, hence the debate).

I think it might be possible to write a binary search that would cut the expected number of operations for the floor down to log(n).
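One way to realise that (a sketch only; ceil_neg is a hypothetical user-defined helper, since Stan has no built-in integer ceiling): double an upper bound until it passes -a, then bisect, which takes roughly 2*log2(n) comparisons to find n = ceil(-a).

    int ceil_neg(real a) {
      // assumes a < 0; returns the smallest integer n with n >= -a
      int lo;
      int hi;
      lo <- 0;
      hi <- 1;
      while (hi < -a) {        // "galloping" phase: double the upper bound
        lo <- hi;
        hi <- 2 * hi;
      }
      while (hi - lo > 1) {    // bisection phase: maintain lo < -a <= hi
        int mid;
        mid <- (lo + hi) / 2;  // integer division
        if (mid < -a)
          lo <- mid;
        else
          hi <- mid;
      }
      return hi;
    }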

On the log scale, sums on the linear scale turn into log_sum_exp operations.

log(a * b) = log(a) + log(b)

log(a + b) = log_sum_exp(log(a), log(b))

It can be much more arithmetically stable and also provide better precision in some cases.


alashworth commented 5 years ago

Comment by steven-murray Saturday Aug 01, 2015 at 02:25 GMT


Just had a quick read-up of binary searches on Wikipedia and I think we'd have to do a one-sided binary search starting at 0, which has performance of 2*log2(n). So only for n>4 will it start cutting down on iterations (but will have more overhead I would assume). That's probably a good idea when n is getting up to close to 100, but in my use-case this won't happen, and I don't have the time right at this moment to implement stuff I don't need (PhD due in 8 weeks...).

As for the log_diff_exp() function -- unfortunately there's a subtlety that causes this to be difficult. There are two factors: gamma(a) and the (Q(a,x) - sumStuff). However, both of these parts oscillate in whether they are positive or negative based on the value of n (even values are positive if I recall). The lgamma() function automatically deals with this, but the other part gets assigned NaN whenever it is negative. I can of course check if it is negative and rearrange it etc., but then I have to use if_else, which makes the program slower. Probably just quicker and easier to take the log of the multiplied product, unless you have a brilliant idea?

The other thing I thought we could do is pre-expand, so it is Gamma(a,x) - x^a exp(-x)* Sum(x^k /rising_factorial(a+k+1)). This will always be overall positive, so the log_diff_exp will work every time. However, the rising_factorial() is only defined for positive a.

For now, I'm going to stick with just taking the log of the real-scale version.

alashworth commented 5 years ago

Comment by bob-carpenter Saturday Aug 01, 2015 at 19:27 GMT


On Jul 31, 2015, at 10:25 PM, Steven Murray notifications@github.com wrote:

Just had a quick read-up of binary searches on Wikipedia and I think we'd have to do a one-sided binary search starting at 0, which has performance of 2*log2(n). So only for n>4 will it start cutting down on iterations (but will have more overhead I would assume). That's probably a good idea when n is getting up to close to 100, but in my use-case this won't happen, and I don't have the time right at this moment to implement stuff I don't need (PhD due in 8 weeks...).

No problem. If it weren't for time constraints, Stan would have a lot more built-ins!

The thing most high-performance search and sorting algorithms do is use different algorithms depending on size.

As for the log_diff_exp() function -- unfortunately there's a subtlety that causes this to be difficult. There are two factors: gamma(a) and the (Q(a,x) - sumStuff). However, both of these parts oscillate in whether they are positive or negative based on the value of n (even values are positive if I recall). The lgamma() function automatically deals with this, but the other part gets assigned NaN whenever it is negative. I can of course check if it is negative and rearrange it etc., but then I have to use if_else, which makes the program slower. Probably just quicker and easier to take the log of the multiplied product, unless you have a brilliant idea?

Conditionals, as in "if (cond) block1 else block2" are always better than if_else in terms of performance. The problem with if_else() is that it evaluates both branches eagerly. What we really need is an equivalent of the ternary operator in C (C++, Java, etc.)

The other thing I thought we could do is pre-expand, so it is Gamma(a,x) - x^a exp(-x)* Sum(x^k /rising_factorial(a+k+1)). This will always be overall positive, so the log_diff_exp will work every time. However, the rising_factorial() is only defined for positive a.

For now, I'm going to stick with just taking the log of the real-scale version.

No need to optimize if what you have works. Always always get things working, then optimize if you need more speed. We've been doing this in Stan all along. We started just autodiffing through the matrix libraries, Boost functions, ODE solver, etc. And now we've been moving more and more to custom derivatives.

alashworth commented 5 years ago

Comment by steven-murray Wednesday Aug 05, 2015 at 02:54 GMT


Cool, I'll keep it as-is for now. By the way, is there any way for the user to specify derivatives/jacobians? I would have thought that if there was a way, it would open things up a bit, and possibly make some code faster.


alashworth commented 5 years ago

Comment by bob-carpenter Wednesday Aug 05, 2015 at 05:45 GMT



It's on our long-term to-do list, but as of now, there's no good way to do it. In fact, we don't even have a syntax for it developed. The main issue is that in user-defined functions, the code is generated so that any of the variables can be parameters or data, and all the derivatives work out. It's fairly tricky to do this efficiently with user-specified derivatives without making users write 2^N derivatives for N-input functions. We've been thinking about various kinds of functors that would do the right thing one dimension at a time, but then that blocks code sharing --- often derivatives involve repeated subexpressions.

We're happy to entertain syntax suggestions!

We'd also need to go up to second- and third-order derivatives, or we'd have to be able to differentiate through user-supplied derivatives. We can probably figure out how to do the templating for that along the same lines as for the existing functions.


alashworth commented 5 years ago

Comment by stemangiola Friday May 12, 2017 at 23:56 GMT


I found the generalised gamma distribution in BUGS:

https://books.google.com.au/books?id=zG7SBQAAQBAJ&pg=PA347&lpg=PA347&dq=%22dggamma%22&source=bl&ots=QvmDksRRQ7&sig=p7Uwr6V7kf3lCy4ncwPM3it3d3k&hl=en&sa=X&ved=0ahUKEwjl-4SMtuvTAhWLfrwKHevKDfkQ6AEIODAE#v=onepage&q=%22dggamma%22&f=false

However, if I try to implement it in Stan as

functions {
    real ggamma_lpdf(real x, real a, real b, real c) {
        return(log(c * b^(c*a) * x^(c*a - 1) * exp(-(b*x)^c) / tgamma(a)));
    }
}

or

functions {
    real ggamma_lpdf(real x, real a, real b, real c) {
        return((c*a * (log(c) + log(b))) + ((c*a - 1) * log(x)) + (-(b*x)^c) - log(tgamma(a)));
    }
}

I have had no success with test predictions.
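For reference, a sketch (not from the original comment) of the log of the first density above, taken term by term; note the single log(c) term, whereas the second attempt multiplies log(c) by c*a, which may be why the test predictions fail. lgamma(a) replaces log(tgamma(a)) for stability.

functions {
    // log of c * b^(c*a) * x^(c*a - 1) * exp(-(b*x)^c) / Gamma(a), for x > 0
    real ggamma_lpdf(real x, real a, real b, real c) {
        return log(c) + c * a * log(b) + (c * a - 1) * log(x)
               - pow(b * x, c) - lgamma(a);
    }
}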

alashworth commented 5 years ago

Comment by sakrejda Saturday May 13, 2017 at 00:07 GMT


@stemangiola This sounds like a question for the mailing list (http://discourse.mc-stan.org/), or a separate issue. If you think there is a bug please try to post a reproducible example. If you want a feature please start a separate issue. The link you give won't open for me but I have the generalized gamma implemented in a package if that's what you need. There are a lot of varieties of it and some work better than others in Stan.

alashworth commented 5 years ago

Comment by stemangiola Saturday May 13, 2017 at 00:44 GMT


Thanks a lot, and sorry for posting here.

Here is my question for the community.

Your implementations, as a Stan functions{} block, would be much appreciated!