alashworth / test-issue-import

0 stars 0 forks source link

bivariate normal cdf #156

Open alashworth opened 5 years ago

alashworth commented 5 years ago

Issue by bob-carpenter Friday Jul 14, 2017 at 15:41 GMT Originally opened as https://github.com/stan-dev/stan/issues/2356


Summary:

Add a function to compute the bivariate normal cdf.

Description:

For size 2 vectors y and mu and 2 x 2 covariance matrix Sigma, compute

bivar_normal_cdf(y, mu, Sigma)

  = INT_{u < y}  multinormal_pdf(u, mu, Sigma) d.u

where u is also a 2-vector and u < y is defined componentwise as u[1] < y[1] && u[2] < y[2].

There could also be a parameterization where covariance is given in terms of two scale parameters and a correlation parameter.

Additional Info:

This paper is often cited for an algorithm:

Current Version:

v2.16.0

alashworth commented 5 years ago

Comment by bgoodri Friday Jul 14, 2017 at 16:25 GMT


If someone is going to implement the CDF, you might as well implement the PMF and other functions with scalar inputs rather than 2x2 matrices and 2-vectors.

On Fri, Jul 14, 2017 at 11:41 AM, Bob Carpenter notifications@github.com wrote:

Summary:

Add a function to compute the bivariate normal cdf. Description:

For size 2 vectors y and mu and 2 x 2 covariance matrix Sigma, compute

bivar_normal_cdf(y, mu, Sigma)

= INT_{u < y} multinormal_pdf(u, mu, Sigma) d.u

where u is also a 2-vector and u < y is defined componentwise as u[1] < y[1] && u[2] < y[2].

There could also be a parameterization where covariance is given in terms of two scale parameters and a correlation parameter. Additional Info:

This paper is often cited for an algorithm:

  • Boys, R.J., 1989. Algorithm AS R80: A remark on Algorithm AS 76: An integral useful in calculating noncentral t and bivariate normal probabilities. Journal of the Royal Statistical Society. Series C (Applied Statistics), 38(3), pp.580-582. https://www.jstor.org/stable/2347755

Current Version:

v2.16.0

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/2356, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqq1fgvojoTBP3dz6Tq0Y6sQ4f3-4ks5sN4w1gaJpZM4OYaQb .

alashworth commented 5 years ago

Comment by bgoodri Monday Aug 14, 2017 at 02:59 GMT


This article seems to have reversed the ratios in the numerators for the calculation of a1 and a2. It also does not deal with the 0 / 0 case. Here is a correct Stan function for it

  real binormal_cdf(real z1, real z2, real rho) {
    if (z1 != 0 || z2 != 0) {
      real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number();
      real a1 = (z2 / z1 - rho) / denom;
      real a2 = (z1 / z2 - rho) / denom;
      real product = z1 * z2;
      real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
      return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2);
    }
    return 0.25 + asin(rho) / (2 * pi());
  }
alashworth commented 5 years ago

Comment by bgoodri Monday Aug 14, 2017 at 03:01 GMT


There is also https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2924071 , which might be better for the log-CDF.

alashworth commented 5 years ago

Comment by bgoodri Monday Aug 14, 2017 at 15:13 GMT


For completeness, I perhaps should have added that if rho = 1, then the bivariate CDF is min(Phi(z1), Phi(z2)) and if rho = -1, it is Phi(z1) + Phi(z2) - 1.

alashworth commented 5 years ago

Comment by khakieconomics Tuesday Sep 26, 2017 at 17:46 GMT


This is extremely handy. Thanks @bgoodri

alashworth commented 5 years ago

Comment by rtrangucci Wednesday Aug 22, 2018 at 14:01 GMT


@bob-carpenter Per Ben's comment about doing scalar inputs rather than 2-vectors:

If we're set on 2-vectors, I'd like to expose the std_binormal_integral(real y1, real y2, real rho) as a function that takes scalars. The usage pattern for the std_binormal_lcdf function really doesn't lend itself to a vector input, but rather two scalar inputs. We'll usually be using the std_binormal_lcdf in something like a bivariate probit, where [y1, y2]' is a vector of vars. So the Stan code would need to do something like:

data {
  int N;
  int K1;
  int K2;
  matrix[N, K1] X1;
  matrix[N, K1] X2;
}
parameters {
  real<lower=-1, upper=1> rho;
  vector[K1] beta1;
  vector[K2] beta2;
}
model {
  vector[N] mu1 = X1 * beta1;
  vector[N] mu2 = X2 * beta2;
  for (n in 1:N)
    target += std_binormal_lcdf([mu1[n], mu2[n]]' | rho);
  ...

where we lose the advantage of the vectorization. In order to get the benefit of vectorization (i.e. caching the rho value that is repeated for every observation), we'd need to instead code the model like:

model {
...
  vector[2] model_mu[N];
  for (n in 1:N) {
    model_mu[n] = [mu1[n], mu2[n]]';
  }
  target += std_binormal_lcdf(model_mu | rho);
  ...

This looks funny to me, and seems inefficient, because we won't be respecting the memory locality in mu1 and mu2 and we need an extra loop. Here's the way it would work with the way I've coded it now:

model {
...
  target += std_binormal_lcdf(mu1 | mu2, rho);
  ...

That also looks strange to me given the placement of the conditioning bar. Maybe allowing for matrix inputs would make sense? something like

model {
  matrix[N, 2] mu;
  mu[,1] = X1 * beta1;
  mu[,2] = X2 * beta2;
  target += std_binormal_lcdf(mu | rho);
  ...

That I could get behind, and it would perhaps speed up internal computations.

alashworth commented 5 years ago

Comment by bob-carpenter Wednesday Aug 22, 2018 at 14:26 GMT


Thanks. That makes the problem clear. Are you really going to have different number of predictors for the two entries so you can't just make it matrix multiplication? Anyway, it's the same problem either way.

Even understanding the problem, I'm still opposed to violating statistical convention with something like this:

std_binormal_lcdf(mu1 | mu2, rho);

I'd be happier with something like this:

std_binormal_lcdf({ mu1, mu2 } | rho);

but that's a 2-array of K-vectors rather than the K-array of 2-vectors that it should be to match the rest of our vectorizations. So that won't work.

We've decided in the past that having matrixes pack up vectorizations would be too confusing, or we could do something like this:

std_binormal_lcdf(append_col(mu1, mu2) | rho);

or it's transpose append_row(mu1', mu2').

I could live with this:

std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho));

The memory locality's not an issue as you're just walking down two vectors, so everything's local on both sides. It is some minimal overhead to allocate and copy.

Or I want to generalize our input syntax to allow

std_binormal_lcdf(mu1, mu2 | rho);

where both mu1 and mu2 are containers. But that's a ton of work.

alashworth commented 5 years ago

Comment by bgoodri Wednesday Aug 22, 2018 at 14:43 GMT


matrix with 2 columns?

On Wed, Aug 22, 2018 at 10:26 AM Bob Carpenter notifications@github.com wrote:

Thanks. That makes the problem clear. Are you really going to have different number of predictors for the two entries so you can't just make it matrix multiplication? Anyway, it's the same problem either way.

Even understanding the problem, I'm still opposed to violating statistical convention with something like this:

std_binormal_lcdf(mu1 | mu2, rho);

I'd be happier with something like this:

std_binormal_lcdf({ mu1, mu2 } | rho);

but that's a 2-array of K-vectors rather than the K-array of 2-vectors that it should be to match the rest of our vectorizations. So that won't work.

We've decided in the past that having matrixes pack up vectorizations would be too confusing, or we could do something like this:

std_binormal_lcdf(append_col(mu1, mu2) | rho);

or it's transpose append_row(mu1', mu2').

I could live with this:

std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho));

The memory locality's not an issue as you're just walking down two vectors, so everything's local on both sides. It is some minimal overhead to allocate and copy.

Or I want to generalize our input syntax to allow

std_binormal_lcdf(mu1, mu2 | rho);

where both mu1 and mu2 are containers. But that's a ton of work.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/2356#issuecomment-415050860, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqqeJxccAOgVVsAMZ66h85-HtPJh8ks5uTWoUgaJpZM4OYaQb .

alashworth commented 5 years ago

Comment by rtrangucci Wednesday Aug 22, 2018 at 15:02 GMT


Yeah, I agree with you that generalizing the syntax doesn't make sense for this case. In the case we've settled on:

std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho));

are you proposing we add a function to_array_of_2vectors(reals y1, reals y2) to the language? Won't that be the same internally as the Stan code:

vector[2] mu[N];
for (n in 1:N)
  mu[n] = [mu1[n], mu2[n]]';

?

If so, that's fine. I'll move forward with updating the lcdf and lpdf to accept 2-vectors as the argument and we can open a separate issue for to_array_of_2vectors.

alashworth commented 5 years ago

Comment by rtrangucci Wednesday Aug 22, 2018 at 15:20 GMT


One further point, then I'll go work on the function. All of the math internal to the std_binormal_lpdf and std_binormal_lcdf function (the log-density/probability and the gradients) operates on scalars) so we'll be converting to 2-vectors only to convert to scalars internally.

alashworth commented 5 years ago

Comment by bob-carpenter Wednesday Aug 22, 2018 at 15:29 GMT


Yes. I'm proposing adding:

vector[] to_array_of_2vectors(vector x, vector y);

where the two vectors have the same size and

to_array_of_2vectors(x, y)[i, 1] == x[i]
to_array_of_2vectors(x, y)[i, 2] == y[i]

But it's not very elegant, to say the least.

I could be convinced that the matrix-with-two-columns approach is the way to go, as that'd just be

append_col(x, y) ~ ...

It's just that we decided that it'd be confusing to read vectorization out of rows or columns of matrices.

alashworth commented 5 years ago

Comment by bob-carpenter Wednesday Aug 22, 2018 at 15:32 GMT


I'm only concerned about the external client interface, not how it's implemented.

But having said that, if it's all scalars under the hood, how will vectorization help?

I'd like to hear from @mitzimorris and @seantalts and @VMatthijs on this whole discussion.

alashworth commented 5 years ago

Comment by rtrangucci Wednesday Aug 22, 2018 at 15:50 GMT


Both std_binormal_lpdf and std_binormal_lcdf are implemented nearly identically to normal_lpdf: https://github.com/stan-dev/math/blob/feature/issue-962-bivar-norm/stan/math/prim/scal/prob/std_binormal_lcdf.hpp

alashworth commented 5 years ago

Comment by bob-carpenter Wednesday Aug 22, 2018 at 16:30 GMT


I take it you meant binormal and not normal in those first two!

alashworth commented 5 years ago

Comment by rtrangucci Wednesday Aug 22, 2018 at 20:37 GMT


Yes, you're right! Good catch, and I've edited the comment.

Upon a bit of reflection, it probably won't be much of a performance hit to just take in a list of 2-vectors given that we have to pull out. I'll use the vector_seq_view metaprogram instead of scalar_seq_view, and just pull the scalar elements out of each the N 2-vectors.