bob-carpenter / diagnostic-testing

statistical models to analyze diagnostic tests

Implicit uniform priors #4

Open · maxbiostat opened this issue 4 years ago

maxbiostat commented 4 years ago

Hi @bob-carpenter, it's me again being a pain in the arse. In this and this program, one or more of sens, spec and p (\pi in the paper) have implicit uniform priors. Two questions: (i) did you guys consider relaxing that assumption and allowing, say, beta priors? And (ii) if not, would you like to do that? If yes, I can work on a PR implementing these minor changes.

I'm also interested in the induced prior on p_sample and p (\pi) for the various model formulations (from simple to MRP). Basically a more thorough prior sensitivity/predictive analysis. We can talk about a PR with some extra analyses if you guys are interested.

bob-carpenter commented 4 years ago

it's me again being a pain in the arse.

I like feedback, so don't worry.

one or more of sens, spec and p (\pi in the paper) have implicit uniform priors.

That's equivalent to beta(1, 1). Or equivalent to a logistic(0, 1) on the log odds scale.
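You can check the second equivalence with a one-line simulation in R:

> hist(plogis(rlogis(1e6, 0, 1)))   # inverse logit of standard logistic draws: flat on (0, 1)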

If we wanted to add priors beyond a uniform, which would be a good idea in my opinion since we have some preconception of prevalence, we'd probably do it on the log odds scale rather than with beta priors on the probability scale. It's much more flexible that way when you want to expand the model, say with predictors or a spatial model of prevalence or something else.

This repo was really set up around the paper we were writing. As such, we don't want changes to what's already there as the paper's under review and being modified for that.

If you want to contribute a new model and some new analyses, by all means go ahead and do that. Just set it up parallel to what we did in the specificity paper the way I set up my case study in parallel. And I'd suggest copying over what you need to avoid dependencies between the analyses (because as I hinted, ours are likely to change a bit over the next few days).

There are also missing priors on some of the hierarchical means, I believe. It'd be nice to know what the sensitivity to that assumption was, too. I believe @andrewgelman thought it would be negligible, but it'd be nice to show that with a simulation.

I'm also interested in the induced prior on p_sample

p_sample is a transformation of sensitivity and specificity, so putting a prior on that is tricky, because you need a parameter (inverse) transform (sens x spec -> p_sample x ???) and corresponding Jacobians.

maxbiostat commented 4 years ago

That's equivalent to beta(1, 1). Or equivalent to a logistic(0, 1) on the log odds scale.

I'm well aware, I'm sorry. I should've said "more general" beta priors.

If we wanted to add priors beyond a uniform, which would be a good idea in my opinion since we have some preconception of prevalence, we'd probably do it on the log odds scale rather than with beta priors on the probability scale. It's much more flexible that way when you want to expand the model, say with predictors or a spatial model of prevalence or something else.

I'm not sure I agree with that statement, but I will consider comparing Beta and logistic-normal priors for sens and spec.

This repo was really set up around the paper we were writing. As such, we don't want changes to what's already there as the paper's under review and being modified for that. If you want to contribute a new model and some new analyses, by all means go ahead and do that. Just set it up parallel to what we did in the specificity paper the way I set up my case study in parallel. And I'd suggest copying over what you need to avoid dependencies between the analyses (because as I hinted, ours are likely to change a bit over the next few days).

Cool. Got it. I'll start my own repo with a few analyses.

There are also missing priors on some of the hierarchical means, I believe. It'd be nice to know what the sensitivity to that assumption was, too. I believe @andrewgelman thought it would be negligible, but it'd be nice to show that with a simulation.

One more thing I shall consider in my simulation, then. Thanks for the tip.

I'm also interested in the induced prior on p_sample

p_sample is a transformation of sensitivity and specificity, so putting a prior on that is tricky, because you need a parameter (inverse) transform (sens x spec -> p_sample x ???) and corresponding Jacobians.

I meant that I'm interested in the distribution on p_sample induced by the distributions on p, sens and spec, i.e., what the cool kids call the "pushforward measure". It's the same as a prior predictive check, in a way. It might also be nice to look at traditional PPCs in the example.

bob-carpenter commented 4 years ago

Hmm. I thought I replied to this, but looks like I never hit "Comment". Argh. It was a long reply. Let's start again.

You don't need to start your own repo. You're more than welcome to contribute to this one. Just start a parallel file rather than modifying the one with our paper in it. Copy over files you need, etc., so there aren't dependencies. Thanks---we're just modifying the paper for a journal submission and don't want to mess with that part of the repo. I can invite you to the repo if you'd like to do it that way.

I was saying that the log odds scale is more flexible because beta priors are hard to extend.

For beta priors, the conjugacy gives them a nice interpretation as data. But they're very challenging to nest or use in a regression setting. The prior beta(alpha, beta) needs to be reparameterized in terms of the mean and total count (+2) of prior data, i.e., beta(kappa * theta, kappa * (1 - theta)) for kappa > 0 and theta in (0, 1). Leaving alpha and beta independent in a hierarchical prior can be problematic to fit and also to interpret.
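As a quick sketch of that reparameterization (the theta and kappa values here are just for illustration):

> theta <- 0.9    # prior mean
> kappa <- 12     # prior count (+2); larger kappa means a tighter prior
> draws <- rbeta(1e6, kappa * theta, kappa * (1 - theta))
> c(mean = mean(draws), sd = sd(draws))   # mean is close to theta; sd shrinks as kappa grows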

Evaluating the induced prior on p_sample would be great, especially w.r.t. changing priors on prevalence, sensitivity, and specificity. The result can also be used to do prior predictive checks, which we've been stressing more and more recently. In this case, with uniform priors on prevalence, sensitivity, and specificity, the prior distribution over p_sample is very similar to a beta(2, 2), only with a bit higher variance.

> psamp <- function(prev, sens, spec) prev * sens + (1 - prev) * (1 - spec)
> p_sample <- psamp(runif(1e6), runif(1e6), runif(1e6))
> hist(p_sample)

[figure: histogram of the induced prior on p_sample under uniform priors]

With beta(10, 2) priors on sensitivity and specificity (equivalently, 9 prior true results and 1 prior false result) and a beta(2, 50) prior on prevalence (mean 2/52 ≈ 4%), it's still a pretty broad prior on the marginal test response:

> psamp2 <- psamp(rbeta(1e6, 2, 50), rbeta(1e6, 10, 2), rbeta(1e6, 10, 2))
> hist(psamp2)

[figure: histogram of the induced prior on p_sample under these informative priors]

maxbiostat commented 4 years ago

Thanks, @bob-carpenter, that makes a lot of sense. I wonder if it also makes sense to simulate from "the prior" by just omitting the likelihood for the sample tests while keeping the sens/spec data:

data {
  int<lower = 0> y_sample;   // positive tests in the sample
  int<lower = 0> n_sample;   // number of tests in the sample

  int<lower = 0> y_spec;     // correct negative results in the specificity calibration
  int<lower = 0> n_spec;     // negative controls tested

  int<lower = 0> y_sens;     // correct positive results in the sensitivity calibration
  int<lower = 0> n_sens;     // positive controls tested
}
parameters {
  real<lower = 0, upper = 1> p;     // prevalence (implicit uniform prior)
  real<lower = 0, upper = 1> spec;  // specificity (implicit uniform prior)
  real<lower = 0, upper = 1> sens;  // sensitivity (implicit uniform prior)
}
model {
  real p_sample = p * sens + (1 - p) * (1 - spec);  // marginal positive-test probability
  // y_sample ~ binomial(n_sample, p_sample);       // sample likelihood omitted to get the "prior"
  y_spec ~ binomial(n_spec, spec);
  y_sens ~ binomial(n_sens, sens);
}

This can be simulated using pure MC as you did above, because the posteriors for sens and spec are betas.

Prior: [figure]

Posterior: [figure]
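For example, a pure-MC sketch of this (the calibration counts below are made up for illustration; psamp() is the function from above):

> y_sens <- 90; n_sens <- 100    # made-up sensitivity calibration data
> y_spec <- 195; n_spec <- 200   # made-up specificity calibration data
> M <- 1e6
> sens_draws <- rbeta(M, y_sens + 1, n_sens - y_sens + 1)   # beta posterior under a uniform prior
> spec_draws <- rbeta(M, y_spec + 1, n_spec - y_spec + 1)   # beta posterior under a uniform prior
> p_sample_draws <- psamp(runif(M), sens_draws, spec_draws) # prevalence keeps its uniform prior
> hist(p_sample_draws)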

maxbiostat commented 4 years ago

Also, should we go a step further and simulate fake data from these as well? That'd give how many positive tests we'd expect under the prior.

bob-carpenter commented 4 years ago

I wonder if it also makes sense to simulate from "the prior" by just omitting the likelihood on the tests, i.e., including the sens/spec data:

Yes, and you can do it without changing the program by simply defining the data y to be size zero.
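For example, if the data block declares something like int<lower = 0> N; int<lower = 0, upper = 1> y[N]; (variable names here are hypothetical), the R data list can just pass an empty vector:

> # hypothetical names and made-up calibration counts, just to illustrate the zero-size trick
> stan_data <- list(N = 0, y = integer(0), y_spec = 195, n_spec = 200, y_sens = 90, n_sens = 100)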

should we go a step further and simulate fake data from these as well? That'd give how many positive tests we'd expect under the prior.

For an expectation, it's better to compute p_sample * N. But if you want uncertainty, then you need to look at binomial(N, p_sample). The former is the Rao-Blackwellized form of the latter. I try to explain this in the new user's guide chapters on predictive inference.
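As a quick sketch using the p_sample draws from my simulation above (N here is made up):

> N <- 3000                                        # made-up number of tests
> expected_pos <- N * p_sample                     # Rao-Blackwellized: expected positives per draw
> y_rep <- rbinom(length(p_sample), N, p_sample)   # full predictive draw: adds binomial noise
> c(mean(expected_pos), mean(y_rep))               # same expectation up to MC error
> c(sd(expected_pos), sd(y_rep))                   # y_rep is more dispersed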

Simulating fake data is great in that it supports prior predictive checks, where you plot a histogram of simulated values of the data and compare summary statistics with the actual data. But with binary data like y here, that's not so easy. You can plot things like mean(y) vs. simulated mean(y), but sd doesn't add much because for binary data the sample variance is determined by the mean (it's mean * (1 - mean)).

It's even possible to compute prior predictive p-values, but there's not much point to that, because we know the prior isn't a good model in and of itself; if it were, we wouldn't need to fit the model to data. But it can flag prior values that are inconsistent with common sense. (I added descriptions of how to do simulation-based calibration, prior and posterior predictive checks, and cross-validation to the Stan user's guide in the last release.)

maxbiostat commented 4 years ago

Those are great points, but since the data in this example are aggregated anyway, some of these difficulties with binary measurements are sort of swept under the rug, no? For instance, one can simulate the prior and posterior predictive of the number of positive tests and those are going to be regular histograms, right? Fair point about Rao-Blackwellised vs. non-RB estimates.

I get the general point about the lack of proper predictive summaries that can be meaningfully computed. It's a bit like fitting a Poisson and looking at mean(y) and var(y): technically different things, but actually they're identical histograms and knowing one gives you the other.

bob-carpenter commented 4 years ago

For instance one can simulate the prior and posterior predictive of the number of positive tests and those are going to be regular histograms, right?

Yes, good point.

We actually do want to test vs. something like Poisson. If we have overdispersed data and model as Poisson, then it'll fail posterior predictive tests for variance even though it might pass for mean. We usually don't need to check parameters that are directly represented in the model as they'll get fit. But with Poisson, you get one parameter to fit two things, so both can be off if you give it something like negative binomial data.
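Here's a minimal sketch of that failure mode using the conjugate gamma-Poisson so nothing needs to be fit (all the numbers are made up):

> set.seed(1)
> n <- 200
> y <- rnbinom(n, size = 2, mu = 10)                      # overdispersed "data"
> lambda <- rgamma(4000, 1 + sum(y), 1 + n)               # posterior draws under a gamma(1, 1) prior
> y_rep <- matrix(rpois(4000 * n, lambda), nrow = 4000)   # row m is a replicated data set given lambda[m]
> mean(apply(y_rep, 1, mean) >= mean(y))                  # mean check: near 0.5, looks fine
> mean(apply(y_rep, 1, var) >= var(y))                    # variance check: near 0, a clear failure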

maxbiostat commented 4 years ago

We actually do want to test vs. something like Poisson.

Yes we do. Let y be observed data and let y_rep be replicates from the prior/posterior predictive. What I meant was this: if you fit the Poisson model to a set of data, you can compute the predictive of mean(y_rep) and var(y_rep), but you know upfront that these histograms will be the same up to Monte Carlo error. Of course, there's no reason on Earth mean(y) and var(y) would coincide. Sorry for the confusion.

bob-carpenter commented 4 years ago

[Edit: see below; I think this is answering the wrong question]

If you fit the Poisson model to a set of data, you can compute the predictive of mean(y_rep) and var(y_rep) but you know upfront that these histograms will be the same up to Monte Carlo error.

Not quite. Let's say the model has a Poisson likelihood p(y | lambda) and gamma prior p(lambda) so we can work out the result analytically. In general, the posterior predictive distribution is just the conditional expectation of the likelihood of new data,

p(y' | y)
    = E[p(y' | lambda) | y]
    = INTEGRAL p(y' | lambda) p(lambda | y) d.lambda
    = LIM{M -> infty} 1/M * SUM{m in 1:M} p(y' | lambda(m)),  where lambda(m) ~ p(lambda | y)

Because the gamma is conjugate to the Poisson, we know the posterior p(lambda | y) is also a gamma distribution. That means the posterior predictive distribution is a continuous gamma-mixture of Poissons, as indicated in the integral above. A gamma mixture of Poissons is just a negative binomial, and hence the parametric form of p(y' | y) is negative binomial, not Poisson.

If we just plugged in the expected value of lambda given the observed data and used p(y' | E[lambda | y]), that would be a Poisson distribution. But it's not the posterior predictive distribution and will not be calibrated (it's underdispersed). In the limit of infinite data, the negative binomial posterior predictive distribution converges to a Poisson.
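You can check the mixture identity by simulation (the gamma shape and rate here are arbitrary):

> a <- 3; b <- 0.5                                      # arbitrary gamma(shape, rate) "posterior"
> mix <- rpois(1e6, rgamma(1e6, a, b))                  # gamma mixture of Poissons
> nb <- rnbinom(1e6, size = a, prob = b / (b + 1))      # negative binomial in closed form
> rbind(c(mean(mix), var(mix)), c(mean(nb), var(nb)))   # moments agree up to MC error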

bob-carpenter commented 4 years ago

Oh, I think what I said was addressing a different question about the posterior predictive. If you just do prior predictive checks or posterior predictive checks, each data set generated will be Poisson. It's the average over those that gives you inference, but if we compute means and variances of each, they should be similar and converge to being the same with more data.

maxbiostat commented 4 years ago

Oh, I think what I said was addressing a different question about the posterior predictive.

Yup. A case of talking past each other, I guess.

In any case, I shall try to submit a small report detailing some other prior (and posterior) sensitivity analyses in a few days. One thing to maybe ponder is the induced prior distribution on the prevalence (p in the code, $\pi$ in the paper) and the average prevalence (p_avg) under the MRP model:

[figure]

The posterior concentrates around the right value, but the prior still looks weird, if you'd agree:

[figure]

I'm presenting the graphs this way because it's hard to see the "true" shape of the prior (blue) when it's plotted alongside the posterior (red).

bob-carpenter commented 4 years ago

I agree that a U-shaped prior on prevalence doesn't make sense. But it's flat enough I doubt it has a substantial impact on the posterior.