optimization fails for a constrained parameter #190

Open alashworth opened 5 years ago

alashworth commented 5 years ago

Issue by feuerbach Tuesday Apr 03, 2018 at 12:20 GMT Originally opened as https://github.com/stan-dev/stan/issues/2511


Consider the following simple model, which should return the average of two positive numbers:

data {
  vector<lower=0>[2] X;
  real<lower=0> sigma;
}
parameters {
  real<lower=0> mu;
}
model {
  X ~ normal(mu, sigma);
}

R code:

library(rstan)

avg_model <- stan_model("avg.stan")
optimizing(avg_model, list(X = c(0.3, 0.5), sigma = 1e-4), init = 0)$par

If sigma is not too small (1 or 0.1), this works fine. It also works fine if I remove the lower bound on mu. But when the lower bound is present and sigma is small (1e-3, 1e-4), optimizing() almost always returns mu = 0.
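A minimal sketch of the failure pattern, assuming avg_model has been compiled as above (the sigma grid is my own choice):

# Sweep sigma from benign to problematic values. With init = 0 on the
# unconstrained scale (mu = 1 on the constrained scale), the small-sigma
# runs tend to return mu = 0 instead of mean(X) = 0.4.
for (sigma in c(1, 0.1, 1e-3, 1e-4)) {
  fit <- optimizing(avg_model, list(X = c(0.3, 0.5), sigma = sigma), init = 0)
  cat("sigma =", sigma, " mu =", fit$par["mu"], "\n")
}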

The manually transformed model has the same behavior:

data {
  vector<lower=0>[2] X;
  real<lower=0> sigma;
}
parameters {
  real log_mu;
}
transformed parameters {
  real mu = exp(log_mu);
}
model {
  X ~ normal(mu, sigma);
  target += log_mu;
}

Yet the likelihood itself looks fine, and R's optim() has no problem maximizing it despite using an approximate gradient.

# Log density of log_mu: the normal likelihood plus the log_mu Jacobian term
liky <- Vectorize(function(log_mu) {
  X <- c(0.3, 0.5)
  sum(dnorm(X, mean = exp(log_mu), sd = 1e-4, log = TRUE)) + log_mu
})
curve(liky(x), from = -10, to = -0.2)
optim(0, liky, control = list(fnscale = -1))$par

[plot: liky(x) over log_mu in (-10, -0.2), produced by the curve() call above]

rstan version 2.16.2

alashworth commented 5 years ago

Comment by bob-carpenter Tuesday Apr 03, 2018 at 23:51 GMT


For optimizing, the Jacobian is turned off; it's on for sampling.

Thus your two programs will give the same answer for sampling, but different answers for optimization.

As for the solution tolerances, they're on the unconstrained scale, so they make a difference for values near zero.
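To make the Jacobian point concrete, here is a sketch of my own (not from the thread), using sigma = 1 rather than 1e-4 so the shift is visible. It compares the objective optimizing maximizes for the first program (no Jacobian term) with the one the manually transformed program implies (with target += log_mu):

X <- c(0.3, 0.5); sigma <- 1  # sigma = 1 chosen so the difference is visible

# Without the Jacobian term (what optimizing uses for the constrained
# program): the maximizer is the constrained-scale MLE, mu = mean(X) = 0.4.
lp_no_jac <- function(log_mu)
  sum(dnorm(X, mean = exp(log_mu), sd = sigma, log = TRUE))

# With the Jacobian adjustment log|d mu / d log_mu| = log_mu, i.e. the
# target += log_mu line in the manually transformed program.
lp_jac <- function(log_mu) lp_no_jac(log_mu) + log_mu

exp(optim(0, lp_no_jac, method = "BFGS", control = list(fnscale = -1))$par)
# ~0.40
exp(optim(0, lp_jac, method = "BFGS", control = list(fnscale = -1))$par)
# ~0.93: the extra log_mu term shifts the mode on the constrained scale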

alashworth commented 5 years ago

Comment by feuerbach Wednesday Apr 04, 2018 at 07:39 GMT


Thanks for the response, Bob; that's very useful to know.

However, the two programs do give the same (incorrect) answer for optimization.

Also, I just checked that R's optim finds the correct answer both with and without the Jacobian.
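For concreteness, a sketch of that check (my reconstruction, reusing liky from my earlier comment; with sigma = 1e-4 the Jacobian shifts the mode by only ~1e-8, so both land at mu ~ 0.4):

# With the Jacobian term (liky includes + log_mu):
exp(optim(0, liky, control = list(fnscale = -1))$par)        # ~0.4

# Without the Jacobian term:
liky_nojac <- function(log_mu) liky(log_mu) - log_mu
exp(optim(0, liky_nojac, control = list(fnscale = -1))$par)  # ~0.4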

alashworth commented 5 years ago

Comment by bob-carpenter Wednesday Apr 04, 2018 at 18:50 GMT


Right. I didn't mean to suggest that this explains what's going on; I just wanted to clarify what the answer is supposed to be.

alashworth commented 5 years ago

Comment by bob-carpenter Wednesday Apr 04, 2018 at 19:57 GMT


I verified that it's indeed providing the wrong answer with:

data {
  vector<lower=0>[2] X;
  real<lower=0> sigma;
}
parameters {
  real<lower=0> mu;
}
model {
  X ~ normal(mu, sigma);
}

The output I get is this:

> mod <- stan_model("avg.stan")
> mle <- optimizing(mod, list(X=c(0.3, 0.5), sigma=1e-4), init=0)
Initial log joint probability = -3.7e+07
Optimization terminated normally: 
  Convergence detected: gradient norm is below tolerance
> mle$par
mu 
 0 

This model is very badly conditioned for an init of 0 on the unconstrained scale, which corresponds to an init of 1 on the constrained scale. That puts it (1 - 0.4) / 1e-4 = 6000 standard deviations away from the posterior mean. So I'm not sure this is fixable in the sense of modifying the algorithm to produce the right answer from such a bad init.
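The reported initial log joint is consistent with that arithmetic; a quick check of my own (dropping the additive constants, as Stan's ~ notation does):

X <- c(0.3, 0.5); sigma <- 1e-4

# Distance of the constrained init (mu = 1) from the sample mean, in sds:
(1 - mean(X)) / sigma             # 6000

# Log joint at mu = 1, up to the constants Stan drops:
-sum((X - 1)^2) / (2 * sigma^2)   # -3.7e+07, matching the reported value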

Returning the wrong answer is still a bug. I would be happy if we could at least diagnose the error.

If you initialize closer to the right answer, say just 3000 standard deviations away instead of 6000, then everything converges very reliably, as in:

> mle <- optimizing(mod, list(X=c(0.3, 0.5), sigma=1e-4), init=list(mu = 0.1))

> mle$par
 mu 
0.4 

and the same for

> mle <- optimizing(mod, list(X=c(0.3, 0.5), sigma=1e-4), init=list(mu = 0.7))

alashworth commented 5 years ago

Comment by bob-carpenter Wednesday Apr 04, 2018 at 20:18 GMT


This works if you turn off all the tolerances other than the one on the parameter values themselves.

> mle <- optimizing(mod, list(X=c(0.3, 0.5), sigma=1e-4), tol_grad=-1, tol_rel_grad=-1, tol_obj=-1, tol_rel_obj=-1)
Initial log joint probability = -4.75677e+06
Exception: normal_lpdf: Location parameter is inf, but must be finite!  (in 'model6ae246d29e15_avg' at line 9)

Exception: normal_lpdf: Location parameter is inf, but must be finite!  (in 'model6ae246d29e15_avg' at line 9)

Exception: normal_lpdf: Location parameter is inf, but must be finite!  (in 'model6ae246d29e15_avg' at line 9)

Exception: normal_lpdf: Location parameter is inf, but must be finite!  (in 'model6ae246d29e15_avg' at line 9)

Error evaluating model log probability: Non-finite gradient.

Optimization terminated normally: 
  Convergence detected: absolute parameter change was below tolerance

> mle$par
 mu 
0.4 

You can see it struggling with arithmetic along the way.

Setting these tolerances to 0 rather than -1, the algorithm detects convergence because you're so far out in the tails that the gradients must all be close to zero. I'm not quite sure why the relative change in objective isn't OK.
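A sketch of my own of why the gradient-based test can fire spuriously here, for the first program's objective with the Jacobian off (as optimizing uses): the chain rule multiplies the constrained-scale gradient by d mu / d log_mu = mu, so the gradient vanishes as the optimizer heads toward mu = 0:

X <- c(0.3, 0.5); sigma <- 1e-4

# Analytic gradient of the log density w.r.t. log_mu: the chain rule
# scales the constrained-scale gradient by mu, so it decays as mu -> 0.
grad_unconstrained <- function(log_mu) {
  mu <- exp(log_mu)
  sum(X - mu) / sigma^2 * mu
}

sapply(c(-10, -20, -30, -40), grad_unconstrained)
# ~3.6e+03  1.6e-01  7.5e-06  3.4e-10: far enough into the left tail,
# the gradient norm falls below any fixed tolerance and "convergence"
# is declared at mu ~ 0.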

alashworth commented 5 years ago

Comment by feuerbach Tuesday Apr 10, 2018 at 12:24 GMT


Unfortunately, I can't seem to reproduce your results.

If I don't supply init, as in your comment https://github.com/stan-dev/stan/issues/2511#issuecomment-378731571, then regardless of the tolerances it arrives at the right answer only about half of the time; see https://ro-che.info/files/2018-04-03-stan-optimization-issue/2018-04-10-avg.html.

If I do supply init=0, as in my original comment, then it never gets it right, even with the tolerances turned off.
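A sketch of my own for checking that failure rate, assuming the compiled model mod from above (optimizing draws a random init when none is supplied, so repeated runs split between the two outcomes):

# Repeat the no-init run and tabulate the recovered mu values;
# roughly half should come back as 0.4 and half as 0.
mus <- replicate(20,
  optimizing(mod, list(X = c(0.3, 0.5), sigma = 1e-4))$par["mu"])
table(round(mus, 1))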