alashworth opened this issue 5 years ago
Comment by bob-carpenter Tuesday Apr 03, 2018 at 23:51 GMT
For optimizing, the Jacobian is turned off; it's on for sampling.
Thus your two programs will give the same answer for sampling, but different answers for optimization.
As for solution tolerances, they're on the unconstrained scale, so that makes a difference for values near zero.
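The effect of the Jacobian can be sketched outside Stan (a hand-coded Python toy, not Stan's implementation). For mu > 0, the optimizer works over the unconstrained y = log(mu); sampling adds the log Jacobian log|d mu / d y| = y to the target, which shifts the mode:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy data: X ~ normal(mu, sigma) with a lower bound mu > 0.
X = np.array([0.3, 0.5])
sigma = 1.0

def log_lik(mu):
    return -np.sum((X - mu) ** 2) / (2 * sigma ** 2)

# Optimizing: Jacobian off -- maximize the plain log likelihood in y = log(mu).
opt = minimize_scalar(lambda y: -log_lik(np.exp(y)))

# Sampling: Jacobian on -- the change of variables adds log|d mu / d y| = y.
samp = minimize_scalar(lambda y: -(log_lik(np.exp(y)) + y))

print(np.exp(opt.x))   # MLE: the sample mean 0.4
print(np.exp(samp.x))  # mode of the Jacobian-adjusted density: ~0.93
```

With the Jacobian off the mode is the MLE (the sample mean); the Jacobian term pulls the adjusted mode away from it, which is why the two settings can give different answers.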
Comment by feuerbach Wednesday Apr 04, 2018 at 07:39 GMT
Thanks for the response Bob, that's very useful to know.
However, the two programs do give the same (incorrect) answer for optimization.
Also, I just checked that R's optim finds the correct answer both with and without the Jacobian.
Comment by bob-carpenter Wednesday Apr 04, 2018 at 18:50 GMT
Right. I didn't mean to suggest that explains what's going on; I just wanted to clarify what the answer is supposed to be.
Comment by bob-carpenter Wednesday Apr 04, 2018 at 19:57 GMT
I verified that it's indeed providing the wrong answer with:
data {
  vector<lower=0>[2] X;
  real<lower=0> sigma;
}
parameters {
  real<lower=0> mu;
}
model {
  X ~ normal(mu, sigma);
}
The output I get is this:
> mod <- stan_model("avg.stan")
> mle <- optimizing(mod, list(X=c(0.3, 0.5), sigma=1e-4), init=0)
Initial log joint probability = -3.7e+07
Optimization terminated normally:
Convergence detected: gradient norm is below tolerance
> mle$par
mu
0
This is a very very badly conditioned model for an init of 0 on the unconstrained scale, which is an init of 1 on the constrained scale. That puts it at (1 - .4) / 1e-4 = 6000 standard deviations away from the posterior mean. So I'm not sure if this is going to be fixable in the sense of being able to modify the algorithm to produce the right answer from such a bad init.
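Why "gradient norm is below tolerance" can fire at mu = 0: on the unconstrained scale y = log(mu), the chain rule multiplies the constrained-scale gradient by mu = exp(y), so the gradient vanishes as y goes to negative infinity even though the boundary is not a maximum. A hand-coded sketch of the toy model's gradient (an illustration, not Stan's code):

```python
import numpy as np

X = np.array([0.3, 0.5])
sigma = 1e-4

# Gradient of the log likelihood w.r.t. the unconstrained y = log(mu):
# d/dy lp(exp(y)) = mu * sum(X - mu) / sigma^2, with mu = exp(y).
def grad_y(y):
    mu = np.exp(y)
    return mu * np.sum(X - mu) / sigma ** 2

for y in [0.0, -10.0, -25.0, -40.0]:
    print(f"y = {y:6.1f}  gradient = {grad_y(y):.3e}")
```

At y = 0 (the init mu = 1) the gradient is about -1.2e8, which throws the optimizer deep into the left tail; by y = -40 it has shrunk to around 3e-10, small enough to trip a gradient-norm convergence test at mu of essentially 0.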
Returning the wrong answer is still a bug. I would be happy if we could at least diagnose the error.
If you init closer to the right answer, say just 3000 standard deviations away instead of 6000, then everything converges very reliably, as in:
> mle <- optimizing(mod, list(X=c(0.3, 0.5), sigma=1e-4), init=list(mu = 0.1))
> mle$par
mu
0.4
and same for
> mle <- optimizing(mod, list(X=c(0.3, 0.5), sigma=1e-4), init=list(mu = 0.7))
Comment by bob-carpenter Wednesday Apr 04, 2018 at 20:18 GMT
This works if you turn off all the convergence tolerances other than the one on the parameter values themselves.
> mle <- optimizing(mod, list(X=c(0.3, 0.5), sigma=1e-4), tol_grad=-1, tol_rel_grad=-1, tol_obj=-1, tol_rel_obj=-1)
Initial log joint probability = -4.75677e+06
Exception: normal_lpdf: Location parameter is inf, but must be finite! (in 'model6ae246d29e15_avg' at line 9)
Exception: normal_lpdf: Location parameter is inf, but must be finite! (in 'model6ae246d29e15_avg' at line 9)
Exception: normal_lpdf: Location parameter is inf, but must be finite! (in 'model6ae246d29e15_avg' at line 9)
Exception: normal_lpdf: Location parameter is inf, but must be finite! (in 'model6ae246d29e15_avg' at line 9)
Error evaluating model log probability: Non-finite gradient.
Optimization terminated normally:
Convergence detected: absolute parameter change was below tolerance
> mle$par
mu
0.4
You can see it struggling with arithmetic along the way.
With these tolerances set to 0 rather than -1, the algorithm detects convergence because you're so far out in the tails that the gradients are all close to zero. I'm not quite sure why the relative change in objective isn't OK.
Comment by feuerbach Tuesday Apr 10, 2018 at 12:24 GMT
Unfortunately, I can't seem to reproduce your results.
If I don't supply init, as in your comment https://github.com/stan-dev/stan/issues/2511#issuecomment-378731571, then regardless of the tolerances it arrives at the right answer only about half of the time; see https://ro-che.info/files/2018-04-03-stan-optimization-issue/2018-04-10-avg.html.
If I do supply init=0, as in my original comment, then it never gets the right answer, even with the tolerances turned off.
Issue by feuerbach Tuesday Apr 03, 2018 at 12:20 GMT
Originally opened as https://github.com/stan-dev/stan/issues/2511
Consider the following simple model, which should return the average of two positive numbers:
R code:
If sigma is not too low (1 or 0.1), it works fine. It also works fine if I remove the lower bound on mu. But if the lower bound is present and sigma is low (1e-3, 1e-4), optimizing() almost always returns mu = 0. The manually transformed model has the same behavior:
Yet the likelihood looks fine, and R's optim() has no problem maximizing it despite using an approximate gradient.
rstan version 2.16.2