MichaelChirico / r-bugs

A ⚠️ read-only ⚠️ mirror of https://bugs.r-project.org/

[BUGZILLA #17703] constrOptim() fails for a well-specified problem on R-3.6.2 #6877

Open MichaelChirico opened 4 years ago

MichaelChirico commented 4 years ago

Created attachment 2530 [details] Near-minimal working example.

Function constrOptim() sometimes fails for a well-specified optimization. I include a near-minimal, self-contained working example that exhibits the problem on R 3.6.2 (svn 77560) under macOS 10.14.6, and also on R 3.6.2 under Windows and Linux. The problem does not occur for R 3.5.1 on Windows. Distinct issue from #3325.

My optimization problem is well-defined and has its optimum at a constraint corner; the gradient of the objective function is large there.

The error is difficult to trigger and occurs only rarely: slight changes in the startp argument can either trigger or suppress it.

I tried to "capture" a value of startp that exhibits the error and use that but to my confusion this strategy does not appear to work: doing so results in perfectly working optimization [not strictly true; see notes below].

```r
## R CODE STARTS [also see bug.R]
set.seed(0)

for(i in 1:100){
  print(i)
  startp <- c(1/3,1/3) + rnorm(2)*1e-6
  cat(paste("startp = c(",paste(startp,collapse=","),")\n"))

  objective <- function(p){
    out <- -log(p[1] + p[2]) + log(p[2]) + log(1 - p[1] - p[2])
    ## following line for debugging [not needed to reproduce error]
    cat(paste("objective(c(",paste(p,collapse=","),") =",out,"\n"))
    return(out)
  }

  gradfun <- function(p){
    out <- c(
      -1/(p[1] + p[2]) - 1/(1 - p[1] - p[2]),
      -1/(p[1] + p[2]) + 1/p[2] - 1/(1 - p[1] - p[2])
    )
    ## following line for debugging [not needed to reproduce error]
    cat(paste("gradfun(c(",paste(p,collapse=","),") = c(",paste(out,collapse=","),")\n"))
    return(out)
  }

  small <- 1e-6
  CI <- c(small, small, small - 1)

  UI <- rbind(
    c( 1,  0),  # p1 > small
    c( 0,  1),  # p2 > small
    c(-1, -1)   # p1 + p2 < 1 - small
  )

  out <- constrOptim(
    theta = startp,
    f = objective,
    grad = gradfun,
    ui = UI,
    ci = CI
  )

  print(i)
}
## R CODE ENDS
```

```
rstudio % R --vanilla

R version 3.6.2 (2019-12-12) -- "Dark and Stormy Night"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> R.Version()
$platform
[1] "x86_64-apple-darwin15.6.0"

$arch
[1] "x86_64"

$os
[1] "darwin15.6.0"

$system
[1] "x86_64, darwin15.6.0"

$status
[1] ""

$major
[1] "3"

$minor
[1] "6.2"

$year
[1] "2019"

$month
[1] "12"

$day
[1] "12"

$`svn rev`
[1] "77560"

$language
[1] "R"

$version.string
[1] "R version 3.6.2 (2019-12-12)"

$nickname
[1] "Dark and Stormy Night"

> source("bug.R",echo=TRUE)

> set.seed(0)

> for(i in 1:100){
+ print(i)
+ startp <- c(1/3,1/3) + rnorm(2)*1e-6
+ cat(paste("startp = c(",paste(startp,collapse=","),")\n"))
+
+
+ `ob .... [TRUNCATED]
[1] 1
startp = c( 0.333334596287618,0.333333007099973 )
objective(c( 0.333334596287618,0.333333007099973 ) = -1.79176466317574
objective(c( 0.333334596287618,0.333333007099973 ) = -1.79176466317574
objective(c( 0.333334596287618,0.333333007099973 ) = -1.79176466317574
gradfun(c( 0.333334596287618,0.333333007099973 ) = c( -4.50000632289289,-1.50000338678977 )
objective(c( 0.513334849203334,0.393333142571563 ) = -3.20671155050333
gradfun(c( 0.513334849203334,0.393333142571563 ) = c( -11.817377397968,-9.27500328359227 )

[snip]

objective(c( 0.953770837623601,0.0462281623763981 ) = -16.8896756485489
gradfun(c( 0.953770837623601,0.0462281623763981 ) = c( -1000000.99922284,-999979.36738743 )
objective(c( 0.953770837623601,0.0462281623763983 ) = -16.889675648986
objective(c( 0.953770837623601,0.0462281623763983 ) = -16.889675648986
gradfun(c( 0.953770837623601,0.0462281623763983 ) = c( -1000000.99965999,-999979.367824581 )
objective(c( 0.953770837623602,0.0462281623763985 ) = -16.8896756494232
Error in optim(theta.old, fun, gradient, control = control, method = method,  :
  initial value in 'vmmin' is not finite

> i
[1] 1

> q()
```

## R OUTPUT ENDS.

Notes:

  1. Note the "for(i in 1:100){...}" loop. I originally thought that I could find a value of startp that triggers the error and reuse it directly, but doing so does not trigger the error. The error occurs when I use the for() loop and (AFAICS) not otherwise. I know this is weird...

  2. The derivative of objective() is indeed gradfun(); a finite-difference check agrees:

         > objective(p+delta) - objective(p)
         [1] -0.0002005003
         > sum(gradfun(p+delta/2)*delta)
         [1] -0.0002005002

  3. Note how objective() returns a moderate value at the point of failure, while gradfun() returns a large but finite value.

METADATA

MichaelChirico commented 4 years ago

FWIW, inspecting the object 'gi' here within the call to R()

```r
constrOptim <- function(theta, f, grad, ui, ci, mu = 0.0001, control = list(),
                        method = if(is.null(grad)) "Nelder-Mead" else "BFGS",
                        outer.iterations = 100, outer.eps = 0.00001, ...,
                        hessian = FALSE)
{
    if (!is.null(control$fnscale) && control$fnscale < 0)
        mu <- -mu ##maximizing

    R <- function(theta, theta.old, ...) {
        ui.theta <- ui %*% theta
        gi <- ui.theta - ci
        if (any(gi < 0)) return(NaN)
```

reveals that the third element of 'gi' is ever so slightly less than 0, resulting in the return value of NaN and hence the error condition.

```
Browse[1]> gi
              [,1]
[1,]  9.537698e-01
[2,]  4.622716e-02
[3,] -5.551115e-16
```
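For anyone who wants to see the rounding effect in isolation, here is a minimal sketch (the theta below is made up for illustration, not taken from the failing run) of how a point placed exactly on the third constraint can come back with a tiny nonzero slack:

```r
## Minimal sketch of the rounding effect, using the same ui/ci as in the report.
small <- 1e-6
ui <- rbind(c(1, 0), c(0, 1), c(-1, -1))
ci <- c(small, small, small - 1)

## Hypothetical theta constructed so that theta[1] + theta[2] equals 1 - small
theta <- c(0.95, 1 - small - 0.95)

ui %*% theta - ci
## In exact arithmetic the third element is 0; in double precision it may come
## back as a tiny positive or negative value (on the order of 1e-16), and a
## negative value makes R() inside constrOptim() return NaN.
```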


METADATA

MichaelChirico commented 4 years ago

Thanks for this Benjamin, it is a great relief to see that others can reproduce this! But my understanding is that the variable gi can become negative in the usual course of evaluation of constrOptim(); this just flags that theta violates a constraint. Indeed, if you print it out for my problem it attains a value of about -5.66 on some iterations.

But, as far as I can tell from dput() statements, the function objective() is never evaluated outside the constraints... there must be some bad weirdness somewhere.

Best wishes, Robin


METADATA

MichaelChirico commented 4 years ago

Created attachment 2532 [details] patch for stats::constrOptim

Here is a patch for constrOptim which warns if the augmented objective comes back as non-finite, and which also detects and warns if the par returned from optim is outside the feasible region. While the first warning is perhaps too heavy-handed, the second is not, as on the next iteration through optim there will surely be an error.
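For readers without access to the attachment, here is a minimal sketch of the second check described above (the helper name and message wording are illustrative, not taken from the patch):

```r
## Sketch only -- not the attached patch. After each inner optim() call,
## constrOptim() could verify that the returned par is still feasible.
check_feasible <- function(par, ui, ci, iteration) {
    if (any(ui %*% par - ci < 0))
        warning(sprintf(
            "on iteration %d optim() returned a par outside the feasible region",
            iteration))
    invisible(par)
}
```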


METADATA

INCLUDED PATCH

MichaelChirico commented 4 years ago

(In reply to Robin Hankin from comment #2)

thanks for this Benjamin, it is a great relief to see that others can
reproduce this!  But my understanding is that variable `gi` can become
negative in the usual course of evaluation of constrOptim(), this just flags
that theta violates a constraint.   Indeed if you print it out for my
problem it attains a value of about -5.66 on the some iterations.

But, as far as I can tell from dput() statements, function objective() is
never evaluted outside the constraints.... there must be some bad weirdness
somewhere.

Best wishes, Robin

Under the patch I just uploaded, here is what the output would look like instead:

```
objective(c( 0.953770837623601,0.0462281623763983 ) = -16.889675648986
objective(c( 0.953770837623601,0.0462281623763983 ) = -16.889675648986
gradfun(c( 0.953770837623601,0.0462281623763983 ) = c( -1000000.99965999,-999979.367824581 )
objective(c( 0.953770837623602,0.0462281623763985 ) = -16.8896756494232
Error in optim(theta.old, fun, gradient, control = control, method = method,  :
  initial value in 'vmmin' is not finite
In addition: Warning messages:
1: In stats:::constrOptim(theta = startp, f = objective, grad = gradfun,  :
  on iteration 3 a non-finite objective came back from optim
2: In stats:::constrOptim(theta = startp, f = objective, grad = gradfun,  :
  on iteration 4 a non-finite objective came back from optim
3: In stats:::constrOptim(theta = startp, f = objective, grad = gradfun,  :
  optim is going to fail on the next iteration!
```

METADATA

- Comment author: Benjamin Tyner
- Timestamp: 2020-02-02 11:59:16 UTC

MichaelChirico commented 4 years ago

thanks for this, I have been studying the patch and my results match yours.

Rather than "optim is going to fail on the next iteration!" could the report be "optim() will be called outside the constraints specified by ui and ci"?

One reason I specified small=1e-6 was to define a "buffer" region so that the function would be defined on the constraint itself and indeed slightly beyond it. So what is baffling me is how the tiny exceedance of about 5e-16 that Benjamin reports can possibly lead to objective() being infinite.
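Putting numbers on that: evaluating the objective by hand at the offending point from the log above (values copied from the trace, so this is only a sketch) shows it is perfectly finite there, because the point is still about 1e-6 away from where the log() terms blow up:

```r
## The point at which constrOptim() failed, copied from the trace above.
p <- c(0.953770837623601, 0.0462281623763983)

## 1 - p[1] - p[2] is roughly 1e-6, not 0, so the objective is finite here:
-log(p[1] + p[2]) + log(p[2]) + log(1 - p[1] - p[2])
## about -16.89, matching the objective() output in the trace
```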


METADATA

MichaelChirico commented 4 years ago

(In reply to Robin Hankin from comment #5)

thanks for this, I have been studying the patch and my results match yours.

Rather than "optim is going to fail on the next iteration!" could the report
be "optim() will be called outside the constraints specified by ui and ci"?

One reason I specified small=1e-6 was to define a "buffer" region so that
the function would be defined on the constraint itself and indeed slightly
beyond it.  So what is baffling me is how the tiny exceedance  of about
5e-16 that Benjamin reports can possibly lead to objective() being infinite.

It's because the augmented objective R() defined inside constrOptim tries to compute log(gi), resulting in a non-finite 'bar', and hence Inf is added to your objective.

```r
R <- function (theta, theta.old, ...) {
    ui.theta <- ui %*% theta
    gi <- ui.theta - ci
    if (any(gi < 0)) {
        return(NaN)
    } else {
        print("okay")
    }
    gi.old <- ui %*% theta.old - ci
    bar <- sum(gi.old * log(gi) - ui.theta)
    if (!is.finite(bar)) bar <- -Inf
    objective(theta, ...) - mu * bar
}
```
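A quick check of the values involved (note that, as the discussion below clarifies, R() actually returns NaN before this computation is reached):

```r
## log() of a (slightly) negative slack is NaN in R, with a warning, so a 'bar'
## computed from such a gi would be non-finite.
log(-5.551115e-16)               # NaN  (warning: NaNs produced)
is.finite(log(-5.551115e-16))    # FALSE
```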


METADATA

MichaelChirico commented 4 years ago

I had not appreciated that NA and indeed NaN are not finite:

```
> is.finite(NA)
[1] FALSE
> is.finite(NaN)
[1] FALSE
```

I'm having difficulty following the intended structure of R(). If any(gi < 0) then R() exits with NaN... so I can't see how control can possibly reach

bar <- sum(gi.old*log(gi)-ui.theta)

which (I think) invalidates my suggested fix

bar <- sum(gi.old*log(abs(gi))-ui.theta)


METADATA

MichaelChirico commented 4 years ago

(In reply to Robin Hankin from comment #7)

I'm having difficulty following the intended structure of R().  If any(gi)<0
then function R() exits with NaN....so I can't see how control possibly
reaches  

bar <- sum(gi.old*log(gi)-ui.theta)

To clarify: indeed it returns early with NaN, since at that point there's no need to perform any additional arithmetic. (This also avoids the warning associated with taking the log of a negative number.) But regardless, the bottom line is that optim doesn't allow a non-finite objective on the initial evaluation.
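A minimal sketch of that restriction, independent of constrOptim (the objective here is artificial, chosen only so that the starting value is non-finite):

```r
## With method = "BFGS", optim() stops immediately if the objective is
## non-finite at the starting value.
f_bad <- function(x) NaN                      # always NaN, for illustration
try(optim(c(0, 0), f_bad, gr = function(x) c(0, 0), method = "BFGS"))
## Error in optim(...) : initial value in 'vmmin' is not finite
```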


METADATA

MichaelChirico commented 4 years ago

OK, so optim() can't deal with an initial point that has a non-finite objective value. Here, R() returns NaN because theta is outside its constraints by about 1e-16. Is there an easy fix?


METADATA

MichaelChirico commented 4 years ago

Created attachment 2535 [details] direct demonstration of the optim condition (without using constrOptim)

The attached demonstrates a return value coming from optim which does not satisfy the constraints.


METADATA

INCLUDED PATCH

MichaelChirico commented 4 years ago

(In reply to Robin Hankin from comment #9)

OK, so optim() can't deal with a initial point that has non-finite objective
function.  Here, R() returns NaN because theta is outside its constraints by
about 1e-16.  Is there an easy fix?

See the attachment I uploaded with comment #9, which demonstrates a condition where optim's return is:

```
$par
[1] 0.95377084 0.04622816

$value
[1] NaN

$counts
function gradient 
      32        1 

$convergence
[1] 0

$message
NULL
```

...despite the initial fun(theta.old) being finite.

My sense is the next logical step would be to determine whether this constitutes a bug in optim itself.


METADATA

MichaelChirico commented 4 years ago

Sorry, I missed your attachment, which gives me results identical to yours.

Arguably, optim() is behaving as documented:

 Function ‘fn’ can return ‘NA’ or ‘Inf’ if the function cannot be
 evaluated at the supplied value, but the initial value must have a
 computable finite value of ‘fn’.  (Except for method ‘"L-BFGS-B"’
 where the values should always be finite.)

But this kind of implies that returning NA away from the initial value does not prejudice returning the initial value, and to be fair:

```
> f <- function(x){if(all(x==0)){return(2)}else{return(NA)}}
> optim(c(0,0),f)
$par
[1] 0 0

$value
[1] 2

$counts
function gradient 
     501       NA 

$convergence
[1] 1

$message
NULL
```

So at the very least optim() is behaving inconsistently in this regard, but whether it is desirable is unclear to me.


METADATA

MichaelChirico commented 4 years ago

(In reply to Robin Hankin from comment #12)

But this kind of implies that returning NA away from the initial value does
not prejudice returning the initial value, and to be fair: 

> f <- function(x){if(all(x==0)){return(2)}else{return(NA)}}
> optim(c(0,0),f)

Strictly speaking, for that to be even remotely comparable to the code path used in your original example, you need to add method = "BFGS" and a gradient function, e.g.

optim(c(0,0),f, gr = function(x) c(0,0), method="BFGS")

but regardless, I agree with your point. The underlying source code used is here:

https://svn.r-project.org/R/trunk/src/appl/optim.c


METADATA

MichaelChirico commented 4 years ago

This problem breaks at least two of the vignettes in the hyper2 package, which means that I cannot upload a much-needed update to CRAN. Can anyone advise?


METADATA

MichaelChirico commented 4 years ago

Not sure why I missed this but:

https://stat.ethz.ch/pipermail/r-help/2012-April/311073.html

documents the same behaviour back in 2012, using R-2.15.0


METADATA

MichaelChirico commented 4 years ago

Robin Hankin: When you say it "breaks" the vignettes, are you saying they formerly worked and then stopped? Or that they don't work and you expected them to?


METADATA

MichaelChirico commented 4 years ago

(In reply to elin.waring from comment #16)

Robin Hankin: When you say "breaks" the vignettes are you saying they
formerly worked and stopped?  Or that they don't work and you expected them
to?

Hello elin. All the vignettes used to work (that is, they passed winbuilder's checks with no problem) and then the error started to occur.


METADATA

MichaelChirico commented 4 years ago

So just trying to trace when it stopped working in your vignettes, did the breakage happen after moving to R 3.6.2 from an earlier version?


METADATA

MichaelChirico commented 4 years ago

(In reply to elin.waring from comment #18)

So just trying to trace when it stopped working in your vignettes, did the
breakage happen after moving to R 3.6.2 from an earlier version?

I have had the problem intermittently, and apparently randomly, for a long time. I never got round to filing a proper report until now, as my optimizations worked almost every time, and I could not reproduce the error. Comment #12 would suggest that the problem is not new to R-3.6.2. Best wishes, Robin


METADATA