drizopoulos / JMbayes2

Extended Joint Models for Longitudinal and Survival Data
https://drizopoulos.github.io/JMbayes2/
79 stars 23 forks source link

Numerical differentiation #9

Closed boennecd closed 2 years ago

boennecd commented 2 years ago

The numerical differentiation here https://github.com/drizopoulos/JMbayes2/blob/5859d7e9a3ad68bd14d283f171146782d7be8dcf/R/help_functions.R#L990-L1007

may give the wrong result when the point is close to zero. This is illustrated below:

test_func <- function(f, x, direction, eps = .001){
  if(direction == "both"){
    x1 <- x + eps
    delta <- 2 * eps
  } else if (direction == "backwards"){
    x1 <- x
    delta <- eps
  }

  x2 <- pmax(x - eps, 0)
  (f(x1) - f(x2)) / delta
}

f <- function(x) x^2 + x

test_func(f, 1:2, direction = "both") # c(2 * 1 + 1, 2 * 2 + 1)
test_func(f, 1:2, direction = "backwards") # ~ correct

test_func(f, 0, direction = "both") # wrong
test_func(f, 0, direction = "backwards") # wrong
test_func(f, 1e-5, direction = "both") # wrong
test_func(f, 1e-5, direction = "backwards") # wrong

The close to zero evaluation may happen if the time scale is small and there is no left-truncation. A fix can be something like:

fix_func <- function(f, x, direction, eps = .001){
  if(direction == "both"){
    x1 <- x + eps
    delta <- ifelse(x <= eps, eps, 2 * eps)
  } else if (direction == "backwards"){
    if(any(x <= eps))
      stop("'backwards' with times less than eps")
    x1 <- x
    delta <- eps
  }

  x2 <- ifelse(x <= eps, x, x - eps) # safe
  (f(x1) - f(x2)) / delta
}

fix_func(f, 1:2, direction = "both") # c(2 * 1 + 1, 2 * 2 + 1)
fix_func(f, 1:2, direction = "backwards") # ~ correct

fix_func(f, 0, direction = "both") # ~ correct
fix_func(f, 0, direction = "backwards") # error
fix_func(f, 1e-5, direction = "both") # ~ correct
fix_func(f, 1e-5, direction = "backwards") # error

The error can be replaced by forward numerical differentiation. An example where this may be an issue is given below. Perhaps it is something else (e.g. the default prior?) in which case I apologize!

library(JMbayes2)
sum(pbc2$year == 0) # quite a few zeros
#R> [1] 312
max(tapply(pbc2$year, pbc2$id, min)) # everybody has a time zero measurement
#R> [1] 0

# only keep three measurements per individual
pbc2 <- pbc2[order(pbc2$id, pbc2$year), ]
set.seed(1)
to_keep <- tapply(1:NROW(pbc2), pbc2$id, function(indices)
  if(length(indices) < 4) indices else
    c(indices[1], sample(indices, 2L)))
to_keep <- unlist(to_keep)
pbc2 <- pbc2[to_keep, ]

# fit on the original data
fm_org <- lme(fixed = log(serBilir) ~ year * sex, random = ~ year | id,
              data = pbc2)

fCox_org <- coxph(Surv(years, status2) ~ drug + age, data = pbc2.id)

fForms <- list("log(serBilir)" = ~ slope(log(serBilir)))

fit_org <- jm(fCox_org, fm_org, time_var = "year",
              functional_forms = fForms, n_chains = 1L,
              n_iter = 26000L, n_burnin = 1000L)

# squeeze the time scale
squeeze_fac <- max(pbc2$year) * 2
pbc2_squeeze <- transform(pbc2, year = year / squeeze_fac)
pbc2.id_squeeze <- transform(pbc2.id, years = years / squeeze_fac)

fm_squeeze <- lme(fixed = log(serBilir) ~ year * sex, random = ~ year | id,
                  data = pbc2_squeeze)

fCox_squeeze <- coxph(Surv(years, status2) ~ drug + age, data = pbc2.id_squeeze)

fit_squeeze <- jm(fCox_squeeze, fm_squeeze, time_var = "year",
                  functional_forms = fForms, n_chains = 1L,
                  n_iter = 26000L, n_burnin = 1000L)

# not that similar
summary(fit_org)$Survival
#R>                             Mean      StDev        2.5%       97.5%       P
#R> drugD-penicil        -0.05857478 0.32620558 -0.68630913  0.59358824 0.84824
#R> age                   0.06668607 0.01584273  0.03640867  0.09912225 0.00008
#R> slope(log(serBilir)) 11.24150868 1.56719955  8.37510852 14.42183870 0.00000
summary(fit_squeeze)$Survival
#R>                             Mean      StDev       2.5%     97.5%       P
#R> drugD-penicil        -0.06562505 0.33071585 -0.6943179 0.5990626 0.82672
#R> age                   0.06890149 0.01584006  0.0385639 0.1012742 0.00000
#R> slope(log(serBilir))  0.47809285 0.06831340  0.3465517 0.6176610 0.00000

# slope should have changed with a factor 1 / squeeze_fac
summary(fit_org)$Survival["slope(log(serBilir))", ] / squeeze_fac
#R>                           Mean      StDev      2.5%     97.5% P
#R> slope(log(serBilir)) 0.4482367 0.06248952 0.3339437 0.5750472 0
summary(fit_squeeze)$Survival["slope(log(serBilir))", ]
#R>                           Mean     StDev      2.5%    97.5% P
#R> slope(log(serBilir)) 0.4780929 0.0683134 0.3465517 0.617661 0

# the output from lme seems correct
rbind(fm_org$coefficients$fixed * c(1, squeeze_fac, 1, squeeze_fac),
      fm_squeeze$coefficients$fixed)
#R>      (Intercept)     year  sexfemale year:sexfemale
#R> [1,]   0.8088588 5.862404 -0.2818363      -2.880438
#R> [2,]   0.8088588 5.862406 -0.2818362      -2.880439
drizopoulos commented 2 years ago

Thanks for reporting. It should be resolved in the current commit.

boennecd commented 2 years ago

The new version does not fix the issue. The issue is the denominator in the finite difference here https://github.com/drizopoulos/JMbayes2/blob/3f8cab020642b177d8daafc7ac55b25a73045202/R/help_functions.R#L990-L1009

The sqrt(.Machine$double.eps) actually makes things a bit worse as illustrated below

test_func2 <- function(f, x, direction, eps = .001){
  if(direction == "both"){
    x1 <- x + eps
    delta <- 2 * eps
  } else if (direction == "backwards"){
    x1 <- x
    delta <- eps
  }

  x2 <- pmax(x - eps, sqrt(.Machine$double.eps))
  (f(x1) - f(x2)) / delta
}

eps <- sqrt(.Machine$double.eps)
test_func (f, 1:2, direction = "both", eps = eps) # c(2 * 1 + 1, 2 * 2 + 1)
test_func2(f, 1:2, direction = "both", eps = eps) # correct
test_func (f, 1:2, direction = "backwards", eps = eps) # correct
test_func2(f, 1:2, direction = "backwards", eps = eps)  # correct

test_func (f, 0, direction = "both", eps = eps) # wrong
test_func2(f, 0, direction = "both", eps = eps) # wrong
test_func (f, 0, direction = "backwards", eps = eps) # wrong
test_func2(f, 0, direction = "backwards", eps = eps) # wrong; now with wrong sign
drizopoulos commented 2 years ago

You are correct. I forgot to change the definition of the delta in the code. I.e.,

test_func <- function (f, x, direction, eps = .001) {
    x2 <- pmax(x - eps, sqrt(.Machine$double.eps))
    if (direction == "both") {
        x1 <- x + eps
        delta <- x1 - x2
    } else if (direction == "backwards"){
        x1 <- x
        delta <- x - x2
    }
    (f(x1) - f(x2)) / delta
}

f <- function (x) x^2 + x
g <- function (x) 2 * x + 1

test_func(f, 1:2, direction = "both")
test_func(f, 1:2, direction = "backwards") 
g(1:2)

test_func(f, 0, direction = "both")
test_func(f, 0, direction = "backwards")
g(0)

test_func(f, 1e-5, direction = "both") 
test_func(f, 1e-5, direction = "backwards") 
g(1e-5)
boennecd commented 2 years ago

This is better! There are still some possible issues:

test_func <- function (f, x, direction, eps = .001) {
  x2 <- pmax(x - eps, sqrt(.Machine$double.eps))
  if (direction == "both") {
    x1 <- x + eps
    delta <- x1 - x2
  } else if (direction == "backwards"){
    x1 <- x
    delta <- x - x2
  }
  (f(x1) - f(x2)) / delta
}

f <- function (x) x^2 + x
g <- function (x) 2 * x + 1

# someone might actually do this
test_func(f, 0, direction = "both", eps = sqrt(.Machine$double.eps))
#R> [1] NaN

maybe_fix <- function (f, x, direction, eps = .001) {
  x2 <- pmax(x - eps, 0) # changed to zero
  if (direction == "both") {
    x1 <- x + eps
    delta <- x1 - x2
  } else if (direction == "backwards"){
    x1 <- x
    delta <- x - x2
  }
  (f(x1) - f(x2)) / delta
}

maybe_fix(f, 0, direction = "both", eps = sqrt(.Machine$double.eps)) # works
#R> [1] 1
maybe_fix(f, 0, direction = "backwards", eps = sqrt(.Machine$double.eps)) # fails
#R> [1] NaN

maybe_fix <- function (f, x, direction, eps = .001) {
  x2 <- ifelse(x > eps, x - eps, x) # conditional change
  if (direction == "both") {
    x1 <- x + eps
    delta <- x1 - x2
  } else if (direction == "backwards"){
    x1 <- ifelse(x > eps, x, x + eps) # switch to forward
    delta <- x1 - x2 # notice x1!
  }
  (f(x1) - f(x2)) / delta
}

maybe_fix(f, 0, direction = "both", eps = sqrt(.Machine$double.eps)) # works
#R> [1] 1
maybe_fix(f, 0, direction = "backwards", eps = sqrt(.Machine$double.eps)) # works
#R> [1] 1
drizopoulos commented 2 years ago

Yes, but I won't expect someone wanting to set it to that small value. Either the user will leave it at the default value or set eps = 1 to get the functional form mentioned in the corresponding vignette.

boennecd commented 2 years ago

I am very sorry to write again but this did fix it https://github.com/drizopoulos/JMbayes2/blob/2dc434075ee197daf63aee3afd8d651fea30ecd6/R/help_functions.R#L963

but is there not still an issue here https://github.com/drizopoulos/JMbayes2/blob/2dc434075ee197daf63aee3afd8d651fea30ecd6/R/help_functions.R#L972

The e <- c(mapply("-", t1, t2)) assignment should be outside the if-else block.

drizopoulos commented 2 years ago

Yes, I know. But it doesn’t work for eps = 1 as I want. And because this is the intended use. I leave it as is for now.

—— Professor of Biostatistics Erasmus Medical Center Rotterdam The Netherlands


Από: boennecd @.> Στάλθηκε: Wednesday, December 1, 2021 7:18:14 AM Προς: drizopoulos/JMbayes2 @.> Κοιν.: Dimitris Rizopoulos @.>; State change @.> Θέμα: Re: [drizopoulos/JMbayes2] Numerical differentiation (Issue #9)

Waarschuwing: Deze e-mail is afkomstig van buiten de organisatie. Klik niet op links en open geen bijlagen, tenzij u de afzender herkent en weet dat de inhoud veilig is. Caution: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.

I am very sorry to write again but this did fix it https://github.com/drizopoulos/JMbayes2/blob/2dc434075ee197daf63aee3afd8d651fea30ecd6/R/help_functions.R#L963https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdrizopoulos%2FJMbayes2%2Fblob%2F2dc434075ee197daf63aee3afd8d651fea30ecd6%2FR%2Fhelp_functions.R%23L963&data=04%7C01%7Cd.rizopoulos%40erasmusmc.nl%7Ce7bc3b87ba654b50b06a08d9b4925c8d%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637739362980740303%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=8E1B8a1ZHicVDmlKdsxxZDQIW0if%2FuKGI3TrUF%2BvV%2F8%3D&reserved=0

but is there not still an issue here https://github.com/drizopoulos/JMbayes2/blob/2dc434075ee197daf63aee3afd8d651fea30ecd6/R/help_functions.R#L972https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdrizopoulos%2FJMbayes2%2Fblob%2F2dc434075ee197daf63aee3afd8d651fea30ecd6%2FR%2Fhelp_functions.R%23L972&data=04%7C01%7Cd.rizopoulos%40erasmusmc.nl%7Ce7bc3b87ba654b50b06a08d9b4925c8d%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637739362980750257%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=H0WStQyjCWofGRuv%2BhccRLDz%2FKJ34w%2FD%2BwfmznhzCic%3D&reserved=0

The e <- c(mapply("-", t1, t2)) assignment should be outside the if-else block.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHubhttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdrizopoulos%2FJMbayes2%2Fissues%2F9%23issuecomment-983325558&data=04%7C01%7Cd.rizopoulos%40erasmusmc.nl%7Ce7bc3b87ba654b50b06a08d9b4925c8d%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637739362980750257%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=flgwz1E3rE6Z%2FNKzhZhcmf%2BnszzaRqGtEr6HY9AtefU%3D&reserved=0, or unsubscribehttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FADE7TTY7VRIBA32CUBF6A6LUOW42NANCNFSM5I636IQA&data=04%7C01%7Cd.rizopoulos%40erasmusmc.nl%7Ce7bc3b87ba654b50b06a08d9b4925c8d%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637739362980760226%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=fVwhGs8YvB7gwd3226jOn2SqvDU4%2FOOUYiM4AM%2Fc73c%3D&reserved=0.

drizopoulos commented 2 years ago

The current version should work as expected.