drizopoulos / JM

Joint Models for Longitudinal & Survival Data under Maximum Likelihood
34 stars 7 forks source link

rocJM producing unexpected effects #20

Closed DocEd closed 3 years ago

DocEd commented 4 years ago

Hi @drizopoulos,

I am fitting some joint models of organ dysfunction in critical illness. In order to help with stability, I center and scale variables before modelling (including the outcome of the longitudinal submodel). Some of the values of the longitudinal outcome will now take negative values. There is a strange appearance to the ROC plots (see reproducible example from the package sample data) where the curves don't complete their path. The AUC results are also much lower than would be expected. I can't seem to isolate the issue to any other cause. Is this normal behaviour? Must variables inside the joint model only take positive values? is it necessary that other variables in the model are also positive (e.g. explanatory variables inside the longitudinal or cox submodels)?

library(JM)

pbc2$status2 <- as.numeric(pbc2$status != "alive")
pbc2.id$status2 <- as.numeric(pbc2.id$status != "alive")

# center and scale longitudinal outcome
bil_scaling <- scale(pbc2$serBilir)

# reassign to data frame
pbc2$serBilir <- bil_scaling[, 1]
pbc2.id$serBilir <- pbc2$serBilir[pbc2$year == 0]

# fit models
lmeFit <- lme(serBilir ~ ns(year, 3),
              random = list(id = pdDiag(form = ~ ns(year, 3))), data = pbc2)
survFit <- coxph(Surv(years, status2) ~ drug, data = pbc2.id, x = TRUE)

jointFit <- jointModel(lmeFit, survFit, timeVar = "year", 
                       method = "piecewise-PH-aGH")

# ROC plots
ND <- pbc2[pbc2$id == "7", ]
roc1 <- rocJM(jointFit, dt = c(2, 4, 8), data = ND, idVar = "id")
plot(roc1)

Screenshot 2020-09-01 at 15 37 11

I couldn't find any other reference to this in your book or on stack overflow. Thank you in advance for your time and insights on this.

DocEd commented 4 years ago

I've done a little more investigation Prof @drizopoulos . I've found where the problem originates. When choosing the limits for cut off thresholds of the longitudinal biomarker the following lines of code in JM::rocJM do not anticipate a negatively valued biomarker:

if (is.null(cc) || !is.numeric(cc)) {
  pc <- quantile(object$y$y, c(0.05, 0.95), names = FALSE)
    if (is.null(min.cc) || !is.numeric(min.cc)) 
      min.cc <- min(min.cc, min(object$y$y) - pc[1]) # <- here is the problem
    if (is.null(max.cc) || !is.numeric(max.cc)) 
      max.cc <- max(max.cc, max(object$y$y) + pc[2])
  cc <- seq(min.cc, max.cc, length.out = 251)
}

Because the 0.05 quantile is also negative, it results in a lower limit for the threshold that is higher than the minimum value of the biomarker. This is easy to override with the min.cc argument, but is it worth changing the default behaviour to prevent this happening? I believe the following would work without causing any unintentional side effects:

min.cc <- min(min.cc, min(object$y$y) - abs(pc[1]))

I can see that pc is not used anywhere else. I'm happy to submit a pull request if you think this is appropriate.

This has also uncovered another interesting issue. The ROC curve appears to be sensitive to the scale of the time variable. When I scale time to be on [0, 1] to help with convergence, the ROC curve is flattened. See below

roc_problem

I have rescaled the time variable in this joint model (the same as the previous example) to be in years, months and on [0, 1]. Note how the curve is sensitive to the scaling. Though I cannot see a good explanation at the moment as to why this would be.

drizopoulos commented 4 years ago

Thanks for searching for this. Yes, please send the pull request, and I’ll study it. Thanks again.

From: Ed Palmer notifications@github.com Sent: Friday, September 4, 2020 10:52 AM To: drizopoulos/JM JM@noreply.github.com Cc: D. Rizopoulos d.rizopoulos@erasmusmc.nl; Mention mention@noreply.github.com Subject: Re: [drizopoulos/JM] rocJM producing unexpected effects (#20)

I've done a little more investigation Prof @drizopouloshttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdrizopoulos&data=02%7C01%7Cd.rizopoulos%40erasmusmc.nl%7C0fe3a622da5849b9004608d850afc251%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637348063381125459&sdata=yLJpHwilxUi0RyHcyxnfZsVMTkYDXjL3YSl8zmSE%2FGY%3D&reserved=0 . I've found where the problem originates. When choosing the limits for cut off thresholds of the longitudinal biomarker the following lines of code in JM::rocJM do not anticipate a negatively valued biomarker:

if (is.null(cc) || !is.numeric(cc)) {

pc <- quantile(object$y$y, c(0.05, 0.95), names = FALSE)

if (is.null(min.cc) || !is.numeric(min.cc))

  min.cc <- min(min.cc, min(object$y$y) - pc[1]) # <- here is the problem

if (is.null(max.cc) || !is.numeric(max.cc))

  max.cc <- max(max.cc, max(object$y$y) + pc[2])

cc <- seq(min.cc, max.cc, length.out = 251)

}

Because the 0.05 quantile is also negative, it results in a lower limit for the threshold that is higher than the minimum value of the biomarker. This is easy to override with the min.cc argument, but is it worth changing the default behaviour to prevent this happening? I believe the following would work without causing any unintentional side effects:

min.cc <- min(min.cc, min(object$y$y) - abs(pc[1]))

I can see that pc is not used anywhere else. I'm happy to submit a pull request if you think this is appropriate.

This has also uncovered another interesting issue. The ROC curve appears to be sensitive to the scale of the time variable. When I scale time to be on [0, 1] to help with convergence, the ROC curve is flattened. See below

[roc_problem]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fuser-images.githubusercontent.com%2F17110713%2F92220012-4268f780-ee93-11ea-914f-f4470e82f245.png&data=02%7C01%7Cd.rizopoulos%40erasmusmc.nl%7C0fe3a622da5849b9004608d850afc251%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637348063381125459&sdata=Zqb3S6%2Fe5UftOmd9cu0SYhEg8aCFCMCrClg8zMQFGNU%3D&reserved=0

I have rescaled the time variable in this joint model (the same as the previous example) to be in years, months and on [0, 1]. Note how the curve is sensitive to the scaling. Though I cannot see a good explanation at the moment as to why this would be.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdrizopoulos%2FJM%2Fissues%2F20%23issuecomment-687017073&data=02%7C01%7Cd.rizopoulos%40erasmusmc.nl%7C0fe3a622da5849b9004608d850afc251%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637348063381135451&sdata=NW0%2F5laBQFlbK32igv5n%2B3YDegrX1frpli7PpZVV3fo%3D&reserved=0, or unsubscribehttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FADE7TT7UHAJNFWQ5RXCZDJDSECTCHANCNFSM4QRZT4LA&data=02%7C01%7Cd.rizopoulos%40erasmusmc.nl%7C0fe3a622da5849b9004608d850afc251%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637348063381135451&sdata=CR18PEemnfU4l0Yxa27fhhRUv9ximlSyGqgV%2FI2tHyk%3D&reserved=0.