Random Slope model issue with random effects parameter

gowerc commented 1 year ago

If you plot the real randoms slope of individuals against their predicted random slope from the model you can see that most patients follow a nice linear line with some noise. However there is this group of patients who appear to have some random dispersion in the middle which looks very much out of place. This did not exist on previous versions of the model (all points followed the nice linear line)

This would suggest there is some form of issue with the model

gowerc commented 1 year ago

Code to re-create plot just encase I accidentically lose it as I started exploring this in a branch for a different issue:

offsets <- tibble(
   arm = c("Group-1", "Group-2"),
   offset = c(1, 3)
)

est_ind_slope <- posterior::summarise_draws(mp@results, "mean") |>
    filter(stringr::str_detect(variable, "^lm_rs_ind_rnd_slope")) |>
    rename(est = mean) |>
    select(est)

slopes <- dat_lm |>
    group_by(pt) |>
    slice(1) |>
    ungroup() |>
    select(pt, arm, actual = slope_ind) |>
    left_join(offsets, by = "arm") |>
    mutate(actual_offset = actual - offset) |>
    bind_cols(est_ind_slope) |>
    mutate(est_offset = est - offset)

mean(slopes$actual - slopes$est)
sd(slopes$actual - slopes$est)

ggplot(slopes, aes(x = est_offset, y = actual_offset)) +
    geom_point() +
    geom_abline()

gowerc commented 1 year ago

Looks like it might be an issue with the simulation function. At least I am seeing the same issue when comparing to the random effects estimates generated by lme4:

library(lme4)
devtools::load_all()

## Generate Test data with known parameters
jlist <- simulate_joint_data(
    n = c(200, 200),
    times = 1:2000,
    lambda_cen = 1 / 9000,
    beta_cat = c(
        "A" = 0,
        "B" = -0.1,
        "C" = 0.5
    ),
    beta_cont = 0.3,
    lm_fun = sim_lm_random_slope(
        intercept = 30,
        sigma = 3,
        slope_mu = c(1, 3),
        slope_sigma = 0.2,
        phi = 0,
        .debug = TRUE
    ),
    os_fun = sim_os_exponential(
        lambda = 0.00333  # 1/300
    )
)

dat_lm <- jlist$lm |>
    dplyr::filter(time %in% c(1, 50, 100, 150, 200, 250, 300)) |>
    dplyr::arrange(time, pt)

mod <- lme4::lmer(
    sld ~ 1 + time:arm + (time -1 | pt),
    dat_lm
)

real_slope_mu <- c(
    "Group-1" = 1,
    "Group-2" = 3
)

pdat <- dat_lm |>
    filter(time == 1) |>
    select(pt, real = slope_ind, arm) |>
    mutate(slope_mu = real_slope_mu[as.numeric(arm)]) |>
    mutate(real_offset = real - slope_mu) |>
    mutate(obsv_lme4 = lme4::ranef(mod)$pt$time)

ggplot(pdat, aes(x = obsv_lme4, y = real_offset)) +
    geom_point() +
    geom_abline()

I tried re-working the simulation code to weed out the more complex bits but I'm still getting the same result 😢

danielinteractive commented 1 year ago

Thanks @gowerc, did you try to compare this group of patients with the rest of the patients and see if they differ in some clear way?

gowerc commented 1 year ago

Blah I'm an idiot.... I forgot that in the current main branch the simulation function automatically removes longitudinal observations post death. The group that are clustered on predicted_slope = 0 are subjects who only have 1 or 2 observations e.g. not enough to estimate the slope properly. Using all observations (including post death) everything seems fine:

So no issue here. That being said I will leave this ticket open in order to convert my current code snippets into a proper unit test for the LM model

Genentech / jmpost

Random Slope model issue with random effects parameter #186