dymium-org / example

0 stars 0 forks source link

A microsimulation model for projecting disability status #1

Open jfreedman-svg opened 4 years ago

jfreedman-svg commented 4 years ago
  1. Using a cleaned person-period file, estimate transition probabilities for each time-varying covariate of interest and the 4 states.

    1. The 4-state model: healthy [1], mild disability [2], moderate disability [3], and death [4]. The only reverse transition allowed is [2] -> [1].
    2. This can be modelled using logit or probit. All evolving covariates are binary and can be modelled using logit or survival models.
      1. The ordering of the transition models is as the following
        1. DisabilityState = Age + Sex + Education + v1 + v2 + v3 + v4
        2. v4 = Age + Sex + Education + v1 + v2 + v3
        3. v3 = Age + Sex + Education + v1 + v2
        4. v2 = Age + Sex + Education + v1
        5. v1 = Age + Sex + Education
      2. [Q1]: How to link the 4-state transition model with the other transition models?
  2. Transition matrices are calculated and stored. Now, generate a starting population of size N using a bootstrapped approach.

  3. We have our transition matrices and our starting population. Now, we simulate. *In the complex case, this will be done for each of the time-varying covariates and the ultimately the 4-state model.

  4. This process repeats itself for 15 simulation cycles (years)

  5. Output: Aggregated counts in each state. Dis-aggregated counts by covariates.

asiripanich commented 4 years ago

I made some edits so that your question is more readable.

asiripanich commented 4 years ago

So this is quite long but I'm going to walk you through the data and model preparation step to running your microsimulation model. Please also refer to the documentation at https://core.dymium.org/index.html for parts that are not clear. If they are still unclear then let me know I will try to explain the best I can. :)

For the record, we also setup a repo at https://github.com/asiripanich/microsim-disability for your project which has a minimal working example of your model.

1. Import/create an agent population data

To link those models together you just need to make sure that you have a population data with those attributes, all the eight covariates.

library(data.table)
n_agents <- 100
population_data <- 
  data.table::data.table(
    id = 1:n_agents,
    age = sample(c(0:100), n_agents, replace = TRUE),
    sex = sample(c('male', 'female'), n_agents, replace = TRUE),
    education = sample(c('high school and below', 'diploma', 'university level'), n_agents, replace = TRUE),
    v1 = sample(c(0:1), n_agents, replace = TRUE),
    v2 = sample(c(0:1), n_agents, replace = TRUE),
    v3 = sample(c(0:1), n_agents, replace = TRUE),
    v4 = sample(c(0:1), n_agents, replace = TRUE),
    disability_status = sample(c('state_1', 'state_2', 'state_3', 'state_4'), n_agents, replace = TRUE)
  )

2. Estimate/import transition models

If you haven't estimate your transition models then you can use the caret package to do that. In your case, you can use the glm function from the stats package to estimate logistic regression models and the nnet package to estimate a multinomial logit model. dymiumCore can use these models estimated using the caret package for microsimulation without you having to extract the parameters. This is very convenient if you often need to re-estimate your model.

library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2

logit_model <-
  train(
    v1 ~ age + sex,
    data = population_data,
    method = "glm",
    family = binomial(link = "logit")
  )

summary(logit_model)
#> 
#> Call:
#> NULL
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -1.3466  -1.1190  -0.9599   1.1933   1.4212  
#> 
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)
#> (Intercept)  0.040547   0.476980   0.085    0.932
#> age         -0.006034   0.007009  -0.861    0.389
#> sexmale      0.366853   0.406192   0.903    0.366
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 138.47  on 99  degrees of freedom
#> Residual deviance: 136.70  on 97  degrees of freedom
#> AIC: 142.7
#> 
#> Number of Fisher Scoring iterations: 4

Created on 2020-03-14 by the reprex package (v0.3.0)

3. Setup your world

Now that you have your population data, models, and event functions, you want to create a World object to store your population data. World is responsible for storing data and model objects that are required for your microsimulation model.

# create a World object
MyWorld <- World$new()

# create an Agent object
MyAgent <- Agent$new(.data = population_data, id_col = "id")

# add the Agent object to the world object.
MyWorld $add(MyAgent)

# As you can see the world object only contain 1 item which is the Agent object that we have just added
world
#> There are 1 items in self$Cont.
#> Class: Agent
#>  Inheritance: Agent <- Entity <- Generic <- R6
#>  Number_of_entities: 100
#>  Number_of_removed_entities: 0
#>  Data[rows, cols]: attrs[100, 9]

# to access the Agent object we can use the `entities` field which is a named list 
# Note that, all Entity inheritance classes, including Agent, can be refer to by its class name.
world$entities$Agent
#> Class: Agent
#>  Inheritance: Agent <- Entity <- Generic <- R6
#>  Number_of_entities: 100
#>  Number_of_removed_entities: 0

# or alternatively you can use the `get` method and refer to the Agent object by its name which is 'Agent'
MyWorld$get("Agent")
#> Class: Agent
#>  Inheritance: Agent <- Entity <- Generic <- R6
#>  Number_of_entities: 100
#>  Number_of_removed_entities: 0
#>  Data[rows, cols]: attrs[100, 9]

# the `get` function returns a reference to the called object within World
Agt <- MyWorld$get("Agent")

# to view Agent's attribute data use the `get_data` method
Agt$get_data()
#>  Data[rows, cols]: attrs[100, 9]
#>       id age    sex             education  v1  v2  v3  v4 disability_status
#>   1:   1  94 female high school and below yes yes  no  no           state_1
#>   2:   2  78   male      university level  no yes  no yes           state_1
#>   3:   3   6 female               diploma  no  no yes yes           state_1
#>   4:   4  32 female high school and below yes yes  no yes           state_2
#>   5:   5  79 female high school and below  no yes  no yes           state_4
#>   6:   6  83 female      university level yes  no yes  no           state_2

Notice that, Agt and MyAgent are pointing to the same population data. You can try use get_data() on both variables and they will return the exact same data.

Created on 2020-03-14 by the reprex package (v0.3.0)

4. Create transition event functions

Now to simulate transitions between different states, it is recommended that you create a function for each transition like below. An event function should except at least two inputs. The first input is a World object and the second input is a model object that you want to use to predict transition probabilities.

event_disability <- function(world, model) {

  # the `get` function returns a reference to the called object stored in World
  Agt <- world$get("Agent")

  # select only agents that are older than 50 years old and not in disability state 4
  eligible_agent_ids <- Agt$get_data()[age > 50 & state != "state_4",
                                       get(Agt$get_id_col())]

  TransitionDisability <- 
    TransitionClassification$new(
      x = Agt, 
      model = model, 
      targeted_agents = eligible_agent_ids
    )

  # use the simulated result to update the 'state' attribute of the selected agents
  TransitionDisability$update_agents(attr = "disability_status")

  Agt$log(desc = "disability_status:state",
          value = xtabs(~ state, data = Agt$get_data()))

  return(world)
}

event_v1 <- function(world, model) {...}

event_v2 <- function(world, model) {...}

event_v3 <- function(world, model) {...}

event_v4 <- function(world, model) {...}

5 Create a Microsimulation pipeline

You can repeat the whole process for several times to obtain the simulation results from multiple independent runs to create a confidence interval.

for (i in 1:15) {
  MyWorld$start_iter(i, unit = "year") %>%
    event_v1(., model = v1_model) %>%
    event_v2(., model = v2_model) %>%
    event_v3(., model = v3_model) %>%
    event_v4(., model = v4_model) %>%
    event_disability(., disability_model)
}
asiripanich commented 4 years ago

Here is a fixed version to the toy example you shared with me. I removed the adding new entrants script that you put inside the simulation pipeline. You should instead create that as an event like how we did for the other events.

library(data.table)
library(dymiumCore)
#> ── * dymium's options * ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ● dymiun.scenario_dir: /var/folders/0d/9srpj_750lxbkfs2_8nwkcpw0000gn/T//Rtmp2iIk31/scenario
#> ● dymiun.input_dir: /var/folders/0d/9srpj_750lxbkfs2_8nwkcpw0000gn/T//Rtmp2iIk31/scenario/inputs
#> ● dymiun.output_dir: /var/folders/0d/9srpj_750lxbkfs2_8nwkcpw0000gn/T//Rtmp2iIk31/scenario/outputs
library(testthat)
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
library(ggplot2)
set.seed(1)

# agent data --------------------------------------------------------------
n_agents <- 1000
population_data <- 
  data.table::data.table(
    id = 1:n_agents,
    age = sample(c(0:100), n_agents, replace = TRUE),
    sex = sample(c('male', 'female'), n_agents, replace = TRUE),
    education = sample(c('high school and below', 'diploma', 'university level'), n_agents, replace = TRUE),
    v1 = sample(c('no', 'yes'), n_agents, replace = TRUE),
    v2 = sample(c('no', 'yes'), n_agents, replace = TRUE),
    v3 = sample(c('no', 'yes'), n_agents, replace = TRUE),
    v4 = sample(c('no', 'yes'), n_agents, replace = TRUE),
    disability_status = sample(c('state_1', 'state_2', 'state_3', 'state_4'), n_agents, replace = TRUE)
  )

# models ------------------------------------------------------------------
v1_model <-
  train(
    v1 ~ age + sex,
    data = population_data,
    method = "glm",
    family = binomial(link = "logit")
  )

v2_model <-
  train(
    v2 ~ age + sex + v1,
    data = population_data,
    method = "glm",
    family = binomial(link = "logit")
  )

v3_model <-
  train(
    v3 ~ age + sex + v1 + v2,
    data = population_data,
    method = "glm",
    family = binomial(link = "logit")
  )

v4_model <-
  train(
    v4 ~ age + sex + v1 + v2 + v3,
    data = population_data,
    method = "glm",
    family = binomial(link = "logit")
  )

disability_model <-
  train(
    disability_status ~ age + sex + v1 + v2 + v3 + v4,
    data = population_data,
    method = "multinom",
    family = binomial(link = "logit")
  )
#> # weights:  32 (21 variable)
#> initial  value 1386.294361 
#> iter  10 value 1377.042417
#> iter  20 value 1375.550262
#> final  value 1375.531152 
#> ...
#> # weights:  32 (21 variable)
#> initial  value 1386.294361 
#> iter  10 value 1377.562496
#> iter  20 value 1377.042549
#> final  value 1377.029763 
#> converged

# create world ------------------------------------------------------------
# create a World object
MyWorld <- World$new()

# create an Agent object
MyAgent <- Agent$new(.data = population_data, id_col = "id")

# add the Agent object to the world object.
MyWorld$add(MyAgent)

# events ------------------------------------------------------------------
event_disability <- function(world, model) {

  # the `get` function returns a reference to the called object stored in World
  Agt <- world$get("Agent")

  # select only agents that are older than 50 years old and not in disability state 4
  eligible_agent_ids <- Agt$get_data()[age > 50 & disability_status != "state_4",
                                       get(Agt$get_id_col())]

  TransitionDisability <- 
    TransitionClassification$new(
      x = Agt, 
      model = disability_model, 
      targeted_agents = eligible_agent_ids
    )

  # use the simulated result to update the 'state' attribute of the selected agents
  TransitionDisability$update_agents(attr = "disability_status")

  Agt$log(desc = "disability_status:state",
          value = xtabs(~ disability_status, data = Agt$get_data()))

  return(world)
}

event_v1 <- function(world, model) {

  # the `get` function returns a reference to the called object stored in World
  Agt <- world$get("Agent")

  # select only agents that are older than 50 years old
  eligible_agent_ids <- Agt$get_data()[age > 50,
                                       get(Agt$get_id_col())]

  TransitionV1 <- 
    TransitionClassification$new(
      x = Agt, 
      model = v1_model, 
      targeted_agents = eligible_agent_ids
    )

  # use the simulated result to update the 'state' attribute of the selected agents
  TransitionV1$update_agents(attr = "v1")

  Agt$log(desc = "v1",
          value = xtabs(~ v1, data = Agt$get_data()))

  return(world)
}

event_v2 <- function(world, model) {

  # the `get` function returns a reference to the called object stored in World
  Agt <- world$get("Agent")

  # select only agents that are older than 50 years old
  eligible_agent_ids <- Agt$get_data()[age > 50,
                                       get(Agt$get_id_col())]

  Transitionv2 <- 
    TransitionClassification$new(
      x = Agt, 
      model = v2_model, 
      targeted_agents = eligible_agent_ids
    )

  # use the simulated result to update the 'state' attribute of the selected agents
  Transitionv2$update_agents(attr = "v2")

  Agt$log(desc = "v2",
          value = xtabs(~ v2, data = Agt$get_data()))

  return(world)
}

event_v3 <- function(world, model) {

  # the `get` function returns a reference to the called object stored in World
  Agt <- world$get("Agent")

  # select only agents that are older than 50 years old
  eligible_agent_ids <- Agt$get_data()[age > 50,
                                       get(Agt$get_id_col())]

  Transitionv3 <- 
    TransitionClassification$new(
      x = Agt, 
      model = v3_model, 
      targeted_agents = eligible_agent_ids
    )

  # use the simulated result to update the 'state' attribute of the selected agents
  Transitionv3$update_agents(attr = "v3")

  Agt$log(desc = "v3",
          value = xtabs(~ v3, data = Agt$get_data()))

  return(world)
}

event_v4 <- function(world, model) {

  # the `get` function returns a reference to the called object stored in World
  Agt <- world$get("Agent")

  # select only agents that are older than 50 years old
  eligible_agent_ids <- Agt$get_data()[age > 50,
                                       get(Agt$get_id_col())]

  Transitionv4 <- 
    TransitionClassification$new(
      x = Agt, 
      model = v4_model, 
      targeted_agents = eligible_agent_ids
    )

  # use the simulated result to update the 'state' attribute of the selected agents
  Transitionv4$update_agents(attr = "v4")

  Agt$log(desc = "v4",
          value = xtabs(~ v4, data = Agt$get_data()))

  return(world)
}

event_age <- function(world, model = NULL) {
  Agt <- world$get("Agent")

  # update age of alive agents, with max of 100.
  Agt$get_data(copy = FALSE) %>%
    .[disability_status != "death" & age < 100, age := age + 1L]

  return(world)
}

# simulation --------------------------------------------------------------
for (i in 1:15) {
  MyWorld$start_iter(i, unit = "year") %>%
    event_v1(., model = v1_model) %>%
    event_v2(., model = v2_model) %>%
    event_v3(., model = v3_model) %>%
    event_v4(., model = v4_model) %>%
    event_disability(., disability_model) %>%
    event_age(.)
}

# visualisation -----------------------------------------------------------
simlog <- get_log(MyWorld)

disability_status_log <- 
  simlog %>%
  .[desc == "disability_status:state", ] %>%
  {purrr::map2_dfr(.$time, .$value, ~ {
    .y %>% 
      as.data.table() %>%
      .[, time := .x]
  })}

ggplot(data = disability_status_log, aes(x = time, y = N, color = disability_status)) +
  geom_line() +
  scale_x_continuous(breaks = scales::pretty_breaks()) +
  labs(title = "Disability status")

Created on 2020-03-14 by the reprex package (v0.3.0)