Open jfreedman-svg opened 4 years ago
I made some edits so that your question is more readable.
So this is quite long but I'm going to walk you through the data and model preparation step to running your microsimulation model. Please also refer to the documentation at https://core.dymium.org/index.html for parts that are not clear. If they are still unclear then let me know I will try to explain the best I can. :)
For the record, we also setup a repo at https://github.com/asiripanich/microsim-disability for your project which has a minimal working example of your model.
To link those models together you just need to make sure that you have a population data with those attributes, all the eight covariates.
library(data.table)
n_agents <- 100
population_data <-
data.table::data.table(
id = 1:n_agents,
age = sample(c(0:100), n_agents, replace = TRUE),
sex = sample(c('male', 'female'), n_agents, replace = TRUE),
education = sample(c('high school and below', 'diploma', 'university level'), n_agents, replace = TRUE),
v1 = sample(c(0:1), n_agents, replace = TRUE),
v2 = sample(c(0:1), n_agents, replace = TRUE),
v3 = sample(c(0:1), n_agents, replace = TRUE),
v4 = sample(c(0:1), n_agents, replace = TRUE),
disability_status = sample(c('state_1', 'state_2', 'state_3', 'state_4'), n_agents, replace = TRUE)
)
If you haven't estimate your transition models then you can use the caret
package to do that. In your case, you can use the glm
function from the stats
package to estimate logistic regression models and the nnet
package to estimate a multinomial logit model. dymiumCore
can use these models estimated using the caret
package for microsimulation without you having to extract the parameters. This is very convenient if you often need to re-estimate your model.
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
logit_model <-
train(
v1 ~ age + sex,
data = population_data,
method = "glm",
family = binomial(link = "logit")
)
summary(logit_model)
#>
#> Call:
#> NULL
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.3466 -1.1190 -0.9599 1.1933 1.4212
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.040547 0.476980 0.085 0.932
#> age -0.006034 0.007009 -0.861 0.389
#> sexmale 0.366853 0.406192 0.903 0.366
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 138.47 on 99 degrees of freedom
#> Residual deviance: 136.70 on 97 degrees of freedom
#> AIC: 142.7
#>
#> Number of Fisher Scoring iterations: 4
Created on 2020-03-14 by the reprex package (v0.3.0)
Now that you have your population data, models, and event functions, you want to create a World object to store your population data. World is responsible for storing data and model objects that are required for your microsimulation model.
# create a World object
MyWorld <- World$new()
# create an Agent object
MyAgent <- Agent$new(.data = population_data, id_col = "id")
# add the Agent object to the world object.
MyWorld $add(MyAgent)
# As you can see the world object only contain 1 item which is the Agent object that we have just added
world
#> There are 1 items in self$Cont.
#> Class: Agent
#> Inheritance: Agent <- Entity <- Generic <- R6
#> Number_of_entities: 100
#> Number_of_removed_entities: 0
#> Data[rows, cols]: attrs[100, 9]
# to access the Agent object we can use the `entities` field which is a named list
# Note that, all Entity inheritance classes, including Agent, can be refer to by its class name.
world$entities$Agent
#> Class: Agent
#> Inheritance: Agent <- Entity <- Generic <- R6
#> Number_of_entities: 100
#> Number_of_removed_entities: 0
# or alternatively you can use the `get` method and refer to the Agent object by its name which is 'Agent'
MyWorld$get("Agent")
#> Class: Agent
#> Inheritance: Agent <- Entity <- Generic <- R6
#> Number_of_entities: 100
#> Number_of_removed_entities: 0
#> Data[rows, cols]: attrs[100, 9]
# the `get` function returns a reference to the called object within World
Agt <- MyWorld$get("Agent")
# to view Agent's attribute data use the `get_data` method
Agt$get_data()
#> Data[rows, cols]: attrs[100, 9]
#> id age sex education v1 v2 v3 v4 disability_status
#> 1: 1 94 female high school and below yes yes no no state_1
#> 2: 2 78 male university level no yes no yes state_1
#> 3: 3 6 female diploma no no yes yes state_1
#> 4: 4 32 female high school and below yes yes no yes state_2
#> 5: 5 79 female high school and below no yes no yes state_4
#> 6: 6 83 female university level yes no yes no state_2
Notice that, Agt
and MyAgent
are pointing to the same population data. You can try use get_data()
on both variables and they will return the exact same data.
Created on 2020-03-14 by the reprex package (v0.3.0)
Now to simulate transitions between different states, it is recommended that you create a function for each transition like below. An event function should except at least two inputs. The first input is a World object and the second input is a model object that you want to use to predict transition probabilities.
event_disability <- function(world, model) {
# the `get` function returns a reference to the called object stored in World
Agt <- world$get("Agent")
# select only agents that are older than 50 years old and not in disability state 4
eligible_agent_ids <- Agt$get_data()[age > 50 & state != "state_4",
get(Agt$get_id_col())]
TransitionDisability <-
TransitionClassification$new(
x = Agt,
model = model,
targeted_agents = eligible_agent_ids
)
# use the simulated result to update the 'state' attribute of the selected agents
TransitionDisability$update_agents(attr = "disability_status")
Agt$log(desc = "disability_status:state",
value = xtabs(~ state, data = Agt$get_data()))
return(world)
}
event_v1 <- function(world, model) {...}
event_v2 <- function(world, model) {...}
event_v3 <- function(world, model) {...}
event_v4 <- function(world, model) {...}
You can repeat the whole process for several times to obtain the simulation results from multiple independent runs to create a confidence interval.
for (i in 1:15) {
MyWorld$start_iter(i, unit = "year") %>%
event_v1(., model = v1_model) %>%
event_v2(., model = v2_model) %>%
event_v3(., model = v3_model) %>%
event_v4(., model = v4_model) %>%
event_disability(., disability_model)
}
Here is a fixed version to the toy example you shared with me. I removed the adding new entrants script that you put inside the simulation pipeline. You should instead create that as an event like how we did for the other events.
library(data.table)
library(dymiumCore)
#> ── * dymium's options * ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ● dymiun.scenario_dir: /var/folders/0d/9srpj_750lxbkfs2_8nwkcpw0000gn/T//Rtmp2iIk31/scenario
#> ● dymiun.input_dir: /var/folders/0d/9srpj_750lxbkfs2_8nwkcpw0000gn/T//Rtmp2iIk31/scenario/inputs
#> ● dymiun.output_dir: /var/folders/0d/9srpj_750lxbkfs2_8nwkcpw0000gn/T//Rtmp2iIk31/scenario/outputs
library(testthat)
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
library(ggplot2)
set.seed(1)
# agent data --------------------------------------------------------------
n_agents <- 1000
population_data <-
data.table::data.table(
id = 1:n_agents,
age = sample(c(0:100), n_agents, replace = TRUE),
sex = sample(c('male', 'female'), n_agents, replace = TRUE),
education = sample(c('high school and below', 'diploma', 'university level'), n_agents, replace = TRUE),
v1 = sample(c('no', 'yes'), n_agents, replace = TRUE),
v2 = sample(c('no', 'yes'), n_agents, replace = TRUE),
v3 = sample(c('no', 'yes'), n_agents, replace = TRUE),
v4 = sample(c('no', 'yes'), n_agents, replace = TRUE),
disability_status = sample(c('state_1', 'state_2', 'state_3', 'state_4'), n_agents, replace = TRUE)
)
# models ------------------------------------------------------------------
v1_model <-
train(
v1 ~ age + sex,
data = population_data,
method = "glm",
family = binomial(link = "logit")
)
v2_model <-
train(
v2 ~ age + sex + v1,
data = population_data,
method = "glm",
family = binomial(link = "logit")
)
v3_model <-
train(
v3 ~ age + sex + v1 + v2,
data = population_data,
method = "glm",
family = binomial(link = "logit")
)
v4_model <-
train(
v4 ~ age + sex + v1 + v2 + v3,
data = population_data,
method = "glm",
family = binomial(link = "logit")
)
disability_model <-
train(
disability_status ~ age + sex + v1 + v2 + v3 + v4,
data = population_data,
method = "multinom",
family = binomial(link = "logit")
)
#> # weights: 32 (21 variable)
#> initial value 1386.294361
#> iter 10 value 1377.042417
#> iter 20 value 1375.550262
#> final value 1375.531152
#> ...
#> # weights: 32 (21 variable)
#> initial value 1386.294361
#> iter 10 value 1377.562496
#> iter 20 value 1377.042549
#> final value 1377.029763
#> converged
# create world ------------------------------------------------------------
# create a World object
MyWorld <- World$new()
# create an Agent object
MyAgent <- Agent$new(.data = population_data, id_col = "id")
# add the Agent object to the world object.
MyWorld$add(MyAgent)
# events ------------------------------------------------------------------
event_disability <- function(world, model) {
# the `get` function returns a reference to the called object stored in World
Agt <- world$get("Agent")
# select only agents that are older than 50 years old and not in disability state 4
eligible_agent_ids <- Agt$get_data()[age > 50 & disability_status != "state_4",
get(Agt$get_id_col())]
TransitionDisability <-
TransitionClassification$new(
x = Agt,
model = disability_model,
targeted_agents = eligible_agent_ids
)
# use the simulated result to update the 'state' attribute of the selected agents
TransitionDisability$update_agents(attr = "disability_status")
Agt$log(desc = "disability_status:state",
value = xtabs(~ disability_status, data = Agt$get_data()))
return(world)
}
event_v1 <- function(world, model) {
# the `get` function returns a reference to the called object stored in World
Agt <- world$get("Agent")
# select only agents that are older than 50 years old
eligible_agent_ids <- Agt$get_data()[age > 50,
get(Agt$get_id_col())]
TransitionV1 <-
TransitionClassification$new(
x = Agt,
model = v1_model,
targeted_agents = eligible_agent_ids
)
# use the simulated result to update the 'state' attribute of the selected agents
TransitionV1$update_agents(attr = "v1")
Agt$log(desc = "v1",
value = xtabs(~ v1, data = Agt$get_data()))
return(world)
}
event_v2 <- function(world, model) {
# the `get` function returns a reference to the called object stored in World
Agt <- world$get("Agent")
# select only agents that are older than 50 years old
eligible_agent_ids <- Agt$get_data()[age > 50,
get(Agt$get_id_col())]
Transitionv2 <-
TransitionClassification$new(
x = Agt,
model = v2_model,
targeted_agents = eligible_agent_ids
)
# use the simulated result to update the 'state' attribute of the selected agents
Transitionv2$update_agents(attr = "v2")
Agt$log(desc = "v2",
value = xtabs(~ v2, data = Agt$get_data()))
return(world)
}
event_v3 <- function(world, model) {
# the `get` function returns a reference to the called object stored in World
Agt <- world$get("Agent")
# select only agents that are older than 50 years old
eligible_agent_ids <- Agt$get_data()[age > 50,
get(Agt$get_id_col())]
Transitionv3 <-
TransitionClassification$new(
x = Agt,
model = v3_model,
targeted_agents = eligible_agent_ids
)
# use the simulated result to update the 'state' attribute of the selected agents
Transitionv3$update_agents(attr = "v3")
Agt$log(desc = "v3",
value = xtabs(~ v3, data = Agt$get_data()))
return(world)
}
event_v4 <- function(world, model) {
# the `get` function returns a reference to the called object stored in World
Agt <- world$get("Agent")
# select only agents that are older than 50 years old
eligible_agent_ids <- Agt$get_data()[age > 50,
get(Agt$get_id_col())]
Transitionv4 <-
TransitionClassification$new(
x = Agt,
model = v4_model,
targeted_agents = eligible_agent_ids
)
# use the simulated result to update the 'state' attribute of the selected agents
Transitionv4$update_agents(attr = "v4")
Agt$log(desc = "v4",
value = xtabs(~ v4, data = Agt$get_data()))
return(world)
}
event_age <- function(world, model = NULL) {
Agt <- world$get("Agent")
# update age of alive agents, with max of 100.
Agt$get_data(copy = FALSE) %>%
.[disability_status != "death" & age < 100, age := age + 1L]
return(world)
}
# simulation --------------------------------------------------------------
for (i in 1:15) {
MyWorld$start_iter(i, unit = "year") %>%
event_v1(., model = v1_model) %>%
event_v2(., model = v2_model) %>%
event_v3(., model = v3_model) %>%
event_v4(., model = v4_model) %>%
event_disability(., disability_model) %>%
event_age(.)
}
# visualisation -----------------------------------------------------------
simlog <- get_log(MyWorld)
disability_status_log <-
simlog %>%
.[desc == "disability_status:state", ] %>%
{purrr::map2_dfr(.$time, .$value, ~ {
.y %>%
as.data.table() %>%
.[, time := .x]
})}
ggplot(data = disability_status_log, aes(x = time, y = N, color = disability_status)) +
geom_line() +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Disability status")
Created on 2020-03-14 by the reprex package (v0.3.0)
Using a cleaned person-period file, estimate transition probabilities for each time-varying covariate of interest and the 4 states.
Transition matrices are calculated and stored. Now, generate a starting population of size N using a bootstrapped approach.
We have our transition matrices and our starting population. Now, we simulate. *In the complex case, this will be done for each of the time-varying covariates and the ultimately the 4-state model.
This process repeats itself for 15 simulation cycles (years)
Output: Aggregated counts in each state. Dis-aggregated counts by covariates.