IALSA / ialsa-2016-amsterdam

Multi-study and multivariate evaluation of healthy life expectancy (HLE): An IALSA workshop on multistate modeling using R
GNU General Public License v2.0

Joining ms-data with the raw covariate #13

Open andkov opened 8 years ago

andkov commented 8 years ago

After creating the multistate variable, the data has the following format:

         id fu_point      age state
      (int)    (int)    (dbl) (dbl)
1  33118460        0 78.53799     1
2  33118460        1 79.53730     1
3  33118460        2 80.54757     1
4  33118460        3 81.54415     1
5  33118460        4 82.54346     1
6  33118460        5 83.57837     1
7  33118460        6 84.57221     1
8  33118460        7 85.26215     3
9  43403133        0 83.62765     1
10 43403133        1 85.36893     1
11 52311825        0 78.65572     1
12 52311825        1 79.68789     1
13 52311825        2 80.64339     2
14 52311825        3 81.67830     1
15 52311825        4 82.68309     1
16 52311825        5 83.64134     1
17 52311825        6 84.67077     1
18 52311825        7 85.43190     3

State 3 is the death state, which has just been added by ./manipulation/1-ms-dementia.R. Now we need to augment this restructured data with the covariates from the original data, for example:

         id msex fu_year
1  33118460    1       0
2  33118460    1       1
3  33118460    1       2
4  33118460    1       3
5  33118460    1       4
6  33118460    1       5
7  33118460    1       6
8  43403133    1       0
9  43403133    1       1
10 52311825    1       0
11 52311825    1       1
12 52311825    1       2
13 52311825    1       3
14 52311825    1       4
15 52311825    1       5
16 52311825    1       6

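A left join combines them. The death rows (state 3) have no matching fu_year in the covariate data, so their covariates come back as NA: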

d <- ds %>% dplyr::left_join(selected_covariates, by = c("id" = "id", "fu_point" = "fu_year"))

         id fu_point      age state  msex
      (int)    (int)    (dbl) (dbl) (int)
1  33118460        0 78.53799     1     1
2  33118460        1 79.53730     1     1
3  33118460        2 80.54757     1     1
4  33118460        3 81.54415     1     1
5  33118460        4 82.54346     1     1
6  33118460        5 83.57837     1     1
7  33118460        6 84.57221     1     1
8  33118460        7 85.26215     3    NA
9  43403133        0 83.62765     1     1
10 43403133        1 85.36893     1     1
11 52311825        0 78.65572     1     1
12 52311825        1 79.68789     1     1
13 52311825        2 80.64339     2     1
14 52311825        3 81.67830     1     1
15 52311825        4 82.68309     1     1
16 52311825        5 83.64134     1     1
17 52311825        6 84.67077     1     1
18 52311825        7 85.43190     3    NA
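
To check that the death rows are the only ones left without covariates, the same keys can be passed to an anti join. The following is a minimal sketch on a toy subset of the data above (one person, last three follow-up points); the object name unmatched is hypothetical and not part of the repository.

```r
library(dplyr)

# Toy subset of the multistate data shown above (values copied from the issue)
ds <- data.frame(
  id       = rep(33118460L, 3),
  fu_point = 5:7,
  age      = c(83.57837, 84.57221, 85.26215),
  state    = c(1, 1, 3)
)

# Toy subset of the covariates: there is no record for fu_year 7 (the death row)
selected_covariates <- data.frame(
  id      = rep(33118460L, 2),
  msex    = c(1L, 1L),
  fu_year = 5:6
)

join_keys <- c("id" = "id", "fu_point" = "fu_year")

# Same left join as in the issue: unmatched rows of ds get NA covariates
d <- ds %>% dplyr::left_join(selected_covariates, by = join_keys)

# Rows of ds that have no match in the covariate data
unmatched <- ds %>% dplyr::anti_join(selected_covariates, by = join_keys)

# TRUE if every unmatched row is a death row
all(unmatched$state == max(ds$state))
```

If the final check returns FALSE on the full data, some living follow-up points also lack covariates, and filling only the death rows would not remove every NA.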

I then replace the NAs on the death rows with a loop that copies the value from the previous row:

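> stem <- c("id", "fu_point", "age", "state")
> not_stem <- setdiff(names(d), stem)   # covariate columns brought in by the join
> death_state <- max(d$state)           # the death state has the highest code
> for(j in not_stem){
+   for(i in seq_len(nrow(d))[-1]){     # start at row 2 so that i-1 is a valid row
+     # fill only NAs that sit on a death row, using the value from the row above
+     if(is.na(d[[j]][i]) && d[["state"]][i] == death_state){
+       d[[j]][i] <- d[[j]][i-1]
+     }
+   }
+ }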
> 
> d
Source: local data frame [18 x 5]

         id fu_point      age state  msex
      (int)    (int)    (dbl) (dbl) (int)
1  33118460        0 78.53799     1     1
2  33118460        1 79.53730     1     1
3  33118460        2 80.54757     1     1
4  33118460        3 81.54415     1     1
5  33118460        4 82.54346     1     1
6  33118460        5 83.57837     1     1
7  33118460        6 84.57221     1     1
8  33118460        7 85.26215     3     1
9  43403133        0 83.62765     1     1
10 43403133        1 85.36893     1     1
11 52311825        0 78.65572     1     1
12 52311825        1 79.68789     1     1
13 52311825        2 80.64339     2     1
14 52311825        3 81.67830     1     1
15 52311825        4 82.68309     1     1
16 52311825        5 83.64134     1     1
17 52311825        6 84.67077     1     1
18 52311825        7 85.43190     3     1
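
The loop above takes the value from the row directly above, whichever person that row belongs to. A possible vectorized alternative, sketched here under the assumption that dplyr 1.0 or later is available (for across()), does the same fill within each id. The name d_filled and the toy data are illustrative only.

```r
library(dplyr)

# Toy version of the joined data: two people, one of whom has a death row
d <- data.frame(
  id       = c(rep(33118460L, 3), rep(43403133L, 2)),
  fu_point = c(5L, 6L, 7L, 0L, 1L),
  age      = c(83.57837, 84.57221, 85.26215, 83.62765, 85.36893),
  state    = c(1, 1, 3, 1, 1),
  msex     = c(1L, 1L, NA, 1L, 1L)
)

stem        <- c("id", "fu_point", "age", "state")
not_stem    <- setdiff(names(d), stem)  # covariate columns
death_state <- max(d$state)             # assumes death has the highest code

d_filled <- d %>%
  dplyr::arrange(id, fu_point) %>%      # lag() relies on row order within id
  dplyr::group_by(id) %>%
  dplyr::mutate(
    dplyr::across(
      dplyr::all_of(not_stem),
      # on death rows with a missing value, take the previous value of the same person
      ~ dplyr::if_else(is.na(.x) & state == death_state, dplyr::lag(.x), .x)
    )
  ) %>%
  dplyr::ungroup()
```

Grouping by id means a death row that happens to be a person's only record stays NA rather than inheriting a value from the preceding person.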