IALSA / ialsa-2016-groningen

Maelstrom Harmonization Workshop. Assessing the impact of different harmonization procedures on the analysis results from several real datasets.
GNU General Public License v2.0

Study: applying harmonization rules #7

Closed andkov closed 8 years ago

andkov commented 8 years ago

Exposition

Consider 10 individuals who provided responses on two measures, raw1 and raw2, which we need to transform into a harmonized variable h1.

(ds <- data.frame("id" = 1:10,
                  "raw1" = c(1,1,0,0,NA,NA,1 ,0 ,NA,1),
                  "raw2" = c(1,0,1,0,1 ,0 ,NA,NA,NA,1)))

   id raw1 raw2
1   1    1    1
2   2    1    0
3   3    0    1
4   4    0    0
5   5   NA    1
6   6   NA    0
7   7    1   NA
8   8    0   NA
9   9   NA   NA
10 10    1    1

First we compute the response profile, which lists every combination of responses observed in the data and how often it occurs:

library(dplyr) # provides %>% and n()
(response_profile <- ds %>% dplyr::group_by(raw1, raw2) %>% dplyr::summarize(count = dplyr::n()))

Source: local data frame [9 x 3]
Groups: raw1 [?]

   raw1  raw2 count
  (dbl) (dbl) (int)
1     0     0     1
2     0     1     1
3     0    NA     1
4     1     0     1
5     1     1     2
6     1    NA     1
7    NA     0     1
8    NA     1     1
9    NA    NA     1
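The response profile contains only the combinations that actually occur in ds. Here all nine patterns of 0, 1, and NA happen to be observed, but with other data some patterns may be absent, and a rule table built from the profile would then have gaps. A minimal base-R sketch that enumerates every possible pattern and counts it (0 if unobserved) follows; it assumes each raw measure is coded 0/1 with NA for a missing response, and `profile_key` is a hypothetical helper, not part of the original code.

```r
# Assumption: each raw measure is coded 0/1, with NA for a missing response.
ds <- data.frame(id = 1:10,
                 raw1 = c(1, 1, 0, 0, NA, NA, 1, 0, NA, 1),
                 raw2 = c(1, 0, 1, 0, 1, 0, NA, NA, NA, 1))
values <- c(0, 1, NA)

# Every possible response pattern, observed or not (raw1 varies fastest).
all_profiles <- expand.grid(raw1 = values, raw2 = values)

# Hypothetical helper: collapse each row's responses into one string key.
# paste() writes NA as "NA", so a missing response forms its own category.
profile_key <- function(d) do.call(paste, c(d[, c("raw1", "raw2")], sep = "\r"))

# Tabulate the observed keys against the full set of possible keys,
# so unobserved patterns get a count of 0 instead of disappearing.
observed <- factor(profile_key(ds), levels = profile_key(all_profiles))
all_profiles$count <- as.integer(table(observed))
print(all_profiles)
```

Building hrule on top of `all_profiles` rather than on the observed profile guarantees that the rule says something about every pattern, including ones that only appear in a later wave or another study.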

Then we add the harmonization rule: instructions specifying which value each pattern of responses in raw1 and raw2 should produce in the harmonized variable ds$h1.

(hrule <- cbind(response_profile, "h1" = c(0,1,0,1,1,1,0,1,NA)))

  raw1 raw2 count h1
1    0    0     1  0
2    0    1     1  1
3    0   NA     1  0
4    1    0     1  1
5    1    1     2  1
6    1   NA     1  1
7   NA    0     1  0
8   NA    1     1  1
9   NA   NA     1 NA
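Reading the table, this particular rule appears to code h1 as 1 when either measure is 1, as 0 when at least one measure is observed and none is 1, and as NA when both are missing. That is our reading of the table, not something stated in the issue, and the point of storing the rule in hrule is that arbitrary, non-formulaic rules can be tabulated the same way. The sketch below (with a hypothetical helper `any_positive`) simply checks that reading against the table.

```r
# The rule table from the issue, written out directly (the count column is omitted).
hrule <- data.frame(raw1 = c(0, 0, 0, 1, 1, 1, NA, NA, NA),
                    raw2 = c(0, 1, NA, 0, 1, NA, 0, 1, NA),
                    h1   = c(0, 1, 0, 1, 1, 1, 0, 1, NA))

# Hypothetical closed-form version of the rule (our interpretation):
# 1 if any observed response is 1, 0 if some response is observed but none is 1,
# NA if every response is missing.
any_positive <- function(...) {
  x <- cbind(...)
  out <- as.numeric(rowSums(x == 1, na.rm = TRUE) > 0)
  out[rowSums(!is.na(x)) == 0] <- NA
  out
}

# Stops with an error if the interpretation disagrees with the table.
stopifnot(identical(any_positive(hrule$raw1, hrule$raw2), hrule$h1))
```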

Quest

Given the data ds, compute a new variable ds$h1 from ds$raw1 and ds$raw2 according to the harmonization rule specified in the object hrule. Here is what I currently have:

new_function <- function(ds, hrule,
                         variable_names, # e.g. c("raw1", "raw2"); the number of variables will vary
                         harmony_name    # e.g. "h1"; name of the computed variable
){
  num_rule_elements <- length(variable_names)
  d <- ds[, variable_names, drop = FALSE] # drop = FALSE keeps a data frame even for a single variable

}

@wibeasley, please give me a hint toward an elegant solution if you see one. I'll keep looking for a solution in the meantime, because this is my bottleneck at the moment.

Starter code

library(dplyr) # provides %>% and n()
(ds <- data.frame("id" = 1:10,
                  "raw1" = c(1,1,0,0,NA,NA,1 ,0 ,NA,1),
                  "raw2" = c(1,0,1,0,1 ,0 ,NA,NA,NA,1)))
(response_profile <- ds %>% dplyr::group_by(raw1, raw2) %>% dplyr::summarize(count = dplyr::n()))
(hrule <- cbind(response_profile, "h1" = c(0,1,0,1,1,1,0,1,NA)))
new_function <- function(ds, hrule,
                         variable_names, # e.g. c("raw1", "raw2"); the number of variables will vary
                         harmony_name    # e.g. "h1"; name of the computed variable
){
  num_rule_elements <- length(variable_names)
  d <- ds[, variable_names, drop = FALSE] # keep a data frame even for a single variable

}
andkov commented 8 years ago

See http://stackoverflow.com/questions/36273249/recoding-multiple-variables-based-on-logical-rules-in-external-table for the solution.
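For readers without access to the link, here is a minimal base-R sketch of one way the `new_function` stub could be completed. It is not necessarily the approach taken in the linked answer. It treats hrule as a lookup table: the responses in each row are pasted into a single key (NA becomes the literal string "NA", so missing values are matched like any other category), and `match()` finds the corresponding rule row. The name `apply_hrule` is hypothetical, and the sketch assumes the raw columns have the same type in ds and hrule so that their pasted forms agree.

```r
# Hypothetical helper: recode `variable_names` in `ds` into `harmony_name`
# by looking up each row's response pattern in the rule table `hrule`.
apply_hrule <- function(ds, hrule, variable_names, harmony_name) {
  # Collapse the responses of each row into one string key.
  # paste() renders NA as "NA", so a missing response is matched like any other value.
  pattern_key <- function(d) do.call(paste, c(d[, variable_names, drop = FALSE], sep = "\r"))
  key_ds   <- pattern_key(ds)
  key_rule <- pattern_key(hrule)
  # A pattern listed twice in the rule table would make the lookup ambiguous.
  if (anyDuplicated(key_rule) > 0) stop("hrule lists the same response pattern more than once")
  # match() gives the matching rule row, or NA for patterns missing from hrule.
  ds[[harmony_name]] <- hrule[[harmony_name]][match(key_ds, key_rule)]
  ds
}

ds <- data.frame(id = 1:10,
                 raw1 = c(1, 1, 0, 0, NA, NA, 1, 0, NA, 1),
                 raw2 = c(1, 0, 1, 0, 1, 0, NA, NA, NA, 1))
hrule <- data.frame(raw1 = c(0, 0, 0, 1, 1, 1, NA, NA, NA),
                    raw2 = c(0, 1, NA, 0, 1, NA, 0, 1, NA),
                    h1   = c(0, 1, 0, 1, 1, 1, 0, 1, NA))

result <- apply_hrule(ds, hrule, variable_names = c("raw1", "raw2"), harmony_name = "h1")
print(result)
```

The row order of ds is preserved (unlike `merge()`, which may reorder rows), and because the function only uses the columns named in `variable_names`, the same call works for rules over three or more raw variables. Any extra columns in hrule, such as `count`, are ignored.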