IALSA / HRS

Shaping data from the Health and Retirement Study.
GNU General Public License v2.0

Merging multiple files #8

Open andkov opened 7 years ago

andkov commented 7 years ago

The dto, as it emerges from 1-scale-assembly, is a list with a data frame as each element. Each data frame is long with respect to time, and each one contains the variables year and hhidpn.

Objective

Produce a single, flat data set that combines the data frames in the list.

andkov commented 7 years ago

@casslbrown

First, we need to create a subset of the list dto that would contain ONLY the items we want to merge:

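library(magrittr) # provides the %>% pipe

# start with an empty list and add only the items to be merged
dto_new <- list()

# keep the key columns (year, hhidpn) plus the variables of interest
dto_new[["demographics"]] <- dto$demographics %>% 
  dplyr::select(year, hhidpn, birthyr, interview_yr, male, race)

dto_new[["loneliness"]] <- dto$loneliness %>% 
  dplyr::select(year, hhidpn, score_loneliness_3, score_loneliness_11)

# sum and mean are generic names, so rename them to be unique in the merged file
dto_new[["life_satisfaction"]] <- dto$life_satisfaction %>% 
  dplyr::select(year, hhidpn, sum, mean) %>% 
  dplyr::rename(
     life_sat_sum = sum
    ,life_sat_mean = mean
  )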

The script above shows how to create a dto_new that keeps the same structure as dto but prunes the unnecessary items. Where necessary, you must rename the columns so that their names are unique in the global file (e.g. if more than one scale has columns sum and mean, you must give each of them a unique name, such as life_satisfaction_sum and life_satisfaction_mean).

A snapshot of the resulting data frames is displayed below. [image]
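As a rough sketch of the renaming rule above, the following lists the non-key column names that appear in more than one data frame, i.e. the ones that would need renaming before a merge. The helper find_shared_columns and the toy data are hypothetical and are not part of the original script; if such columns are left as they are, merge() appends its default suffixes (.x, .y) to tell them apart.

```r
# Hypothetical helper (not in the original script): return the non-key column
# names that occur in more than one data frame of a list.
find_shared_columns <- function(dfs, key_columns) {
  # column names of each data frame, excluding the join keys
  non_key <- lapply(dfs, function(d) setdiff(names(d), key_columns))
  all_names <- unlist(non_key, use.names = FALSE)
  # names seen at least twice across the list
  unique(all_names[duplicated(all_names)])
}

# Toy data (invented values) standing in for two scales
toy <- list(
  scale_a = data.frame(year = 2004, hhidpn = 1, sum = 10, mean = 2.5),
  scale_b = data.frame(year = 2004, hhidpn = 1, sum = 7, mean = 1.75)
)

shared <- find_shared_columns(toy, key_columns = c("year", "hhidpn"))
print(shared)
```

Running the same check on dto_new before merging should return an empty character vector if every column has already been given a unique name.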

Notice how each data frame has two identical columns: year and hhidpn. This is important because we will join these data frames by these columns. Use the following function to merge multiple data frames by the same key:

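# Join every data frame in the list, one after another, on the shared key columns.
# merge() defaults to an inner join, so only key combinations present in every data frame are kept.
merge_mulitple_files <- function(list, by_columns){
  Reduce(function( d_1, d_2 ) merge(d_1, d_2, by=by_columns), list)
}
ds <- merge_mulitple_files(dto_new, by_columns = c("year","hhidpn"))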

A snapshot of the created data frame appears below. [image]
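To illustrate what the Reduce/merge pattern does with the keys, here is a minimal sketch on toy data (invented values, not HRS data). It assumes nothing beyond base R. The all = TRUE variant is only a possible alternative, in case rows missing from some scale should be retained; it is not something the original function does.

```r
# Toy data frames (invented values) that share the keys year and hhidpn
toy <- list(
  a = data.frame(year = c(2004, 2006), hhidpn = c(1, 1), x = c(10, 11)),
  b = data.frame(year = c(2004, 2006), hhidpn = c(1, 1), y = c(0.5, 0.6)),
  c = data.frame(year = 2004, hhidpn = 1, z = 3)
)
keys <- c("year", "hhidpn")

# Same pattern as the function above: successive inner joins on the keys
inner <- Reduce(function(d_1, d_2) merge(d_1, d_2, by = keys), toy)

# Possible variant (an assumption about what may be wanted): a full outer join
# keeps every key combination and fills the gaps with NA
outer <- Reduce(function(d_1, d_2) merge(d_1, d_2, by = keys, all = TRUE), toy)

print(inner)
print(outer)
```

In this toy case the inner join keeps only the 2004 row, because data frame c has no 2006 record, while the outer join keeps both years and leaves z missing for 2006.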