Open andkov opened 8 years ago
@smhofer, here is my commentary on your five sections. I need to introduce a slight modification to account for the way the scripts actually deal with the data. Specifically, I suggest implementing the processes in Sections 2 and 3 separately for each set of harmonized variables. It's more practical to organize it this way, and it will not change the end result of Section 3: the creation of a combined data set.
The script `./manipulation/0-ellis-island.r` produces a working report, `./manipulation/stitched-output/0-ellis-island.md`. This report accomplishes Sections (1), (2b), and (3a). I've copiously annotated it, and it's meant to be part of the live documentation. This is where one will go to find out how specifically the processes in Sections (1), (2a), and (3a) have been implemented.
Note that Section (2a) is accomplished outside of R by editing the file `./data/shared/meta-data-map.csv`. I don't think it's a good idea for projects like these to conduct renaming by hand in a script. This is my biggest lesson learned from Portland, so I'd like to gently insist on this.
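To illustrate what renaming driven by the metadata map might look like, here is a minimal sketch in base R. The column names of the map (`study_name`, `name`, `name_new`), the toy study `alpha`, and the helper `rename_from_map()` are all assumptions made up for illustration; the actual layout of `meta-data-map.csv` and the code in `0-ellis-island.r` may differ.

```r
# Hypothetical stand-in for ./data/shared/meta-data-map.csv.
# In practice the map would be read with read.csv(); the column names
# (study_name, name, name_new) are assumptions, not the real layout.
meta <- data.frame(
  study_name = c("alpha", "alpha"),
  name       = c("SMK94", "EDUC"),
  name_new   = c("smoke_now", "edu_years"),
  stringsAsFactors = FALSE
)

# Toy raw data for one hypothetical study.
raw_alpha <- data.frame(id = 1:3, SMK94 = c(1, 0, 1), EDUC = c(12, 16, 9))

# Rename columns by lookup in the map instead of hard-coding names in a script.
rename_from_map <- function(d, map) {
  hit <- match(names(d), map$name)   # NA where a column is not in the map
  names(d)[!is.na(hit)] <- map$name_new[hit[!is.na(hit)]]
  d
}

renamed_alpha <- rename_from_map(raw_alpha, meta[meta$study_name == "alpha", ])
names(renamed_alpha)
```

With this arrangement, a change of name is an edit to the CSV file rather than to the script, and unmapped columns (here `id`) pass through untouched.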
I'm moving on to developing the scripts to implement Section (3c) for smoking.
`dto`, containing unmerged, raw unit data from each study and a single metadata file containing metadata for variables from all studies (e.g. what type of variable it is, how the variable should be renamed, etc.).
(`dto`) Create datasets that aggregate variables with shared properties of the metadata (e.g. "all variables that have `smoking` for the value of the `construct` column in the metadata set").
(`smoking`, `education`, etc.) Transform the raw variables in the corresponding dataset to create harmonized variables. Evaluate each harmonized variable separately. (Managing a large, combined file during harmonization is inconvenient. In addition, there might be a need or interest to inspect individual files during the process; this makes it easier to provision.)
`study_name` as a factor.
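The steps above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the list elements `unitData` and `metaData`, the studies `alpha` and `beta`, their variable names, the helper `subset_by_construct()`, and the toy harmonization rule are all hypothetical and are not taken from the project scripts.

```r
# Hypothetical dto: raw unit data per study plus one shared metadata table.
dto <- list(
  unitData = list(
    alpha = data.frame(id = 1:2, SMK94 = c(1, 0), EDUC = c(12, 16)),
    beta  = data.frame(id = 1:3, smoker = c("yes", "no", "no"),
                       school = c(10, 14, 18))
  ),
  metaData = data.frame(
    study_name = c("alpha", "alpha", "beta", "beta"),
    name       = c("SMK94", "EDUC", "smoker", "school"),
    construct  = c("smoking", "education", "smoking", "education"),
    stringsAsFactors = FALSE
  )
)

# For each study, keep the id plus the variables tagged with the construct.
subset_by_construct <- function(dto, construct) {
  md <- dto$metaData[dto$metaData$construct == construct, ]
  out <- lapply(names(dto$unitData), function(s) {
    vars <- md$name[md$study_name == s]
    dto$unitData[[s]][, c("id", vars), drop = FALSE]
  })
  names(out) <- names(dto$unitData)
  out
}

smoking <- subset_by_construct(dto, "smoking")

# Harmonize each study separately (toy rule: a logical smoke_now).
smoking$alpha$smoke_now <- smoking$alpha$SMK94 == 1
smoking$beta$smoke_now  <- smoking$beta$smoker == "yes"

# Combine the harmonized variable across studies, with study_name as a factor.
combined <- do.call(rbind, lapply(names(smoking), function(s) {
  data.frame(study_name = s, smoking[[s]][, c("id", "smoke_now")],
             stringsAsFactors = FALSE)
}))
combined$study_name <- factor(combined$study_name)
```

Keeping one small list per construct means each harmonized variable can be inspected study by study before anything is stacked into the combined file.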
@smhofer proposed the following plan for the reproducible report(s):