IALSA / ialsa-2016-groningen

Maelstrom Harmonization Workshop. Assessing the impact of different harmonization procedures on the analysis results from several real datasets.
GNU General Public License v2.0

Display to describe renaming pattern #4

Closed: andkov closed this issue 8 years ago

andkov commented 8 years ago

Following your suggestion, @wibeasley, to keep an external, modifiable metadata object in the data-grooming workflow, I have started implementing such a solution with https://github.com/IALSA/ialsa-2016-groningen/blob/master/data/shared/names-labels-augmented.xls, which queries the individual CSVs in https://github.com/IALSA/ialsa-2016-groningen/tree/master/data/shared/derived. While grooming, I stumbled on an interesting micro-quest.

Exposition

The dataset contains three columns: the study name, the new (renamed) variable name, and the old (original) variable name.

# Lookup table: one row per variable, mapping each study's original (old) name to the harmonized (new) name
study_name <- c("alsa", "alsa", "lbsl", "lbsl", "satsa", "satsa", "satsa", "share", "share", "share", "tilda", "tilda", "tilda", "tilda")
name_new   <- c("smoke_now", "smoke_pipecigar", "smoke_history", "smoke_now", "smoke_history", "smoke_now", "snuff_history", "smoke_history", "smoke_now", "smoke_years", "smoke_age", "smoke_history", "smoke_history2", "smoke_now")
name_old   <- c("SMOKER", "PIPCIGAR", "SMOKE", "SMK94", "GEVRSMK", "GSMOKNOW", "GEVRSNS", "BR0010", "BR0020", "BR0030", "BH003", "BH001", "BEHSMOKER", "BH002")

d <- data.frame(
  study_name = study_name,
  name_new   = name_new,
  name_old   = name_old,
  stringsAsFactors = FALSE # keep names as character, not factor (the default before R 4.0)
)
d
   study_name        name_new  name_old
1        alsa       smoke_now    SMOKER
2        alsa smoke_pipecigar  PIPCIGAR
3        lbsl   smoke_history     SMOKE
4        lbsl       smoke_now     SMK94
5       satsa   smoke_history   GEVRSMK
6       satsa       smoke_now  GSMOKNOW
7       satsa   snuff_history   GEVRSNS
8       share   smoke_history    BR0010
9       share       smoke_now    BR0020
10      share     smoke_years    BR0030
11      tilda       smoke_age     BH003
12      tilda   smoke_history     BH001
13      tilda  smoke_history2 BEHSMOKER
14      tilda       smoke_now     BH002
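Since this lookup table is meant to drive the grooming step, here is a minimal sketch of applying it to one study's raw data. `rename_study()` and `raw_alsa` are hypothetical (not from the repository), and the sketch assumes each study's raw data arrive as a data frame whose column names match `name_old`.

```r
# Sketch only: rename_study() is a hypothetical helper, not part of the repository.
lookup <- data.frame(
  study_name = c("alsa", "alsa", "lbsl", "lbsl", "satsa", "satsa", "satsa",
                 "share", "share", "share", "tilda", "tilda", "tilda", "tilda"),
  name_new   = c("smoke_now", "smoke_pipecigar", "smoke_history", "smoke_now",
                 "smoke_history", "smoke_now", "snuff_history", "smoke_history",
                 "smoke_now", "smoke_years", "smoke_age", "smoke_history",
                 "smoke_history2", "smoke_now"),
  name_old   = c("SMOKER", "PIPCIGAR", "SMOKE", "SMK94", "GEVRSMK", "GSMOKNOW",
                 "GEVRSNS", "BR0010", "BR0020", "BR0030", "BH003", "BH001",
                 "BEHSMOKER", "BH002"),
  stringsAsFactors = FALSE
)

# Rename the columns of one study's data frame using that study's lookup rows.
# Columns not listed in the lookup keep their original names.
rename_study <- function(ds, lookup, study) {
  map <- lookup[lookup$study_name == study, , drop = FALSE]
  idx <- match(map$name_old, names(ds))  # column position of each old name (NA if absent)
  hit <- !is.na(idx)
  names(ds)[idx[hit]] <- map$name_new[hit]
  ds
}

# Toy data frame standing in for the ALSA file (made-up values, for illustration only)
raw_alsa <- data.frame(ID = 1:3, SMOKER = c(1, 0, 0), PIPCIGAR = c(0, 0, 1))
renamed <- rename_study(raw_alsa, lookup, "alsa")
names(renamed)  # "ID" "smoke_now" "smoke_pipecigar"
```

Matching by name rather than by position means the raw file's column order does not matter, and old names missing from a file are simply skipped.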

Quest

Design a bivariate display that shows the renaming pattern.

I was thinking of something along these lines: a table with the old variable names as rows, the new names as columns, and the study name in each cell. I'm wrapping up for today and will work on this in my spare time.
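One possible base-R sketch of the table described above: `lookup` repeats the data from the Exposition, and `by_old` and `by_study` are illustrative names. `tapply()` with `paste` fills each (row, column) cell, so if several studies shared the same old-to-new pair their names would be joined with commas.

```r
lookup <- data.frame(
  study_name = c("alsa", "alsa", "lbsl", "lbsl", "satsa", "satsa", "satsa",
                 "share", "share", "share", "tilda", "tilda", "tilda", "tilda"),
  name_new   = c("smoke_now", "smoke_pipecigar", "smoke_history", "smoke_now",
                 "smoke_history", "smoke_now", "snuff_history", "smoke_history",
                 "smoke_now", "smoke_years", "smoke_age", "smoke_history",
                 "smoke_history2", "smoke_now"),
  name_old   = c("SMOKER", "PIPCIGAR", "SMOKE", "SMK94", "GEVRSMK", "GSMOKNOW",
                 "GEVRSNS", "BR0010", "BR0020", "BR0030", "BH003", "BH001",
                 "BEHSMOKER", "BH002"),
  stringsAsFactors = FALSE
)

# Old names as rows, new names as columns; each cell holds the study (or studies)
# that rename that old name to that new name.
by_old <- tapply(lookup$study_name,
                 list(lookup$name_old, lookup$name_new),
                 paste, collapse = ", ")
by_old[is.na(by_old)] <- ""   # empty cell = no such renaming
noquote(by_old)

# Alternative orientation: new names as rows, studies as columns, old names in cells.
by_study <- tapply(lookup$name_old,
                   list(lookup$name_new, lookup$study_name),
                   paste, collapse = ", ")
by_study[is.na(by_study)] <- ""
noquote(by_study)
```

Because every old name in this table belongs to exactly one study, each row of `by_old` has a single filled cell (14 rows by 7 columns), so the first layout is sparse. The second layout (7 harmonized variables by 5 studies) may show the pattern more compactly, e.g. that only TILDA splits smoking history into two variables.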

@Maleeha and @casslbrown, I'm tagging you because this might be of interest. Feel free to ignore.

andkov commented 8 years ago

Can wait.