Closed andkov closed 8 years ago
Everything looks like it's set up well. I'm going to change just a few things.
The real work is done by this single left join. The rest of the commit's code is just cleanup.
# Join the model data frame to the conversion data frame.
ds <- ds %>%
  dplyr::left_join(ds_model_type_key, by = c("model_type" = "entry"))
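To see what the join does in isolation, here is a minimal sketch on toy data. The two small data frames are stand-ins invented for illustration; only the column names (`model_type`, `entry`, `category_short`) and the renamings (`age` to `a`, `aheplus` to `aehplus`, `empty` to `0`) come from this thread.

```r
library(dplyr)

# Toy stand-in for the model data frame (values are illustrative only).
ds <- data.frame(
  study_name = c("eas", "eas", "octo", "radc"),
  model_type = c("a", "age", "aheplus", "empty"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the conversion table that is normally read from the .csv.
ds_model_type_key <- data.frame(
  entry          = c("a", "age", "aheplus", "empty"),
  category_short = c("a", "a", "aehplus", "0"),
  stringsAsFactors = FALSE
)

# Each row of ds picks up the category_short whose entry matches its model_type.
ds_joined <- ds %>%
  dplyr::left_join(ds_model_type_key, by = c("model_type" = "entry"))

ds_joined
```

Because every `entry` appears only once in the key, the join keeps the number of rows in `ds` unchanged and just adds the `category_short` column.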
I believe the first table is what you were going for above. The second table is essentially a transition matrix (from the old names to the cleaned/condensed categories).
> t <- table(ds$category_short, ds$study_name); t[t == 0] <- "."; t
        eas elsa hrs ilse lasa nuage octo radc satsa
0       2   .    .   4    .    10    .    .    20
a       56  .    24  14   .    16    72   113  34
ae      58  .    .   .    .    .     .    109  34
aeh     57  .    24  14   .    16    72   116  34
aehplus 58  18   28  25   18   16    58   113  40
full    58  .    .   .    .    .     4    .    .
> t <- table(ds$model_type, ds$category_short); t[t == 0] <- "."; t
        0  a   ae  aeh aehplus full
0       20 .   .   .   .       .
a       .  133 .   .   .       .
ae      .  .   201 .   .       .
aeh     .  .   .   333 .       .
aehplus .  .   .   .   373     .
age     .  196 .   .   .       .
aheplus .  .   .   .   1       .
empty   16 .   .   .   .       .
full    .  .   .   .   .       62
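The transition matrix above can be reproduced on a small scale with base R alone. This is a sketch on made-up rows; the object name `transition` is my own choice (the thread uses `t`, which also happens to be the name of the base transpose function).

```r
# Toy data: old names in model_type, cleaned names in category_short.
ds <- data.frame(
  model_type     = c("a", "age", "age", "aehplus", "aheplus", "empty"),
  category_short = c("a", "a", "a", "aehplus", "aehplus", "0"),
  stringsAsFactors = FALSE
)

# Cross-tabulate old names (rows) against cleaned categories (columns).
transition <- table(ds$model_type, ds$category_short)

# Replace zero counts with a dot so the non-empty cells stand out.
# This turns the counts into character strings, so it is for display only.
transition[transition == 0] <- "."

transition
```

Any row whose count sits in a column with a different name (here `age`, `aheplus`, and `empty`) is an entry that was renamed by the key.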
Ah, good! Thanks, @wibeasley. This setup is certainly more welcoming to the non-coding crew.
I wouldn't have reached for joins to do the work, so I'm glad you've shown this. I was hoping there was a one-line solution.
I like the new table, it's quite informative. It makes debugging easier.
I'll work through the rest of the items. I may need help when I get to incorporating sorting (into domains) into joins. Thanks again!
> I'll work through the rest of the items. I may need help when I get to incorporating sorting (into domains) into joins. Thanks again!
No problem. Just tell me when.
> I like the new table, it's quite informative. It makes debugging easier.
Yeah, I considered stringing together those dplyr statements, but it would have prevented us from peeking at the transition matrix.
And I like the format of a transition matrix. I usually use something like `dplyr::count()` for real tallying. But the `table()` display lets you see the "off diagonals" better. I like your touch of replacing the zeros with a dot.
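For comparison, here is a rough sketch of the same tally done with `dplyr::count()`. The rows are invented for illustration; only the column names come from this thread.

```r
library(dplyr)

# Toy data: old names in model_type, cleaned names in category_short.
ds <- data.frame(
  model_type     = c("a", "age", "age", "aehplus", "aheplus", "empty"),
  category_short = c("a", "a", "a", "aehplus", "aehplus", "0"),
  stringsAsFactors = FALSE
)

# Long format: one row per observed combination, with the count in column n.
tally <- ds %>%
  dplyr::count(model_type, category_short)

tally
```

The long format is easier to filter and join downstream, but it only lists combinations that occur, so the empty cells that make the off-diagonals obvious in the `table()` display are not shown.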
Just a thought – can we assume
1) aeg is not aeh?
2) ahe is not age?
H and G are typed with different fingers, but are right next to each other on the keyboard. I guess there is a context that determines it – i.e., aeh shows up in the filename and age is a covariate?
Good point, @ampiccinin. We are not protected from errors like that. We won't be able to catch the mistake here if the file name itself came with that misspelling. However, when we look at the fixed effects the mistake will become apparent, and we'll have to come back, locate the file, and add a line to the .csv to rename the outcomes and correct for the misspelling. Yes, such misspellings are very costly to debug.
The idea of this `.csv` file is that it offers an easy way to edit these corrections. Column `category_short` contains what `entry` will be renamed into, while column `notes` explains the substitution. It's hard to anticipate ALL possible misspellings, and as you showed in your example, the interpretation may be highly contextual. So I think our strategy would be: watch out for things that don't make sense and edit the .csv by entering additional renaming rules.
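One way to "watch out for things that don't make sense" is to list the entries that have no renaming rule yet. The sketch below assumes the three column names mentioned above; the file contents, the notes, and the helper name `unmatched` are made up for illustration, and the key is read from an inline string rather than from the real file.

```r
library(dplyr)

# Hypothetical contents of the conversion .csv (rows and notes are placeholders).
key_csv <- "entry,category_short,notes
a,a,placeholder note
age,a,placeholder note
aehplus,aehplus,placeholder note"

ds_model_type_key <- read.csv(text = key_csv, stringsAsFactors = FALSE)

# Toy model data containing one entry the key does not know about.
ds <- data.frame(
  model_type = c("a", "age", "aheplus"),
  stringsAsFactors = FALSE
)

# Entries present in the data that have no rule in the key.
unmatched <- ds %>%
  dplyr::anti_join(ds_model_type_key, by = c("model_type" = "entry")) %>%
  dplyr::distinct(model_type)

unmatched

# After reviewing the context, add a rule (in practice, a new line in the .csv).
ds_model_type_key <- rbind(
  ds_model_type_key,
  data.frame(
    entry          = "aheplus",
    category_short = "aehplus",
    notes          = "placeholder note: letters transposed",
    stringsAsFactors = FALSE
  )
)
```

This only catches entries that are absent from the key. A misspelling that happens to coincide with another valid entry would still pass through, which is the contextual case raised above.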
We'll have a modification of this csv for classifying into domains as well. This is how we can make it customizable to every track, while keeping the bulk of the code stable across tracks (and projects).
@wibeasley, please take a look at the example I've developed in `./manipulation/rename-classify.R`. Instead of cognitive outcome, I've chosen a simpler case, model type. We don't have to deal with classification yet.
I've created a .csv file, `./manipulation/model_type-entry-table.csv`, containing the instructions for renaming.
`./manipulation/rename-classify.R` first looks at the values across studies at line 43 and then conducts the re-assignment in lines 47-52
but doesn't really do it, because
@wibeasley, is this setup what you originally proposed? I'd like to get this simple case working first, before moving on to more populous cases, such as cognitive_measure. If yes, what am I missing here to make it work?