Open andkov opened 7 years ago
@andkov, for the renaming part of the script (currently at line 172), consider pulling that out into a metadata csv with three columns: `name_old`, `name_new`, and `comments`.
It may not be worth messing with now, unless there are multiple `name_old`s that map to a single `name_new`. For instance, say one of the scripts produces `aa_TAU_00_est`, while another (renegade) set of scripts had produced `aa_TAU_est_00`. Assuming a third set of scripts didn't use both `aa_TAU_00_est` and `aa_TAU_est_00`, this should work.
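A minimal sketch of the many-to-one case described above, in base R. The rows of the metadata table, the canonical name `aa_tau_00_est`, and the `columns_old` vector are made up for illustration; only the three column names come from the suggestion itself.

```r
# Hypothetical metadata: two legacy spellings map to one canonical name.
renames <- data.frame(
  name_old = c("aa_TAU_00_est", "aa_TAU_est_00", "study_name"),
  name_new = c("aa_tau_00_est", "aa_tau_00_est", "study_name"),
  comments = c("", "spelling used by the older scripts", ""),
  stringsAsFactors = FALSE
)

# Hypothetical column names coming from a single set of scripts.
columns_old <- c("study_name", "aa_TAU_est_00")

# Look up each existing column; names without a match stay unchanged.
idx <- match(columns_old, renames$name_old)
columns_new <- ifelse(is.na(idx), columns_old, renames$name_new[idx])

# Stop if two existing columns would collapse onto the same new name,
# which is the situation the "third set of scripts" caveat warns about.
stopifnot(anyDuplicated(columns_new) == 0)
columns_new
```

The duplicate check is the important part: the mapping itself tolerates several old names per new name, but a single dataset containing both spellings would end up with two identically named columns.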
Good point, thank you, @wibeasley. I would very much like a registry of names of model components. This would especially be useful for different tiers of coordination:
The next work-through of the existing scripts will help me identify where the renaming you've mentioned should be the most organic.
Cool. Then here's a regex script that will pull out those values and put them into a CSV. Copy & paste the meat of that `dplyr::rename()` snippet so it looks like:
column_renames <- '
# general model information
"study_name" = "`study_name`"
, "model_number" = "`model_number`"
, "subgroup" = "`subgroup`"
, "model_type" = "`model_type`"
...
, "b_gamma_16_se" = "`b_GAMMA_16_se`"
, "b_gamma_16_wald" = "`b_GAMMA_16_wald`"
, "b_gamma_16_pval" = "`b_GAMMA_16_pval`"
'
Then run this and rename/move the resulting `column-renames.csv` to some metadata directory.
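library(magrittr) # provides %>%

# Group 1 captures the new name, group 2 the backticked old name.
pattern <- '(?s).+?"(\\w+)"\\s+=\\s*"`(\\w+)`".*?'
# Emit one "name_old,name_new," row per rename; the trailing comma leaves `comments` empty.
rearranged <- gsub(pattern, "\\2,\\1,\n", column_renames, perl=TRUE)
rearranged
ds <- rearranged %>%
  readr::read_csv(col_names = c("name_old", "name_new", "comments"))
readr::write_csv(ds, "./column-renames.csv")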
This is a handy little script for converting code into proper metadata. I'm surprised we haven't needed to write something like this yet.
This is the code that should work (I haven't tested it) when you read the metadata and apply the column name changes.
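ds <- readr::read_csv("./column-renames.csv")
# Named character vector: names are the new column names, values are the old ones.
renaming_vector <- ds$name_old
names(renaming_vector) <- ds$name_new
# dplyr::rename_() is deprecated; all_of() accepts the same named vector (new = old).
ds_names_new <- ds_names_old %>%
  dplyr::rename(dplyr::all_of(renaming_vector))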
Edit: and don't be afraid to add extra columns to this, if it helps anything.
Great regex example for studying. I've finally gotten over the initial scare of using regexes and can learn more elaborate applications. Can't imagine efficient data manipulation without them anymore. Thanks for pushing me down that hill!
Currently these two tasks are accomplished by a single script, `./manipulation/rename-classify.R`.
Such practice is far from optimal for the following reasons:
For these and other reasons, it is advisable to develop a function that would take in a catalog and the external csv with grouping instructions, so that this procedure could be applied immediately before table or graph production and NOT during the manipulation phase.
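One possible shape for such a function, as a rough base-R sketch rather than the project's actual implementation. The function name `apply_grouping()`, its `key` argument, the `model_group` column, and all values in the toy data are assumptions made for illustration; the real grouping csv may be organized differently.

```r
# Hypothetical helper: attach grouping labels from an external csv to a catalog.
# `key` names the column shared by the catalog and the csv (an assumption).
apply_grouping <- function(catalog, path_groups, key = "model_type") {
  groups <- utils::read.csv(path_groups, stringsAsFactors = FALSE)
  stopifnot(key %in% names(catalog), key %in% names(groups))
  stopifnot(anyDuplicated(groups[[key]]) == 0) # one instruction per key value
  # Left join that keeps the catalog's row order; unmatched rows get NA.
  idx <- match(catalog[[key]], groups[[key]])
  for (col in setdiff(names(groups), key)) {
    catalog[[col]] <- groups[[col]][idx]
  }
  catalog
}

# Toy grouping instructions written to a temporary file (made-up values).
path_groups <- tempfile(fileext = ".csv")
writeLines(c("model_type,model_group", "a,simple", "aehplus,full"), path_groups)

# Toy catalog (made-up values).
catalog <- data.frame(
  study_name = c("s1", "s2", "s3"),
  model_type = c("aehplus", "a", "unknown"),
  stringsAsFactors = FALSE
)

# Called right before producing a table or graph, not during manipulation.
catalog_grouped <- apply_grouping(catalog, path_groups)
catalog_grouped
```

Because the grouping lives in the csv and is applied at reporting time, changing a classification only requires editing the csv and re-running the report, not the manipulation scripts.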