yongrenjie closed this 7 months ago
@helendduncan: This is the spec we talked about earlier, but I've renamed COMBINE to COMBINE_LINEAR as Louis suggested.
So the use case in this example would be something like this (edited in response to the comment below):
filenames <- c("../example_data/random_ae_data.csv", "../example_data/random_pis_data.csv")
all_feature_json_filenames <- c("../example_spec/combine_example.json")
tf <- transform(all_table_filenames = filenames, all_feature_json_filenames = all_feature_json_filenames)
This assumes the example spec looks something like this:
{
  "transformation_type": "COMBINE_LINEAR",
  "output_feature_name": "sum_of_feature_1_and_feature_2",
  "grouping_columns": ["id"],
  "feature_list": {
    "feature_1": {
      "weight": 1,
      "source_file": ["../example_data/random_ae_data.csv"],
      "transformation_type": "COUNT",
      "absent_data_flag": 0,
      "primary_filter": {
        "column": "attendance_category",
        "type": "IN",
        "value": [1]
      }
    },
    "feature_2": {
      "weight": 1,
      "source_file": ["../example_data/random_pis_data.csv"],
      "transformation_type": "COUNT",
      "absent_data_flag": 0,
      "primary_filter": {
        "column": "bnf_section",
        "type": "IN",
        "value": [106]
      }
    }
  }
}
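For anyone reading the spec cold, here is a rough base-R sketch of what I'd expect COMBINE_LINEAR to compute for it. This is not the package's implementation. The helper `combine_linear()`, the toy counts, the full outer join on `id`, and the reading of `absent_data_flag` as the fill value for ids missing from a source file are all assumptions made for illustration.

```r
# Hypothetical per-id counts, standing in for the output of the two COUNT features.
feature_1 <- data.frame(id = c(1, 2, 3), feature_1 = c(2, 0, 5))
feature_2 <- data.frame(id = c(2, 3, 4), feature_2 = c(1, 4, 3))

# Hypothetical helper (not part of the package): weighted sum of features
# joined on the grouping column(s).
combine_linear <- function(f1, f2, by, weights, absent = c(0, 0), output_name) {
  # Full outer join, so ids present in only one table are kept (assumption).
  merged <- merge(f1, f2, by = by, all = TRUE)
  value_cols <- setdiff(names(merged), by)
  # Fill ids missing from a table with that feature's absent_data_flag (assumption).
  for (i in seq_along(value_cols)) {
    col <- value_cols[i]
    merged[[col]][is.na(merged[[col]])] <- absent[i]
  }
  # Linear combination: sum over features of weight * value.
  merged[[output_name]] <- as.vector(as.matrix(merged[value_cols]) %*% weights)
  merged[c(by, output_name)]
}

result <- combine_linear(
  feature_1, feature_2,
  by = "id",
  weights = c(1, 1),
  output_name = "sum_of_feature_1_and_feature_2"
)
print(result)
```

With both weights set to 1 this is just the per-id sum of the two counts; other weights give a general linear combination.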
Yup, although we don't need to call read_all_tables(); we just pass the list of filenames to transform().
I extended the PR a bit to include ways to calculate the minimum / maximum of two or more features too, as discussed in this morning's meeting!
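As a rough illustration of the minimum / maximum idea (not the code in the PR), an element-wise min and max across feature columns can be sketched in base R with `pmin()` and `pmax()`. The data frame, its column names, and the output names `min_feature` / `max_feature` are hypothetical, and the sketch assumes the features are already aligned on the same ids.

```r
# Hypothetical per-id feature values, already joined on the grouping column.
features <- data.frame(
  id = c(1, 2, 3),
  feature_1 = c(2, 0, 5),
  feature_2 = c(1, 4, 5),
  feature_3 = c(3, 2, 1)
)

value_cols <- setdiff(names(features), "id")
cols <- unname(as.list(features[value_cols]))

# pmin()/pmax() take any number of vectors and compare them element-wise,
# so the same call works for two or more features.
features$min_feature <- do.call(pmin, cols)
features$max_feature <- do.call(pmax, cols)
print(features)
```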
Superseded by #55
Closes #43
Closes #50