implement COMBINE_{LINEAR,MIN,MAX}

yongrenjie commented 8 months ago

Closes #43

Closes #50

yongrenjie commented 8 months ago

@helendduncan: This is the spec we talked about earlier, but I've renamed COMBINE to COMBINE_LINEAR, as Louis suggested.

helendduncan commented 8 months ago

So the use case in this example would be something like this (edited in response to the comment below):

# Input data tables, passed to transform() as filenames
filenames <- c("../example_data/random_ae_data.csv", "../example_data/random_pis_data.csv")
# JSON feature specification(s)
all_feature_json_filenames <- c("../example_spec/combine_example.json")
tf <- transform(all_table_filenames = filenames, all_feature_json_filenames = all_feature_json_filenames)

If we assume the example spec looks something like this:

{
    "transformation_type": "COMBINE_LINEAR",
    "output_feature_name": "sum_of_feature_1_and_feature_2",
    "grouping_columns": ["id"],
    "feature_list": {
        "feature_1": {
            "weight": 1,
            "source_file": ["../example_data/random_ae_data.csv"],
            "transformation_type": "COUNT",
            "absent_data_flag": 0,
            "primary_filter": {
                "column": "attendance_category",
                "type": "IN",
                "value": [1]
            }
        },
        "feature_2": {
            "weight": 1,
            "source_file": ["../example_data/random_pis_data.csv"],
            "transformation_type": "COUNT",
            "absent_data_flag": 0,
            "primary_filter": {
                "column": "bnf_section",
                "type": "IN",
                "value": [106]
            }
        }
    }
}
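Reading the spec above, COMBINE_LINEAR appears to produce, for each `id`, a weighted sum of its sub-features (here 1 × feature_1 + 1 × feature_2). Below is a minimal base-R sketch of only that combination step. It assumes each sub-feature has already been computed into a data frame with hypothetical columns `id` and `value`, and that an `id` missing from one sub-feature's result takes that sub-feature's `absent_data_flag`. `combine_linear_sketch()` is an illustrative helper, not part of eider.

```r
# Hypothetical sketch of COMBINE_LINEAR; not eider's actual implementation.
# features: list of data frames, each with columns `id` and `value`
# weights: one numeric weight per feature (the "weight" field in the spec)
# absent_data_flags: value used when an id is missing from a feature
combine_linear_sketch <- function(features, weights, absent_data_flags) {
  stopifnot(length(weights) == length(features),
            length(absent_data_flags) == length(features))
  # Give each feature's value column a unique name so merge() keeps them apart
  renamed <- Map(function(df, i) {
    names(df)[names(df) == "value"] <- paste0("value_", i)
    df
  }, features, seq_along(features))
  # Full outer join on the grouping column, keeping ids seen in any feature
  merged <- Reduce(function(a, b) merge(a, b, by = "id", all = TRUE), renamed)
  vals <- as.matrix(merged[paste0("value_", seq_along(features))])
  # Fill ids missing from a feature with that feature's absent_data_flag
  for (j in seq_len(ncol(vals))) {
    vals[is.na(vals[, j]), j] <- absent_data_flags[j]
  }
  # Weighted sum across features, row by row
  data.frame(id = merged$id, value = as.vector(vals %*% weights))
}

# Usage: per-id counts for two sub-features
f1 <- data.frame(id = c(1, 2), value = c(3, 1))
f2 <- data.frame(id = c(2, 3), value = c(4, 2))
combine_linear_sketch(list(f1, f2), weights = c(1, 1), absent_data_flags = c(0, 0))
# ids 1, 2, 3 -> values 3, 5, 2
```

Using a full outer join means an `id` that appears in only one source file still gets a combined value, rather than being dropped.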

yongrenjie commented 8 months ago

Yup, although we don't need to call read_all_tables(); we just pass the list of filenames to transform().

I've also extended the PR to calculate the minimum or maximum of two or more features (COMBINE_MIN and COMBINE_MAX), as discussed in this morning's meeting!
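Below is a minimal base-R sketch of a per-`id` minimum or maximum across sub-features. It assumes each sub-feature has already been reduced to a data frame with hypothetical columns `id` and `value`, and that ids missing from a sub-feature take that sub-feature's `absent_data_flag`. `combine_extreme_sketch()` is illustrative only and is not eider's implementation.

```r
# Hypothetical sketch of COMBINE_MIN / COMBINE_MAX; not eider's actual code.
# features: list of data frames, each with columns `id` and `value`
# absent_data_flags: value used when an id is missing from a feature
combine_extreme_sketch <- function(features, absent_data_flags,
                                   type = c("COMBINE_MIN", "COMBINE_MAX")) {
  type <- match.arg(type)
  stopifnot(length(absent_data_flags) == length(features))
  # Unique value column per feature, then full outer join on id
  renamed <- Map(function(df, i) {
    names(df)[names(df) == "value"] <- paste0("value_", i)
    df
  }, features, seq_along(features))
  merged <- Reduce(function(a, b) merge(a, b, by = "id", all = TRUE), renamed)
  vals <- as.matrix(merged[paste0("value_", seq_along(features))])
  # Fill ids missing from a feature with that feature's absent_data_flag
  for (j in seq_len(ncol(vals))) {
    vals[is.na(vals[, j]), j] <- absent_data_flags[j]
  }
  # Row-wise minimum or maximum across features
  reducer <- if (type == "COMBINE_MIN") min else max
  data.frame(id = merged$id, value = unname(apply(vals, 1, reducer)))
}

f1 <- data.frame(id = c(1, 2), value = c(3, 1))
f2 <- data.frame(id = c(2, 3), value = c(4, 2))
combine_extreme_sketch(list(f1, f2), c(0, 0), "COMBINE_MAX")
# ids 1, 2, 3 -> values 3, 4, 2
```

With an `absent_data_flag` of 0 and non-negative counts, any `id` missing from one sub-feature will always get a minimum of 0, so the choice of flag matters more for COMBINE_MIN than for a sum.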

yongrenjie commented 7 months ago

Superseded by #55

alan-turing-institute / eider

implement COMBINE_{LINEAR,MIN,MAX} #44