DSACMS / dedupliFHIR

Prototype for basic deduplication and aggregation of eCQM data
Creative Commons Zero v1.0 Universal

Allow Splink Linker Settings to be Configured #46

Closed: IsaacMilarky closed this issue 1 month ago

IsaacMilarky commented 2 months ago

Is your feature request related to a problem? Please describe.

Right now, dedupliFHIR supports only one static definition of blocking rules and comparison rules. This limits the use cases and functionality of the application.

Describe the intended behavior

The user should be able to define custom linker settings that persist and are loaded at startup.

This gives the user more control over the application. It also lets them make it run more efficiently when they have detailed knowledge of their data, for example which fields are reliable enough to block on.
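A minimal sketch of loading persisted settings at startup. It assumes the settings are stored as a JSON object in a file named `linker_settings.json`, and that any key the file omits falls back to the current built-in defaults. `DEFAULT_SETTINGS` and `load_linker_settings` are hypothetical names. They are not part of dedupliFHIR or Splink.

```python
import json
from pathlib import Path
from typing import Any

# Hypothetical built-in defaults, standing in for dedupliFHIR's current static settings.
DEFAULT_SETTINGS: dict[str, Any] = {
    "link_type": "dedupe_only",
    "max_iterations": 20,
    "em_convergence": 0.01,
}


def load_linker_settings(path: Path = Path("linker_settings.json")) -> dict[str, Any]:
    """Return the defaults overlaid with any user-defined settings found at `path`."""
    settings = dict(DEFAULT_SETTINGS)  # copy so the defaults are never mutated
    if path.is_file():
        with path.open(encoding="utf-8") as f:
            user_settings = json.load(f)
        if not isinstance(user_settings, dict):
            raise ValueError(f"{path} must contain a JSON object")
        settings.update(user_settings)  # user values win, key by key
    return settings
```

`dict.update` is shallow. A user who supplies `comparisons` therefore replaces the whole default list instead of appending to it, which fits the idea of the file holding all of the desired settings.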

Describe the solution you'd like

The user should be able to define a custom `linker_settings.json` file that contains all of the desired linker settings.

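Example (written as a Splink 3 settings dictionary in Python; in the `.json` file itself, the `block_on`, `ctl.*` and `cl.*` calls would need to be stored in a serializable form):

```python
# Assumes Splink 3 with the DuckDB backend.
import splink.duckdb.comparison_library as cl
import splink.duckdb.comparison_template_library as ctl
from splink.duckdb.blocking_rule_library import block_on

settings = {
    "link_type": "dedupe_only",
    "blocking_rules_to_generate_predictions": [
        block_on("birth_date"),
        block_on(["ssn", "birth_date"]),
        block_on(["ssn", "street_address"]),
        block_on("phone"),
    ],
    "comparisons": [
        ctl.name_comparison("given_name", term_frequency_adjustments=True),
        ctl.name_comparison("family_name", term_frequency_adjustments=True),
        ctl.date_comparison("birth_date", cast_strings_to_date=True, invalid_dates_as_null=True),
        ctl.postcode_comparison("postal_code"),
        cl.exact_match("street_address", term_frequency_adjustments=True),
        cl.exact_match("phone", term_frequency_adjustments=True),
    ],
    "retain_matching_columns": True,
    "retain_intermediate_calculation_columns": True,
    "max_iterations": 20,
    "em_convergence": 0.01,
}
```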
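The example calls `block_on`, `ctl.*` and `cl.*`, so it cannot be stored verbatim in a `.json` file. The sketch below shows one way to keep the file pure JSON, under stated assumptions. Each blocking rule is written as a column name or a list of column names and expanded into the equivalent Splink SQL string (`l.col = r.col`, joined with `and`). Each comparison is written as an object that names the library function, the column, and the keyword arguments. The JSON shape, `expand_blocking_rule`, `resolve_comparison`, and `build_settings` are all hypothetical. The `registry` argument is where the real comparison functions would be supplied.

```python
from typing import Any, Callable, Union

BlockingSpec = Union[str, list[str]]


def expand_blocking_rule(spec: BlockingSpec) -> str:
    """Turn "phone" or ["ssn", "birth_date"] into a SQL blocking rule string."""
    columns = [spec] if isinstance(spec, str) else list(spec)
    if not columns:
        raise ValueError("a blocking rule needs at least one column")
    return " and ".join(f"l.{col} = r.{col}" for col in columns)


def resolve_comparison(spec: dict[str, Any], registry: dict[str, Callable[..., Any]]) -> Any:
    """Call the allow-listed function named in `spec`, e.g.
    {"function": "ctl.name_comparison", "column": "given_name",
     "kwargs": {"term_frequency_adjustments": true}}."""
    name = spec["function"]
    if name not in registry:
        raise KeyError(f"unknown comparison function: {name}")
    return registry[name](spec["column"], **spec.get("kwargs", {}))


def build_settings(raw: dict[str, Any], registry: dict[str, Callable[..., Any]]) -> dict[str, Any]:
    """Return a copy of the parsed JSON with blocking rules and comparisons expanded."""
    settings = dict(raw)
    settings["blocking_rules_to_generate_predictions"] = [
        expand_blocking_rule(rule)
        for rule in raw.get("blocking_rules_to_generate_predictions", [])
    ]
    settings["comparisons"] = [
        resolve_comparison(c, registry) for c in raw.get("comparisons", [])
    ]
    return settings
```

With the DuckDB backend, the registry might map `"ctl.name_comparison"` to `splink.duckdb.comparison_template_library.name_comparison` and `"cl.exact_match"` to `splink.duckdb.comparison_library.exact_match`. Because the registry is an explicit allow-list, the settings file cannot cause arbitrary code to be evaluated.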

Describe alternatives you've considered

We could instead configure settings live in the front-end, or add command-line options for additional use cases.

However, either approach would limit the user unnecessarily, because only the settings explicitly exposed in the UI or on the command line could be changed.

Additionally, we could allow the front-end to edit the settings file to get the best of both worlds.
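If the front-end is allowed to edit the settings file, writing it atomically prevents a crash mid-write from leaving a truncated `linker_settings.json` that would then fail to load at the next startup. This is a minimal sketch. It assumes the front-end hands the backend a plain dictionary of JSON-compatible values, and `save_linker_settings` is a hypothetical name.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def save_linker_settings(settings: dict[str, Any], path: Path = Path("linker_settings.json")) -> None:
    """Write `settings` to `path` via a temp file in the same directory, then replace."""
    path = Path(path)
    # Serialize first so non-JSON values fail before anything touches disk.
    text = json.dumps(settings, indent=4)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the swap
        os.replace(tmp_name, path)  # atomic when both paths are on the same filesystem
    except Exception:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)  # do not leave stray temp files behind
        raise
```

The temp file is created in the destination directory rather than the system temp directory. This keeps the final `os.replace` on the same filesystem.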

Additional context

This is related to feedback from coctavius@mdinteractive.com.