DSACMS / dedupliFHIR

Prototype for basic deduplication and aggregation of eCQM data
Creative Commons Zero v1.0 Universal

Design a suite of Tests for DedupliFHIR #62

Closed IsaacMilarky closed 1 day ago

IsaacMilarky commented 3 weeks ago

Design a suite of Tests for DedupliFHIR

Problem

There is currently only one test written for the DedupliFHIR backend.

Solution

Write a suite of tests that covers specific inputs and measures how well the underlying Splink algorithm performs on our generated data.
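One way to put a number on "efficacy" is to compare the duplicate pairs a run predicts against the pairs implied by a known ground-truth label. The sketch below is only an illustration under assumptions: the helper names `pairs_from_clusters` and `precision_recall` are hypothetical, it does not call Splink, and it assumes each record carries one ground-truth label and one predicted cluster label.

```python
from itertools import combinations


def pairs_from_clusters(cluster_ids):
    """Return the set of record-index pairs that share a cluster label."""
    groups = {}
    for idx, cid in enumerate(cluster_ids):
        groups.setdefault(cid, []).append(idx)
    pairs = set()
    for members in groups.values():
        # Indices are appended in increasing order, so each pair is (i, j) with i < j.
        pairs.update(combinations(members, 2))
    return pairs


def precision_recall(true_ids, predicted_ids):
    """Pairwise precision and recall of predicted clusters against ground truth."""
    true_pairs = pairs_from_clusters(true_ids)
    pred_pairs = pairs_from_clusters(predicted_ids)
    hits = len(true_pairs & pred_pairs)
    # With no pairs on one side, treat the score as perfect rather than dividing by zero.
    precision = hits / len(pred_pairs) if pred_pairs else 1.0
    recall = hits / len(true_pairs) if true_pairs else 1.0
    return precision, recall
```

A test could then assert that both values stay above a chosen threshold for each generated data size, which keeps the check independent of any particular linkage library.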

Result

There are now several new pytest tests.

Test Plan

Run make test

natalialuzuriaga commented 2 weeks ago

Hmmmm there are 2 tests that are failing on my end:

______________________________________________ test_deduplicate_with_provided_data _______________________________________________

args = ()
kwargs = {'dedup_test_data':                             id  truth_value family_name given_name  ...            city state post...rance_Weber.xml"          NaN       weber   terrance  ...   South Quinton    NE       68843 NaN

[5 rows x 12 columns]}

    @wraps(func)
    def wrapper(*args,**kwargs):
>       fmt = kwargs['fmt']
E       KeyError: 'fmt'

deduplifhirLib/utils.py:140: KeyError
======================================================== warnings summary ========================================================
../.venv/lib/python3.11/site-packages/splink/blocking_rules_library.py:176: 6 warnings
cli/deduplifhirLib/tests.py: 20 warnings
  /Users/natalialuzuriaga/Desktop/ecqm-dedupe/.venv/lib/python3.11/site-packages/splink/blocking_rules_library.py:176: DeprecationWarning: `exact_match_rule` is deprecated; use `block_on`
    em_rules = [_exact_match(col) for col in col_names]

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================== short test summary info =====================================================
FAILED deduplifhirLib/tests.py::test_with_different_data_sizes[100-0.1] - splink.exceptions.EMTrainingException: Training rule `(l."street_address" = r."street_address") AND (l."postal_code" = r."pos...
FAILED deduplifhirLib/tests.py::test_deduplicate_with_provided_data - KeyError: 'fmt'
====================================== 2 failed, 6 passed, 26 warnings in 64.16s (0:01:04) =======================================
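The `KeyError: 'fmt'` above is consistent with a decorator that reads `fmt` from the keyword arguments unconditionally, so any caller that omits `fmt` fails before the wrapped function runs. The following is a minimal sketch of that failure mode and one possible guard. All names here (`requires_fmt`, `optional_fmt`, `strict_dedupe`, `lenient_dedupe`, and the `"DF"` default) are hypothetical and are not taken from `deduplifhirLib/utils.py`.

```python
from functools import wraps


def requires_fmt(func):
    """Hypothetical decorator that reads `fmt` unconditionally, as in the traceback."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        fmt = kwargs["fmt"]  # raises KeyError when the caller omits fmt
        return func(*args, **kwargs)
    return wrapper


def optional_fmt(default):
    """Hypothetical variant that falls back to a default when fmt is omitted."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            kwargs.setdefault("fmt", default)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@requires_fmt
def strict_dedupe(data, fmt=None):
    return f"{fmt}:{len(data)}"


@optional_fmt("DF")  # "DF" is an assumed placeholder value
def lenient_dedupe(data, fmt=None):
    return f"{fmt}:{len(data)}"
```

Under these assumptions, the failing test could be fixed either by passing `fmt` explicitly from the test or by having the decorator supply a default; which one is appropriate depends on how the real decorator is meant to be used.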

IsaacMilarky commented 1 day ago

Closing this PR because of the high number of conversations on it.