data-to-insight / csc-validator-be-cin

1 stars 1 forks source link

Bug: Rule 2885 - False positives in CINEpisodes where DateOfInitialCPC is null #367

Closed SLornieCYC closed 1 year ago

SLornieCYC commented 1 year ago

Describe the bug Some CINEpisodes records are being incorrectly flagged against this rule for having a null DateOfInitialCPC, including where the child does not even have any CPP record. Suspect this is a data indexing issue.

Proposed fix (Optional) I think the merge columns are wrong on df_cin_issues. Should be ROW_ID_cin rather than ROW_ID_cpp?

https://github.com/data-to-insight/CIN-validator/blob/dc4559d2362e0efacfe47ebb9b4b3d538c7d0094/cin_validator/rules/cin2022_23/rule_2885.py#L166-L171

tab1tha commented 1 year ago

Hi @SLornieCYC ,

In the case where DateOfInitialCPC is NaN and CPPstartDate is present, is it okay to flag the NaN DateOfInitialCPC since it is not equal to CPPstartDate?

SLornieCYC commented 1 year ago

@tab1tha Yes, if there is a CPPstartDate but no DateOfInitialCPC the NaN DateOfInitialCPC should be flagged.

https://github.com/data-to-insight/CIN-validator/blob/dc4559d2362e0efacfe47ebb9b4b3d538c7d0094/cin_validator/rules/cin2022_23/rule_2885.py#L71-L73

tab1tha commented 1 year ago

Amazing, I think I've fixed it.

This is how it works now: For every CPPstartDate, it selects all the Section47 modules that have the same LAchildID-CINdetailsID as it and checks if any of them have a DateOfInitialCPC that is the same as the CPPstartDate. If it finds one, all instances of that CIN module (LAchildID-CINdetailsID) pass.

If no matching DateOfInitalCPC is found in the Section47 table, it goes to the CINdetails table and does the same thing.

However, matches found in the CIN module are only valid if that LAchildID-CINdetailsID wasn't present in the Section47 table. If a LAchildID-CINdetailsID is present in the Section47 module and no match is found, and we go to the CIN table and find a matching date for it, that match will be ignored and the child will still fail the rule.

This last clause (which ignores some possible passes because the LAchildID-CINdetailsID module is present in section47) can easily be reversed if it is is a wrong interpretation of the logic.

tab1tha commented 1 year ago

The code will be linked up to this conversation when it is ready.

tab1tha commented 1 year ago

This bug is addressed in the attached pull request https://github.com/data-to-insight/CIN-validator/blob/3d25ff2629704ea26a0ba876f8aa5fbbbcc60b4c/cin_validator/rules/cin2022_23/rule_2885.py#L130