Bug: Rule 8863 - Duplication in Reports output

SLornieCYC commented 1 year ago

Describe the bug

Not quite sure what's going on here, but the report output from this rule seems to return an inordinate number of rows for a single error instance compared to the DFE error report csv. Possibly not a huge deal, as it's easy enough to work around, but curious nonetheless.

Even if this isn't something solvable within this rule's coding, there might be some benefit to forcing the report to output only distinct rows?

Screenshots

This is the output I get from one of my records in the csv export from the DFE error report:

And this is the output I get from the same record in the csv export from the D2I CIN validator:

tab1tha commented 1 year ago

Here, one solution could be to call df.drop_duplicates(["LAchildID", "rule_code", "columns_affected", "ROW_ID"]) when the user_report df is created. Before that, could you confirm that these rows are duplicates? That is, is the LAchildID the same? (That's the only variable whose value I can't see or compare.)
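A minimal sketch of that suggestion, for illustration only. The column names are taken from the comment above and are assumed to match the real user_report; the data is a made-up stand-in, not output from the validator.

```python
import pandas as pd

# Toy stand-in for the user report; all values are invented for illustration.
user_report = pd.DataFrame(
    {
        "LAchildID": ["child1", "child1", "child1", "child2"],
        "rule_code": ["8863", "8863", "8863", "8863"],
        "columns_affected": ["colA", "colA", "colB", "colA"],
        "ROW_ID": [0, 0, 0, 1],
    }
)

# Columns that together identify one reported error (assumed from the comment).
key_columns = ["LAchildID", "rule_code", "columns_affected", "ROW_ID"]

# Keep the first occurrence of each key combination and drop the repeats.
deduplicated = user_report.drop_duplicates(subset=key_columns).reset_index(drop=True)

print(len(user_report), "rows before,", len(deduplicated), "rows after")
```

Passing the column names as strings matters: unquoted names would be looked up as Python variables and fail unless they happen to be defined.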

SLornieCYC commented 1 year ago

Hi @tab1tha, yes, I can confirm all of those rows are for the same child. If I do "remove duplicates" in Excel on those 18 rows (across all 8 columns in the user report), it removes 11 duplicate rows and leaves 7 unique rows.
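The Excel check above can be approximated in pandas, which also shows how deduplicating on every column can differ from deduplicating on a subset of key columns. This is a rough sketch on invented data: the "extra" column is hypothetical and simply stands in for any report column outside the key.

```python
import pandas as pd

# Invented rows for a single child; "extra" is a hypothetical non-key column.
report = pd.DataFrame(
    {
        "LAchildID": ["child1"] * 4,
        "rule_code": ["8863"] * 4,
        "columns_affected": ["colA", "colA", "colA", "colB"],
        "ROW_ID": [0, 0, 0, 0],
        "extra": ["x", "x", "y", "x"],
    }
)

key_columns = ["LAchildID", "rule_code", "columns_affected", "ROW_ID"]

# Equivalent of Excel's "remove duplicates" across all columns.
full_row_duplicates = int(report.duplicated().sum())

# Duplicates when only the key columns are compared.
key_duplicates = int(report.duplicated(subset=key_columns).sum())

print("duplicates across all columns:", full_row_duplicates)
print("duplicates across key columns:", key_duplicates)
```

If the two counts differ, some rows share a key but disagree in another column, so it is worth checking which column varies before choosing the subset to deduplicate on.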

data-to-insight / csc-validator-be-cin

Bug: Rule 8863 - Duplication in Reports output #374