data-to-insight / csc-validator-be-cin


Split Assessment Factors into separate dataframe to reduce duplication of error rows passed to FE #475

Closed SLornieCYC closed 2 months ago

SLornieCYC commented 4 months ago

Current

Ingress creates a separate row in the Assessments dataframe for each AssessmentFactors value within a LAchildID-CINdetailsID-AssessmentActualStartDate set. This means that where an assessment has many factors assigned, the assessment is duplicated within the dataset and any in-error assessments are reported multiple times in the FE.
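To make the duplication concrete, here is a minimal sketch with made-up data. The sample values and the date check are purely illustrative assumptions, not one of the project's actual validation rules; only the column names in the key come from the description above.

```python
import pandas as pd

# Hypothetical sample: child C1 has one assessment with three factors,
# so the flattened Assessments dataframe holds three rows for it.
assessments = pd.DataFrame({
    "LAchildID": ["C1", "C1", "C1", "C2"],
    "CINdetailsID": [1, 1, 1, 2],
    "AssessmentActualStartDate": ["2022-05-01"] * 3 + ["2022-06-10"],
    "AssessmentAuthorisationDate": ["2022-04-01"] * 3 + ["2022-06-20"],
    "AssessmentFactors": ["1A", "2B", "3C", "4A"],
})
for col in ("AssessmentActualStartDate", "AssessmentAuthorisationDate"):
    assessments[col] = pd.to_datetime(assessments[col])

# Illustrative check only: authorisation date earlier than start date.
in_error = assessments[
    assessments["AssessmentAuthorisationDate"]
    < assessments["AssessmentActualStartDate"]
]

key = ["LAchildID", "CINdetailsID", "AssessmentActualStartDate"]
error_row_count = len(in_error)                                   # rows sent on as errors
distinct_error_count = len(in_error.drop_duplicates(subset=key))  # assessments actually in error
```

With this sample, one faulty assessment yields three error rows, one per factor, which is the repetition seen in the FE.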

Proposed

Amend ingress to store the assessment factors in a separate dataframe, linked to the parent assessment (and CINdetailsID) via a new internally incremented AssessmentID value. I have ensured that the AssessmentFactors column remains on the Assessments dataframe, but now as a single string containing a list of the factors assigned on the assessment.
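A rough sketch of the split, assuming ingress already produces the flattened dataframe described under Current. The function name `split_assessment_factors`, the use of `groupby(...).ngroup()` to generate AssessmentID, and the comma-joined string format are all assumptions made for illustration; the actual ingress code may do this differently.

```python
import pandas as pd

KEY = ["LAchildID", "CINdetailsID", "AssessmentActualStartDate"]

def split_assessment_factors(flat: pd.DataFrame):
    """Hypothetical helper: split a one-row-per-factor dataframe into
    (Assessments, AssessmentFactorsList). Assumes the KEY columns are non-null."""
    flat = flat.copy()
    # Number each distinct key combination 1..n in order of first appearance.
    flat["AssessmentID"] = flat.groupby(KEY, sort=False).ngroup() + 1

    # One row per factor, linked to its parent assessment.
    factors_list = flat.loc[
        flat["AssessmentFactors"].notna(),
        ["LAchildID", "CINdetailsID", "AssessmentID", "AssessmentFactors"],
    ].reset_index(drop=True)

    # One row per assessment, with the factors collapsed into a single string.
    joined = factors_list.groupby("AssessmentID")["AssessmentFactors"].agg(
        lambda s: ", ".join(s)
    )
    assessments = (
        flat.drop(columns="AssessmentFactors")
        .drop_duplicates(subset="AssessmentID")
        .reset_index(drop=True)
    )
    assessments["AssessmentFactors"] = assessments["AssessmentID"].map(joined)
    return assessments, factors_list

# Made-up sample data: C1 has one assessment with three factors.
flat = pd.DataFrame({
    "LAchildID": ["C1", "C1", "C1", "C2"],
    "CINdetailsID": [1, 1, 1, 2],
    "AssessmentActualStartDate": ["2022-05-01"] * 3 + ["2022-06-10"],
    "AssessmentFactors": ["1A", "2B", "3C", "4A"],
})
assessments, factors_list = split_assessment_factors(flat)
```

Here `assessments` ends up with one row per assessment and `factors_list` keeps one row per factor, both carrying the same AssessmentID.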

The new AssessmentFactorsList dataframe can be used within the relevant validation rules to return the incorrect records, while being ignored by validation rules that do not involve assessment factors. This will remove unnecessary duplicate assessment rows from both the validation rules and the FE error lists.
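As an illustration of how a factor-specific rule might use the new dataframe, here is a sketch that flags factor codes repeated within one assessment. The rule, the function name `duplicated_factors`, and the sample data are hypothetical and do not reflect the validator's real rule interface.

```python
import pandas as pd

# Made-up sample data in the proposed shape.
factors_list = pd.DataFrame({
    "LAchildID": ["C1", "C1", "C1", "C2"],
    "CINdetailsID": [1, 1, 1, 2],
    "AssessmentID": [1, 1, 1, 2],
    "AssessmentFactors": ["1A", "2B", "2B", "4A"],
})
assessments = pd.DataFrame({
    "LAchildID": ["C1", "C2"],
    "CINdetailsID": [1, 2],
    "AssessmentID": [1, 2],
    "AssessmentActualStartDate": ["2022-05-01", "2022-06-10"],
})

def duplicated_factors(factors: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical rule: return factor rows whose code appears
    more than once within the same assessment."""
    mask = factors.duplicated(subset=["AssessmentID", "AssessmentFactors"], keep=False)
    return factors[mask]

# Factor-level errors come from AssessmentFactorsList only.
factor_errors = duplicated_factors(factors_list)

# The parent assessment can be found via AssessmentID and is reported once.
flagged = assessments[assessments["AssessmentID"].isin(factor_errors["AssessmentID"])]
```

Rules that do not concern factors would read only `assessments`, so each assessment appears at most once in their output.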

Note: I am not 100% sure how the FE data tables are built, or whether corresponding changes are needed on that side to handle this change (e.g. does the FE use the dataframes from ingress, or does it parse the uploaded source files separately?).