data-to-insight / quality-lac-data-beta

Quality LAC data Beta project
MIT License

Version control of validation rules #56

Open SLornieCYC opened 2 years ago

SLornieCYC commented 2 years ago

[Not quite sure whether this belongs on this repo or the validator side, but putting it here as seems more related to overall architecture of the tool]

What is the long-term plan for maintenance of the rules within the tool going forward? Is it the intention that the tool will always reflect the current ruleset for the latest 903 return year, or accommodate the differences applicable to different years? At the moment the front page has a drop-down selection for return year with five options yet each of those years was subject to a separate subset of validation rules.

Although most rules stay the same year on year or only receive slight amendments to fix errors, other rule changes (or additions and removals) may be more significant. As rules change between years and versions of the DFE guidance it seems like there will be a need for some form of versioning here to maintain the accuracy of the tool.

Even if the aim is only to reflect the current position, there are likely to be points in the year where more than one set of rules will be required, e.g. designing/updating rules for the forthcoming year ready for use on 1 April while at the same time needing to test current year data against existing rules.

Proposal: each rule should have (optional) additional metadata to record valid-from and valid-to dates (or versions).

This could be tied to the applicable return year and the tool would then filter the rules based on the valid-from and valid-to metadata and only execute those applicable to the year/ruleset selected on the front page. Return year seems more straightforward for this than pure dates or edition of the DFE guidance (i.e. 2020 v1.3 is more up-to-date than 2021 v1.1).
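A minimal sketch of what this could look like in Python, assuming each rule carries optional `valid_from` / `valid_to` return years. The `Rule` class, the `rules_for_year` helper and the rule codes are all hypothetical and are not taken from the existing validator code:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record; field names are illustrative only."""
    code: str
    description: str
    valid_from: Optional[int] = None  # first return year the rule applies to (None = no lower bound)
    valid_to: Optional[int] = None    # last return year, inclusive (None = still current)

    def applies_to(self, return_year: int) -> bool:
        # A missing bound means the rule is open-ended on that side.
        after_start = self.valid_from is None or return_year >= self.valid_from
        before_end = self.valid_to is None or return_year <= self.valid_to
        return after_start and before_end


def rules_for_year(rules: List[Rule], return_year: int) -> List[Rule]:
    """Keep only the rules applicable to the selected return year."""
    return [rule for rule in rules if rule.applies_to(return_year)]


# Made-up rule codes, purely for illustration.
RULES = [
    Rule("R001", "applies to every return year"),
    Rule("R002", "retired after the 2019 return", valid_to=2019),
    Rule("R003", "introduced for the 2021 return", valid_from=2021),
]

selected = rules_for_year(RULES, 2020)
print([rule.code for rule in selected])
```

Because both fields are optional, rules that never change need no extra metadata, and only amended, added or removed rules have to be annotated.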

dezog commented 2 years ago

This is a pretty big question that we'll have to address at some point. As you say, it would require adding a bit of complexity to the Python code (and handling of that on the JS side) to keep track of the different rules for each return year.

There's a decision to be made on whether to build this capability into the tool (more work initially, but a cleaner solution if we get it right), or just fork the tool when new guidelines come out and then have a separate version with the new rules (more straightforward to do, but more cumbersome to maintain).

It would be good to get some input from you and the other analysts on how exactly it would be best for this to work in practice, e.g. will anyone need to check against, say, 2016's validation rules, given those returns have already been submitted?

tab1tha commented 1 year ago

Since this has come up again (https://github.com/SocialFinanceDigitalLabs/quality-lac-data-beta-validator/issues/593), we're considering building the tools in a way that allows the user to select the validation rule set based on the year in which the rules were published/updated.
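As a rough illustration of that selection step, here is a sketch that assumes rule sets are stored keyed by the year they were published. The `RULESETS` mapping, the `select_ruleset` function and the rule codes are hypothetical, and falling back to the most recent earlier rule set when a year has no rule set of its own is an assumption, not an agreed behaviour:

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical rule sets keyed by publication year; codes are made up.
RULESETS: Dict[int, List[str]] = {
    2019: ["R001", "R002"],
    2021: ["R001", "R003"],
}


def select_ruleset(rulesets: Dict[int, List[str]], year: int) -> Tuple[int, List[str]]:
    """Return (publication_year, rules) for the latest rule set published in or before `year`."""
    years = sorted(rulesets)
    # bisect_right gives the count of publication years <= year.
    idx = bisect_right(years, year)
    if idx == 0:
        raise ValueError(f"no rule set published in or before {year}")
    chosen = years[idx - 1]
    return chosen, rulesets[chosen]


published, rules = select_ruleset(RULESETS, 2020)
print(published, rules)
```

An alternative would be to reject any year without an exact match, which is stricter but avoids silently validating against older rules.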

Is this something that the DfE portal allows you to do? Are you able to choose to check your data based on past rules? @SLornieCYC

SLornieCYC commented 1 year ago

The DfE portal is only available to us for the current return year, with the corresponding set of validations, so from a return perspective, no: we are only ever looking at the current set of rules.

However, with the tool you already have the option to load data files for different years, and LAs will hopefully be using the tool throughout the year rather than only during return season.

tab1tha commented 1 year ago

That makes sense. Let's build this feature in the CIN tool, then we can move it to 903 too. It's a useful addition especially because these tools enable analysts to have all-year-round data validation, as you pointed out.

We spoke about it earlier but didn't see it as a necessity until now.