data-to-insight / csc-validator-be-cin

1 stars 1 forks source link

bugfix 8842Q - False positives #400

Closed WillLP-code closed 1 year ago

WillLP-code commented 1 year ago

8842Q says:

'Where a <Reviews> group is present, a valid <CPPreviewdate> (N00116) should be present within the group'

@SLornieCYC @tab1tha I'm tagging you both in this as I think it needs some thought. First, a TL;DR: DFE XML generator builds Review blocks as empty where there is no CPPreviewDate, the DFE validator also lets empty Review blocks pass 8842Q -thus the DFE validator won't fail data where the original data had no CPPreviewDate.

Now in detail:

8842Q raises false positives where XMLs have a self closing block with no CPPreviewDate.

Lets start from the beginning to explain why this happens:

1 . When using the DFE XML generator, the generator will make things called short_empty_elements where a block would be, but there's nothing to go in it. These are self closing elements and look like <Reviews/> rather than <Reviews></Reviews>.

  1. Where there is no CPPreviewDate, instead of not making a Reviews block, the DFE XML generator will make an empty, self closing block. Remember this bit: the DFE makes self closing blocks where there is no CPPreviewDate.
  2. When we use ET.findall(), this finds all Reviews blocks, regardless of whether they are self-closing/empty. This means that for every child where the DFE generator makes an empty Reviews block, we give them a reviews block that comes back empty, including CPPreviewDate. BUT, for us this was an intended feature. Empty blocks should fail 8842Q, right?
  3. Our coding of 8842Q currently checks to see if children have reviews blocks with an empty CPPreviewDate BUT, obviously, the way ingress works, this will raise false positives according to the DFE generator as children with empty reviews blocks made by the DFE generator will not have CPPreviewDates.
  4. However, the DFE validator does not pick up these empty reviews blocks as having no CPPreviewDate.
  5. Now, here's the tricky bit: 8842Q looks for Reviews blocks with no CPPreviewDate, BUT the DFE generator builds the XML with empty Reviews blocks where there is no CPPreviewDate AND lets these pass.
  6. SO, either the DFE is letting empty CPPreviewDate blocks pass when it shouldn't because, when building them, it builds them in a way that tells the validator to ignore that they're empty, meaning the DFE is at fault, OR, the rule isn't meant to check if there is a CPPreviewDate and instead is mean to check if the formatting of the CPPreviewDate is valid where it exists, but in that case the wording is totally wrong, OR I'm wrong about this all and there's a simple fix.

Now, we can avoid building these Review blocks for children with an empty Reviews block by using the XPath [tag] syntax, replacing [tag] with [CPPreviewDate]. This means that it will only build Review blocks for children where there is a CPPreviewDate child to the Reviews block. That's what I've done in this PR. This, as afar as I can see, mirrors what the DFE validator does and only builds review blocks to validate where it would have built a review block in the fist place: where there is a CPPreviewDate. However, this looks really really silly, based on our current interpretation of the rule, if, in cases where a Reviews block has no CPPreviewDate, isn't that what the rule is supposed to check? Surely then this way of coding things would not raise these issues? That's right, but we might be interpreting the rule right.

It's possible to interpret the rule as, instead of saying 'Is there a CPPreviewDate', instead it's saying, if there's a non-empty review block, is the CPPreviewDate valid/right? The reasoning behind this interpretation is that the DFE XML generator, where there is no CPPreviewDate, will make a self closing element, and will then not raise the error. So, where LAs have submitted data to the XML generator where there is no CPPreviewDate, the XML generator will generate the self closing block, these blocks are passed over in the DFEs generator, and thus, where there is no CPPreviewDate, the DFE validator will never fail it.

The DFE guidance on what's right is here:

1.11.1 Date fields This guide assumes that each management information system (MIS) in use within local authorities will have standard conventions for recording dates with which users will be familiar. However, the XML format for the children in need census defines all dates as being in the format ‘CCYY-MM-DD’, in accordance with the XML standard. The export functionality for any system will therefore have to convert any dates into this format. Any local authority which makes its own software arrangements, rather than using a commercial system, should take this into account

(potentially) closes #384

tab1tha commented 1 year ago

Hi Will, Thank you for this level of detail. Let me push a release out and then I can go through this a bit later.

SLornieCYC commented 1 year ago

Might be one to revisit in the DFE portal once it opens again in April for the live return (the familiarisation blade closed last week) to properly test different tag combinations and see whether there is any difference in how they are treated by the validation rule.

E.g. <Reviews/> <Reviews><CPPreviewdate/></Reviews> <Reviews><CPPreviewdate></CPPreviewdate></Reviews>

SLornieCYC commented 1 year ago

Another thing to consider is that although we know the DFE xml generator creates those self-closing <Reviews/> tags which seem to not trigger errors in the portal (only my word for that so far!), not all LAs produce their CIN xml in the same way. Some use xml files generated directly by their social care CMS which may behave differently.