data-to-insight / csc-validator-be-cin

1 stars 1 forks source link

Bug: Rule 8670Q - Not returning errors #377

Closed SLornieCYC closed 1 year ago

SLornieCYC commented 1 year ago

Describe the bug Rule is returning no errors in CIN Validator whereas I get 74 errors in DFE portal from the same data.

Proposed fix (Optional) Not sure what exactly is causing the issue however from looking at the python I have one initial query:

  1. Does this actually do what it's suppposed to? The return from england_working_days is pd.offsets.CustomBusinessDay(n=num_days, calendar=holiday_calendar) but the function is being subtracted from a date here rather than being applied to a date - so what is the function actually returning?

https://github.com/data-to-insight/CIN-validator/blob/dc4559d2362e0efacfe47ebb9b4b3d538c7d0094/cin_validator/rules/cin2022_23/rule_8670Q.py#L50-L51

WillLP-code commented 1 year ago

@SLornieCYC So, england_working_days(45) returns an object that should include the information about how to subtract 45 business days from another date, using a set of holidays curated by, I think, the DOJ. It does actually do this, except, it sort of doesn't. Using an online business days calculator I get 30th Jan as the date 45 working days before the collection end, but the cin tool calculates it to be the 27th Jan, so something is clearly going wrong. If it turns out that the dates you have which fail are in that range, that'd be great news as it'll help explain what's going on.

WillLP-code commented 1 year ago

Right, it depends if we include the date of collection end (we should!) I believe the current calculation excludes it, putting the date at the 27th, where if we include it, it becomes the 30th.

This can be fixed by changing the calculation to do 44 working days. I'll make a PR with that, request Tambe, and see if, instead of making this weird fix, we can get the fn. in utils to be inclusive instead.

SLornieCYC commented 1 year ago

@willLPD2I Actually the assessments not being flagged in my data all have start dates between November 2022 and 24 January 2023 so seemingly entirely unrelated to the collection end date fix you've identified above!

This is an example of one of the assessments as produced by the DFE XML Generator which is not flagged:

<Assessments>
<AssessmentActualStartDate>2023-01-24</AssessmentActualStartDate>
</Assessments>
WillLP-code commented 1 year ago

@SLornieCYC Bizarre, I'll keep looking!

WillLP-code commented 1 year ago
<Assessments>
<AssessmentActualStartDate>2023-01-24</AssessmentActualStartDate>
</Assessments>

Passes the pytest when added to the sample df, however, when I make a fake child xml and run it on that, it doesn't pass, which says that something is going wrong perhaps outside of the rule, maybe in calculating the reference period. It actually passes whenn using run-all regardless of what the reference date is set to.

WillLP-code commented 1 year ago

If you pop a print statement in the assessments ingress then run-all on fake_CIN_data.xml, it prints empty dataframes, it looks like the ingress for assessments isn't working at all. If you run just 8670Q, it can't find the assessments table for tables affected either.

WillLP-code commented 1 year ago

I have now determined that the assessments block doesn't build if there are no factors identified at assessment, which will always be the case in instances where an assessment hasn't been completed.

WillLP-code commented 1 year ago

Okay, now replicated, the bug is produced because, in the ingress, the blocks are built for factor in assessment_factor, so if a block doesn't have an assessment factor... it doesn't build. This can be fixed by extending the if assessment_factors: block to include an else, which builds when they don't eixst.

SLornieCYC commented 1 year ago

Haha brilliant! Glad you managed to track it down to something that makes sense.

SLornieCYC commented 1 year ago

Looks like this one is mostly fixed - queries are now flagging in CIN validator when they weren't before. However there are a few included where start date is 27 Jan so the accompanying change to custombusinessday doesn't seem to have worked. These aren't flagged in the DFE portal export so I think Will's theory about collection end date is right at least.