The integrity of the SF-SAC is determined by the quality of its data checks. We applied our current checks to the migration so as to up-curate the historical data (in as much as was possible). We can, and must, continue to improve the quality of data collected in the FAC.
### Tasks
- [x] Re-run all checks at time of "lock for submission"
- [x] Re-run all checks at time of submission
- [ ] Check for consistency of findings between `general` and `findings` forms
- [ ] https://github.com/GSA-TTS/FAC/issues/4444
- [ ] Verify correctness of type requirement column in all cases
### Consistency of findings
The `general` table has indicators of whether there are only unmodified opinions. The `findings` table can, however, report modified opinions on compliance, so the two forms can disagree.
These should not be at odds with each other. An audit with inconsistent reporting should be blocked from submission.
If a modified opinion is reported in the `general` table, then there must be corresponding entries in the `findings` table; it cannot be the case that no corresponding findings exist.
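A minimal sketch of what such a cross-form check could look like. The field names (`is_unmodified_opinion`, `modified_opinion`) are illustrative assumptions, not the actual SF-SAC column names:

```python
def check_findings_consistency(general, findings):
    """Return validation errors when the general form and the findings
    rows disagree about modified opinions.

    `general` is a dict for the general form; `findings` is a list of
    dicts, one per findings row. Field names here are hypothetical.
    """
    errors = []
    unmodified_only = general.get("is_unmodified_opinion") == "Y"
    has_modified = any(row.get("modified_opinion") == "Y" for row in findings)

    if unmodified_only and has_modified:
        errors.append(
            "general reports only unmodified opinions, "
            "but findings contains a modified opinion on compliance"
        )
    if not unmodified_only and not has_modified:
        errors.append(
            "general reports a modified opinion, "
            "but findings has no corresponding entries"
        )
    return errors
```

A non-empty return value would block the submission rather than merely warn.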
### Existence of prior findings reference number
When a prior findings reference number is used, we should check that it can be found in the prior-year data. If it cannot, it should be flagged for the auditor, and we should think about how to further annotate the audit in response to the prior-year data being incomplete/incorrect.
This may be a curation point/question: should we amend or otherwise be able to "correct" prior audits if reference numbers do not exist/were omitted, so that the current audit can be "more correct?"
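One way the lookup could be sketched. The reference-number pattern (`YYYY-NNN`) and the shape of the prior-year lookup are assumptions to confirm against the actual data:

```python
import re

# Assumed format for findings reference numbers, e.g. "2023-001".
REF_PATTERN = re.compile(r"^(19|20)\d{2}-\d{3}$")

def check_prior_references(prior_refs, known_refs):
    """Flag prior findings reference numbers that are malformed or
    cannot be found in prior-year findings.

    `known_refs` is a set of reference numbers from prior-year data;
    returns (ref, reason) pairs for the auditor to review.
    """
    flagged = []
    for ref in prior_refs:
        if not REF_PATTERN.match(ref):
            flagged.append((ref, "malformed reference number"))
        elif ref not in known_refs:
            flagged.append((ref, "not found in prior-year findings"))
    return flagged
```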
### Remove duplicate rows
We had a race hazard that allowed some dissemination tables to contain duplicate rows. There are no duplicates in the intake data, but duplicates do exist in the export. These need to be identified and removed.
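The identification step could look something like this. The actual cleanup would run against the dissemination tables themselves (e.g. a `DELETE` keyed on a row identifier in Postgres); this is just a sketch of exact-duplicate detection over exported rows:

```python
def dedupe_rows(rows):
    """Drop exact duplicate rows (dicts) while preserving
    first-seen order; returns the de-duplicated list."""
    seen = set()
    unique = []
    for row in rows:
        # A hashable key built from the full row contents.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```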
### Verify correctness of `type_requirement`
We enforce valid values on intake, but if such a check does not already exist, we should add a regex validation in our Python and/or Jsonnet layer. (It may already exist, in which case we should confirm that `type_requirement` is correct at intake time.)
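A sketch of what the Python-side regex validation might look like. The allowed alphabet of compliance-requirement letters below is an assumption and would need to be confirmed against the SF-SAC instructions:

```python
import re

# Assumed set of allowed compliance-requirement letters; confirm
# against the SF-SAC instructions before adopting.
ALLOWED_LETTERS = "ABCDEFGHIJLMNP"
TYPE_REQUIREMENT_RE = re.compile(rf"^[{ALLOWED_LETTERS}]+$")

def is_valid_type_requirement(value):
    """True when the value is one or more allowed requirement letters."""
    return bool(TYPE_REQUIREMENT_RE.fullmatch(value or ""))
```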
The tasks "Re-run all checks at time of 'lock for submission'" and "Re-run all checks at time of submission" have been completed here: https://github.com/GSA-TTS/FAC/pull/4203