Open ASL-rmarshall opened 1 month ago
I spoke with Gerry today re: this ticket and how engine handles CT currently. To integrate this and make it fairly seemless across different standards--it will require a bit more work than initially intended. I will be creating a few tickets for this to split it up into smaller PBIs with the goal of creating a function that can handle CT checks for both DDF and SDTM
JSON containing the full request:
Links to related JIRA Tickets
Rule Information
Describe the bug I was looking for an existing operator that I could use to check if codes/decodes used in USDM data are valid when compared to a codelist included an a CT package extracted from the Library. Looking through the Rules Editor Reference Guide, I found the
does_not_use_valid_codelist_terms
operator, which looked like it might provide some of the required functionality. Although this operator is listed in the Define XML section of the Reference Guide (implying that it might only work for Define XML rule types, which wouldn't be appropriate for USDM because USDM doesn't use Define XML), I found lots of rules in the Rules Editor where this operator is used with a Rule Type of "Record Data" (e.g., CG0001, CG0555-CG0560, AD0039, SEND40, FDAB017). I therefore tried to use the operator, but it failed - none of the invalid values included in the negative test data were reported, and there was no other message indicating what this issue was.By looking at the engine code and doing some debugging, I found that there appear to be several issues:
uses_valid_codelist_terms
anddoes_not_use_valid_codelist_terms
operators may be used and how the operator parameters should be populated.valid_terms
method indataframe_operators.py
relies onDataframeType.codelist_term_maps
which is only populated when Rule Type is "Define Item Metadata Check". These operators will therefore never work for rules with a Rule Type of "Record Data".The wording of the operator names suggests that :
name
should be the name of a variable.value
should be a reference to a codelist.However, when the rule operator parameters are passed into the
valid_terms
method by theuses_valid_codelist_terms
operator:name
parameter ("target
") is used to identify dataset column(s) and is passed in ascodelist
value
parameter ("comparator
") is used to identify dataset column(s) passed in asterms_list
If
name
andvalue
are specified, respectively, as a variable/column name and a codelist name/identifier in a "Record Data" rule, the call tovalid_terms
crashes (without reporting an error) because there is no column in the dataset which has the same name as the specified codelist.Looking at the
test_does_not_use_valid_codelist_terms
method intest_codelist_checks.py
, it appears that the expectation might be that thename
andvalue
attributes used with these operators should be derived using operations that extract information from Define XML. If this is the case, the Reference Guide should be updated to indicate which operations should be used (with examples).allowed_terms
attribute for each codelist contains terms that are not valid - it contains submission values (which are valid), preferred terms (which are not valid as data values), and synonyms (which are also not valid as data values).Error returned from Rule Engine (None reported)
Expected behavior It looks like the
uses_valid_codelist_terms
anddoes_not_use_valid_codelist_terms
operators are not designed to work in the way that they're being used in lots of checks. The Reference Guide should therefore be updated so that it's clear when and how these operators should and should not be used.There is a need for operators that can be used to check for use (or non-use) of valid CT values in a variable independently of Define XML (i.e., by referencing cached Library CT directly). These operators should be able to check both for valid "submission values" and for valid C-codes for the terms in the specified codelist.
The cached CT files need to be updated so that:
allowed_terms
only contains allowed terms.