cdisc-org / cdisc-rules-engine

Open source offering of the cdisc rules engine
MIT License

Rule blocked: CORERULES-9562 - invalid_duration operator not accurate #828

Open ASL-rmarshall opened 2 weeks ago

ASL-rmarshall commented 2 weeks ago

File with the JSON containing the full request: Request.txt

Links to related JIRA Tickets

Rule Information

Describe the bug

The invalid_duration operator does not accurately report invalid ISO 8601 durations: it fails to report some invalid durations, and it reports some valid durations as invalid. Ideally, this operator should be based on the full ISO 8601 specification, though it's unclear which version should be used (e.g., 2004 or 2019). However, at a minimum, I think this operator should align with the ISO 8601 duration specifications and guidance in the SDTMIG.

Examples of:

Error returned from Rule Engine

{
    "TIMING": [
        {
            "executionStatus": "success",
            "dataset": "Timing.xpt",
            "domain": "TIMING",
            "variables": [
                "id",
                "name",
                "parent_entity",
                "parent_id",
                "parent_rel",
                "value",
                "valueLabel"
            ],
            "message": "The value attribute of the timing is not in ISO 8601 format.",
            "errors": [
                {
                    "value": {
                        "parent_entity": "ScheduleTimeline",
                        "value": "-P2D",
                        "name": "TIM7",
                        "id": "Timing_7",
                        "parent_rel": "timings",
                        "parent_id": "ScheduleTimeline_4",
                        "valueLabel": "2 days"
                    },
                    "row": 2
                },
                {
                    "value": {
                        "parent_entity": "ScheduleTimeline",
                        "value": "P4.5W",
                        "name": "TIM8",
                        "id": "Timing_8",
                        "parent_rel": "timings",
                        "parent_id": "ScheduleTimeline_4",
                        "valueLabel": "4.5 Weeks"
                    },
                    "row": 3
                }
            ]
        }
    ]
}

Expected behavior

For this example, "-P2D" and "P4.5W" should not be reported as invalid durations (assuming negative durations are allowed - see above), and "P2W1D" should be reported as invalid.
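To make the expected classifications concrete, here is a minimal Python sketch of a checker that treats "-P2D" and "P4.5W" as valid and "P2W1D" as invalid. It is not the engine's implementation: the names `is_valid_duration` and `allow_negative` are hypothetical, and the rules it encodes are assumptions (weeks cannot be combined with other components, a decimal fraction with either "." or "," is accepted only on the last component, and a leading minus sign is the only accepted way to express a negative).

```python
import re

# Hypothetical pattern for illustration; not taken from cdisc-rules-engine.
_NUM = r"\d+(?:[.,]\d+)?"  # assumption: "." or "," as the decimal separator
_DURATION = re.compile(
    rf"""
    (?P<sign>-)?P
    (?:
        {_NUM}W                                   # weeks-only form, e.g. P4.5W
      |
        (?=\d|T\d)                                # at least one component required
        (?:{_NUM}Y)?(?:{_NUM}M)?(?:{_NUM}D)?      # date part
        (?:T(?=\d)(?:{_NUM}H)?(?:{_NUM}M)?(?:{_NUM}S)?)?   # optional time part
    )
    """,
    re.VERBOSE,
)


def is_valid_duration(value: str, allow_negative: bool = True) -> bool:
    """Return True if value looks like a valid duration under the assumptions above."""
    match = _DURATION.fullmatch(value)
    if match is None:
        return False
    if match.group("sign") and not allow_negative:
        return False
    # Assumption: only the last (lowest-order) component may carry a fraction.
    numbers = re.findall(_NUM, value)
    return all("." not in n and "," not in n for n in numbers[:-1])


for candidate in ["-P2D", "P4.5W", "P2W1D"]:
    print(candidate, is_valid_duration(candidate))
```

Under these assumptions, the two values in the error output above pass the check, while "P2W1D" fails because no branch of the pattern permits a day component after a week component.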

chowsanthony commented 1 week ago

@SFJohnson24 A quick note on a subtle difference between duration and elapsed time: In SDTM, the --DUR variable represents duration and is expressed using ISO 8601's P notation. Therefore, negative durations are not permitted. In contrast, the --ELTM variable, which represents elapsed time, can include negative values, such as -P15M.

ASL-rmarshall commented 1 week ago

@chowsanthony I'm not sure that the SDTM/SDTMIG specifies either the version of ISO 8601 that applies or that there's a difference between representations in --DUR vs --ELTM (both are defined as having "ISO 8601 duration" format). While ISO 8601:2004 specifies duration as "non-negative quantity attributed to a time interval...", under ISO 8601:2019-2 (Date and time - Representations for information interchange - Part 2: Extensions), negative durations are defined as "duration in the reverse direction to the proceeding time scale". See section 3.1.1.7 here.

I don't have a copy of the full specification (not wanting to part company with CHF 194), but I have seen references that suggest negative durations can be represented with either a minus sign prefix or embedded minus sign(s).

For this operator, I guess we need to define (and document) what constitutes an "invalid" duration.

SFJohnson24 commented 6 days ago

@ASL-rmarshall what exactly will DDF need in terms of the definition of duration? Will negative durations be permissible for DDF?

ASL-rmarshall commented 5 days ago

@SFJohnson24 Negative durations are not allowed for the USDM attributes referenced in the 3 rules linked above. In fact, I added a specific "value contains any hyphens" Check clause to the rule specifications (in addition to "value is an invalid duration") to report negatives even if they're not flagged as invalid durations.

If the invalid_duration operator will report negative durations as invalid, then this implies that it's based on the ISO 8601:2004 specification - which is fine, as long as anyone who wants to use the operator knows what it will report as invalid. However, I think this might limit the operator's applicability for SDTM rules. There will definitely be SDTM rules (e.g., for --EVLINT variables) to check for invalid (ISO 8601) durations that are allowed to be negative.

I sort of prefer using the ISO 8601:2004 specification because it's more straightforward (it has no extensions so less variability), but negative durations(/intervals) are definitely allowed in some SDTM variables. It might be worth considering adding a parameter like allow_negatives to the operator that determines whether a leading hyphen is OK or not. I don't think we want to allow negative durations expressed using embedded hyphens/minus signs (e.g., P-1D) even if they are allowed by ISO 8601:2019-2.
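As a rough sketch of how an allow_negatives parameter could behave, the Python below accepts a single leading hyphen only when the flag is set and always rejects embedded hyphens such as P-1D. The function signature, the default of `False`, and the simplified integer-only pattern `_UNSIGNED` are all assumptions made for illustration; they do not describe the operator's current code.

```python
import re

# Simplified stand-in for the unsigned duration check (integers only);
# a real operator would need to handle fractions and other details.
_UNSIGNED = re.compile(
    r"P(?:\d+W"
    r"|(?=\d|T\d)(?:\d+Y)?(?:\d+M)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+S)?)?)"
)


def invalid_duration(value: str, allow_negatives: bool = False) -> bool:
    """Return True when value should be reported as an invalid duration."""
    if value.startswith("-"):
        if not allow_negatives:
            return True  # leading hyphen not permitted for this rule
        value = value[1:]  # drop the sign and validate the remainder
    if "-" in value:
        return True  # embedded hyphen (e.g. P-1D) is never accepted
    return _UNSIGNED.fullmatch(value) is None


print(invalid_duration("-P2D"))                        # flag off
print(invalid_duration("-P2D", allow_negatives=True))  # flag on
print(invalid_duration("P-1D", allow_negatives=True))  # embedded hyphen
```

With a flag like this, the USDM rules could leave it off so that negatives are reported by the operator itself, while SDTM rules for variables that permit negative values could turn it on.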