cdisc-org / cdisc-rules-engine

Open source offering of the cdisc rules engine
MIT License

Operator needed to match dataset to itself #620

Closed eljanssens closed 4 months ago

eljanssens commented 7 months ago

Rules CG0272 to CG0279 each compare the TSVAL values of two different TSPARMCDs, so TS needs to be matched against itself. No operation exists for this yet.

Condition: TSPARMCD = 'TDIGRP' and record exists where TSPARMCD = 'HLTSUBJI' and TSVAL = 'Y'
Rule: TSVAL = null or 'HEALTHY SUBJECTS'
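To make the self-match concrete, here is a minimal sketch in plain Python of what the condition and rule above ask for. It is not engine code: the function name `find_violations`, the list-of-dicts representation of TS, and the sample value 'DIABETES' are all assumptions made for illustration.

```python
def find_violations(ts_records):
    """Return the TDIGRP rows that break the rule (TS given as a list of dicts)."""
    # Condition, second half: some record in the same dataset has
    # TSPARMCD = 'HLTSUBJI' and TSVAL = 'Y'.
    healthy_flag_exists = any(
        r.get("TSPARMCD") == "HLTSUBJI" and r.get("TSVAL") == "Y"
        for r in ts_records
    )
    if not healthy_flag_exists:
        return []  # the condition never applies, so nothing can be violated

    # Rule: on TDIGRP rows, TSVAL must be null/empty or 'HEALTHY SUBJECTS'.
    return [
        r
        for r in ts_records
        if r.get("TSPARMCD") == "TDIGRP"
        and r.get("TSVAL") not in (None, "", "HEALTHY SUBJECTS")
    ]


# Made-up sample data: healthy-subject flag is Y, but TDIGRP names a condition.
ts = [
    {"TSPARMCD": "HLTSUBJI", "TSVAL": "Y"},
    {"TSPARMCD": "TDIGRP", "TSVAL": "DIABETES"},
]
violations = find_violations(ts)
```

The point of the sketch is that the check on each TDIGRP row depends on a fact about other rows of the same dataset, which is what a row-by-row operator cannot express on its own.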

chowsanthony commented 6 months ago

@dostiep A suggestion to tackle this: Do a row count where TSPARMCD = 'HLTSUBJI' and the TSVAL for this TSPARMCD = 'Y'. Let's call this HLTSUBJI_count. Do another row count where TSPARMCD = 'TDIGRP' and the TSVAL for this TSPARMCD is either 'HEALTHY SUBJECTS' or empty. Let's call this TDIGRP_count. Return false if HLTSUBJI_count = 1 but TDIGRP_count = 0.
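A rough Python sketch of the two-count suggestion, read literally, might look like the following. The function name `passes_count_check` and the list-of-dicts input are hypothetical, and 'HLTSUBJI' is spelled as in the issue description; none of this is taken from the engine.

```python
def passes_count_check(ts_records):
    """Dataset-level check built from two filtered row counts."""
    # Rows where TSPARMCD = 'HLTSUBJI' and TSVAL = 'Y'.
    hltsubji_count = sum(
        1
        for r in ts_records
        if r.get("TSPARMCD") == "HLTSUBJI" and r.get("TSVAL") == "Y"
    )
    # Rows where TSPARMCD = 'TDIGRP' and TSVAL is empty or 'HEALTHY SUBJECTS'.
    tdigrp_count = sum(
        1
        for r in ts_records
        if r.get("TSPARMCD") == "TDIGRP"
        and r.get("TSVAL") in (None, "", "HEALTHY SUBJECTS")
    )
    # False only when the healthy-subject flag is present exactly once
    # and no acceptable TDIGRP row exists.
    return not (hltsubji_count == 1 and tdigrp_count == 0)


ok = passes_count_check([
    {"TSPARMCD": "HLTSUBJI", "TSVAL": "Y"},
    {"TSPARMCD": "TDIGRP", "TSVAL": "HEALTHY SUBJECTS"},
])
bad = passes_count_check([
    {"TSPARMCD": "HLTSUBJI", "TSVAL": "Y"},
    {"TSPARMCD": "TDIGRP", "TSVAL": "DIABETES"},
])
```

One consequence of the literal reading: a dataset with HLTSUBJI = 'Y' and no TDIGRP row at all also returns false, because TDIGRP_count is 0 there too. The result is also per dataset rather than per row.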

gerrycampion commented 6 months ago

@dostiep, I took @chowsanthony's idea and modified it a bit and put it in rule form:

Check:
  all:
    - name: TSPARMCD
      operator: equal_to
      value: TDIGRP
    - name: $HLTSUBJI_Y
      operator: greater_than_or_equal_to
      value: 1
    - all:
        - name: TSVAL
          operator: not_equal_to
          value: 'HEALTHY SUBJECTS'
          value_is_literal: true
        - name: TSVAL
          operator: non_empty
Operations:
  - filter:
      TSPARMCD: HLTSUBJI
      TSVAL: Y
    id: $HLTSUBJI_Y
    operator: record_count

I think this is an easy way to understand the rule. Note that we already have a record_count operation, so we would just need to overload it with an optional filter parameter that takes a dict of variable names and values. Does this seem reasonable? It should work for the "record exists" rules CG0272 to CG0279.
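As a sketch of the proposed behavior only (not the engine's actual implementation or signature), a record count with an optional filter dict, plus a row-level reading of the Check block, could look like this in plain Python. The names `record_count`, `filters`, and `rule_flags_row`, the list-of-dicts input, and the sample values are assumptions; the Check is read here as the condition under which a row is reported.

```python
def record_count(records, filters=None):
    """Count records; with `filters`, count only rows matching every pair."""
    if not filters:
        return len(records)
    return sum(
        1
        for r in records
        if all(r.get(var) == val for var, val in filters.items())
    )


def rule_flags_row(row, hltsubji_y):
    """Mirror the Check block: True when the row should be reported."""
    tsval = row.get("TSVAL")
    return (
        row.get("TSPARMCD") == "TDIGRP"      # name: TSPARMCD, equal_to TDIGRP
        and hltsubji_y >= 1                  # $HLTSUBJI_Y >= 1
        and tsval != "HEALTHY SUBJECTS"      # TSVAL not_equal_to literal
        and tsval not in (None, "")          # TSVAL non_empty
    )


# Made-up sample data for illustration.
ts = [
    {"TSPARMCD": "HLTSUBJI", "TSVAL": "Y"},
    {"TSPARMCD": "TDIGRP", "TSVAL": "ASTHMA"},
    {"TSPARMCD": "TITLE", "TSVAL": "Example study"},
]
# Operation: filtered record count, computed once for the dataset.
hltsubji_y = record_count(ts, {"TSPARMCD": "HLTSUBJI", "TSVAL": "Y"})
flagged = [r for r in ts if rule_flags_row(r, hltsubji_y)]
```

Computing the count once per dataset and then referencing it in every row's check keeps the rule row-level, so each offending TDIGRP record can be reported individually.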

SFJohnson24 commented 5 months ago

PR: https://github.com/cdisc-org/cdisc-rules-engine/pull/640

RamilCDISC commented 5 months ago

The QA validation for the issue is complete. The updated code fulfills the acceptance criteria. The engine checks the rule conditions correctly and also handles the edge cases. The following test cases were covered:

  1. Check with positive data where both conditions are fulfilled
  2. Check with negative data where the second condition fails
  3. Check with negative data where the first condition fails
  4. Check with data where no TSPARMCD value is HLTSUBJI but all other conditions are met
  5. Check with data where no TSPARMCD value is TDIGRP
  6. Check with data where the TSVAL value for the TSPARMCD is neither empty nor 'HEALTHY SUBJECTS'

All of the above cases passed. The engine handles every test case and validates the rule correctly. A screenshot from a few of the test runs follows:

[screenshot: 620]