cdisc-org / cdisc-rules-engine

Open source offering of the cdisc rules engine
MIT License

Match Datasets not working as expected #488

Open: gerrycampion opened this issue 1 year ago

gerrycampion commented 1 year ago

Links to related JIRA Tickets

Rule Information

Describe the bug

Refer to CG0213 in the rule editor and to the negative test data.

Match Datasets is being used here:

Match Datasets:
  - Keys:
      - VISITNUM
    Name: TV

There seems to be an issue with Match Datasets. It appears that dataset_preprocessor.preprocess renames the VISITNUM column of TV (the right-side dataset) to TV.VISITNUM before the merge in data_processor.merge_sdtm_datasets runs. The merge then fails, because right_on is still VISITNUM, a column that no longer exists in the preprocessed TV data.
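The failure mode described above can be reproduced in a minimal, pure-Python sketch. Assumptions: datasets are modeled as lists of row dicts, and `prefix_columns`, `merge_left`, and the sample rows are hypothetical stand-ins, not the engine's actual `dataset_preprocessor.preprocess` or `merge_sdtm_datasets` code. `merge_left` only loosely mirrors the `left_on`/`right_on` behaviour of a pandas merge.

```python
# Hypothetical, simplified model of the Match Datasets failure.
# None of these helpers are the engine's real functions.

def prefix_columns(rows, prefix):
    """Rename every column to "<prefix>.<column>", as the issue reports
    the preprocessor doing to TV's columns."""
    return [{f"{prefix}.{col}": val for col, val in row.items()} for row in rows]


def merge_left(left, right, left_on, right_on):
    """Left join on one key column. Raises KeyError if the right key is
    missing; unmatched left rows are kept without right-side columns."""
    if right and right_on not in right[0]:
        raise KeyError(right_on)
    index = {}
    for row in right:
        index.setdefault(row[right_on], []).append(row)
    merged = []
    for row in left:
        matches = index.get(row[left_on])
        if matches:
            merged.extend({**row, **match} for match in matches)
        else:
            merged.append(dict(row))
    return merged


# Made-up sample data for illustration only.
left_rows = [{"USUBJID": "01", "VISITNUM": 1}, {"USUBJID": "01", "VISITNUM": 99}]
tv_rows = [{"VISITNUM": 1, "VISIT": "SCREENING"}]

tv_prefixed = prefix_columns(tv_rows, "TV")  # VISITNUM becomes TV.VISITNUM

try:
    # Fails: the right-side key was renamed during preprocessing.
    merge_left(left_rows, tv_prefixed, left_on="VISITNUM", right_on="VISITNUM")
except KeyError as exc:
    print("Column not found in data:", exc)

# One possible fix: look the right-side key up under its prefixed name.
result = merge_left(left_rows, tv_prefixed, left_on="VISITNUM", right_on="TV.VISITNUM")
print(result)
```

Looking the key up under its prefixed name is only one possible fix; renaming the columns after the merge, or leaving key columns unprefixed, would likely avoid the mismatch as well.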

Error returned from Rule Engine

{
  "error": "Column not found in data",
  "message": "VISITNUM"
}

Expected behavior

The column should be found in the data, and the rule should execute.

CG0213.zip

nhaydel commented 1 year ago

Why wouldn't the rule author use the distinct operation here instead of Match Datasets? This seems like exactly the problem the distinct operation was made to solve. Again, I am just curious, because I don't think this should block the rule.
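For comparison, the distinct approach avoids the merge entirely: it collects the unique values of a column, and the check then compares against that set. The sketch below is hypothetical; `distinct`, the sample rows, and the membership check are illustrative assumptions, not the engine's actual distinct operation or the logic of CG0213.

```python
# Hypothetical stand-in for a "distinct" operation: no join is performed.

def distinct(rows, column):
    """Return the set of unique values of `column` across `rows`."""
    return {row[column] for row in rows if column in row}


# Made-up sample data for illustration only.
tv_rows = [{"VISITNUM": 1}, {"VISITNUM": 2}, {"VISITNUM": 2}]
records = [{"USUBJID": "01", "VISITNUM": 1}, {"USUBJID": "01", "VISITNUM": 3}]

tv_visitnums = distinct(tv_rows, "VISITNUM")

# Example check only: flag records whose VISITNUM is not among TV's values.
flagged = [row for row in records if row["VISITNUM"] not in tv_visitnums]
print(flagged)
```

Because this sketch never joins the two datasets, there is no right_on key for preprocessing to rename.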

nhaydel commented 1 year ago

Also, Match Datasets has been used extensively without issues. Is there test data and a test rule that you used to verify the issue? If so, can they be attached to the ticket?

gerrycampion commented 1 year ago

Yes, distinct works instead; I didn't think of it. I've found another bug with distinct that I will open a separate issue for. I updated the title of this ticket so it is no longer marked as a blocker, and I've attached the test data to this ticket.