Open mhungria opened 4 months ago
See test data and Define-XML document under https://cdisc.sharepoint.com/:f:/r/sites/CORERules/Shared%20Documents/unitTesting/SDTMIG/CG0320/Rule_cg0320x_test?csf=1&web=1&e=5ZwP4m
See Results in the attached Rule.zip file: Rule.zip
we discussed this ticket on 7/9/24 rule meeting and it was determined we should reopen it. Marcelina showed a case where the dataset's domain did not match the define.XML domain. The define.xml matched all conditions for the rule to produce a negative result but the dataset was skipped when a test was run. Currently engine is determining when to validate based on the dataset's domain: https://github.com/cdisc-org/cdisc-rules-engine/blob/7c3859875c77349390726391a067b07aa7590e85/scripts/run_validation.py#L79 The define should be the source of truth and should be referenced over the dataset domain in these types of validations.
currently
def build(self):
"""
Returns a long dataset where each value in each row of the original dataset is
a row in the new dataset.
The define xml variable metadata corresponding to each row's variable is
attached to each row.
Columns available in the dataset are:
dataset_size - File size
dataset_location - Path to file
dataset_name - Name of the dataset
dataset_label - Label for the dataset
Columns from Define XML:
define_dataset_name - dataset name from define_xml
define_dataset_label - dataset label from define
define_dataset_location - dataset location from define
define_dataset_class - dataset class
define_dataset_structure - dataset structure
define_dataset_is_non_standard - whether a dataset is a standard ```
There is a need for dataset_domain, define_dataset_domain for each dataset for these checks which should be included in these checks. This should be added to this dataset builder. The dataset builder also needs a build_split_dataset() functionality constructed for it
I would adjust the statement “The define should be the source of truth and should be referenced over the dataset domain in these types of validations.” to “If a Define-XML is indicated and there is a conflict between Define-XML dataset metadata and the actual data’s dataset metadata, the Define-XML metadata should be referenced over the dataset metadata in these types of validations.,”
so the acceptance criteria for this ticket is:
In today's Daily Scrum we decided to table this PBI for the moment (moved back into our Sprint backlog) until we can decide if @SFJohnson24's 3rd bullet point above should be split off into its own PBI. @nickdedonder will gather the important contributors together so he can make that final call.
define_* variables retrieved incorrectly when dataset_name not found in Define-XML document. It appears that the first available ItemGroupDef that satisfies the request conditions (if any) is used to return the metadata requested.
Also, dataset_name is changed to "None" in those cases. I assume it is a BUG and the the define_dataset_name is the one that should be returned as "None".
See test for CDISC.SDTMIG.CG0320