cdisc-org / cdisc-rules-engine

Open source offering of the cdisc rules engine

Reading entire dataset to determine class #676

Open gerrycampion opened 3 months ago

gerrycampion commented 3 months ago

The relevant code is in rule_processor.py, at line 183: https://github.com/cdisc-org/cdisc-rules-engine/blob/b4d8e91861dc8a4d2c08995d46939ddb721374c2/cdisc_rules_engine/utilities/rule_processor.py#L183 and at line 193: https://github.com/cdisc-org/cdisc-rules-engine/blob/b4d8e91861dc8a4d2c08995d46939ddb721374c2/cdisc_rules_engine/utilities/rule_processor.py#L193

In both places we read the entire dataset just to determine its class, even though only the metadata is needed. This wastes time, especially for large datasets.

get_dataset_class should require only the dataset metadata, not the full dataset.
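
To illustrate the idea, here is a minimal sketch, not the engine's actual code. It assumes the class can be inferred from variable names alone, using the SDTM topic variables (--TRT for Interventions, --TERM for Events, --TESTCD for Findings). The names read_variable_names and classify_from_variables are hypothetical, and a CSV header stands in for whatever metadata call the engine's data service would provide.

```python
import csv

# SDTM topic-variable suffixes mapped to general observation classes.
# Assumption: checking for these is enough for this illustration.
TOPIC_SUFFIX_TO_CLASS = {
    "TRT": "INTERVENTIONS",
    "TERM": "EVENTS",
    "TESTCD": "FINDINGS",
}


def read_variable_names(path):
    """Return the column names by reading only the header row (hypothetical helper).

    The data rows are never loaded, so the cost does not grow with dataset size.
    """
    with open(path, newline="") as handle:
        return next(csv.reader(handle), [])


def classify_from_variables(domain, variable_names):
    """Infer the dataset class from variable names only (hypothetical helper).

    Returns None when no topic variable for the given domain is present.
    """
    names = {name.upper() for name in variable_names}
    for suffix, dataset_class in TOPIC_SUFFIX_TO_CLASS.items():
        if f"{domain.upper()}{suffix}" in names:
            return dataset_class
    return None
```

For example, classify_from_variables("AE", read_variable_names(path)) only needs the header of the file at path, so a multi-gigabyte dataset would be classified as quickly as a tiny one.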