cdisc-org / cdisc-rules-engine

Open source offering of the cdisc rules engine
MIT License

Refactor: Optimized rule applies to class functionality for large datasets #689

Closed SFJohnson24 closed 4 months ago

SFJohnson24 commented 5 months ago

Attachments: validate with update.xlsx, validate without update.xlsx

I ran a validation with the current changes and another with the pre-change code, and the two reports are identical.

SFJohnson24 commented 4 months ago

Why can't we just add something like original_path as another dataset property here?

https://github.com/cdisc-org/cdisc-rules-engine/blob/d364ac102b8365dffde5c1551aedacdbd66cc89f/scripts/run_validation.py#L147-L150

This worked with a bit of dependency injection to get the datasets to the read_metadata call. I refactored to reflect this change and ran test validations, which appear to be working.
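For readers following along, here is a minimal sketch of the idea, not the code in this PR. It assumes each dataset is a plain dict with a `path` key for the converted file, as suggested by the linked `run_validation.py` lines. The helper names `build_dataset_entries` and `resolve_metadata_path` and the example paths are hypothetical.

```python
from typing import Dict, List


def build_dataset_entries(converted_to_original: Dict[str, str]) -> List[dict]:
    """Build dataset property dicts that remember where each file came from.

    Keys of the input are converted (e.g. parquet) paths; values are the
    original source paths. Both key names below are assumptions.
    """
    return [
        {"path": converted, "original_path": original}
        for converted, original in converted_to_original.items()
    ]


def resolve_metadata_path(file_path: str, datasets: List[dict]) -> str:
    """Return the original path for file_path, or file_path if none is recorded."""
    for dataset in datasets:
        if dataset["path"] == file_path:
            return dataset.get("original_path", file_path)
    return file_path


# Hypothetical example: one xpt file that was converted to parquet.
datasets = build_dataset_entries({"/tmp/dm.parquet": "/data/dm.xpt"})
metadata_path = resolve_metadata_path("/tmp/dm.parquet", datasets)
```

Passing `datasets` into whatever reads the metadata is the injection step: the reader can then look up the source file without guessing it from the converted path.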

SFJohnson24 commented 4 months ago

Still failing. New error. Here:

https://github.com/cdisc-org/cdisc-rules-engine/blob/90ea95d26caf2fcfee58ea9b6142e62b9d1cb106/cdisc_rules_engine/services/data_services/local_data_service.py#L187

Instead of file_path, I think it needs to be file_metadata["path"]. file_path still points to the parquet path, which has no metadata; file_metadata["path"] has the xpt path.
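To illustrate the difference, here is a small sketch, not the contents of `local_data_service.py`. It assumes `file_metadata` is a dict whose `path` entry holds the original file location; the helper names `metadata_source` and `source_format` and the example paths are hypothetical.

```python
from pathlib import Path


def metadata_source(file_path: str, file_metadata: dict) -> str:
    """Prefer the path recorded in file_metadata, falling back to file_path."""
    return file_metadata.get("path", file_path)


def source_format(path: str) -> str:
    """Return the lowercase file extension without the leading dot."""
    return Path(path).suffix.lstrip(".").lower()


# Hypothetical values: file_path is the converted parquet file,
# while file_metadata["path"] still names the original xpt file.
file_path = "/tmp/dm.parquet"
file_metadata = {"path": "/data/dm.xpt"}

chosen = metadata_source(file_path, file_metadata)
chosen_format = source_format(chosen)
```

With these values, the metadata reader would be pointed at the xpt file rather than the parquet file.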

I looked into changing this, but for some reason it was extracting metadata properly without the change (I am using JSON), even though file_path points to the parquet. The first image is the original; the second is with file_metadata["path"]. Regardless, I changed it.

[Images: image, Screenshot 2024-05-17 104611]