Updates for Issue 537

RamilCDISC commented 1 year ago

This PR adds the changes needed to fulfill the acceptance criteria for issue 537. It adds a function that determines whether to use Dask or pandas. After some research, I noticed that pandas DataFrames fail or become slow when a dataset is as large as, or larger than, the available RAM. Instead of using a fixed threshold, I have set the code to check whether the dataset size is larger than 70% of available memory: if it is, Dask is used; otherwise pandas is used.

A config variable can also be set to override the normal criteria for determining which dataset type to use. If the config variable is set, the function returns the dataset name given in the config.
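A minimal sketch of the selection logic described above, not the code from this PR. The function name `choose_dataset_backend`, its parameters, and the return values `"dask"` and `"pandas"` are hypothetical and chosen for illustration only. The caller is assumed to supply the dataset size and the available memory in bytes (for example, available memory could be read with `psutil.virtual_memory().available`).

```python
from typing import Optional

# Fraction of available memory above which Dask is preferred.
# The 70% figure comes from the PR description; the constant name is hypothetical.
MEMORY_FRACTION = 0.7


def choose_dataset_backend(
    dataset_size_bytes: int,
    available_memory_bytes: int,
    config_override: Optional[str] = None,
) -> str:
    """Return the name of the dataset implementation to use.

    Hypothetical helper: names and return values are illustrative assumptions.
    """
    # A configured value supersedes the memory-based criteria.
    if config_override:
        return config_override

    # Use Dask when the dataset exceeds 70% of the available memory.
    if dataset_size_bytes > MEMORY_FRACTION * available_memory_bytes:
        return "dask"

    return "pandas"
```

For example, with 1000 bytes available, a dataset of 800 bytes would select `"dask"` and one of 600 bytes would select `"pandas"`, unless a config override is passed. Taking the sizes as arguments keeps the decision easy to test without depending on the machine's actual memory.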

dostiep commented 1 year ago

Can this be added to issue #550, which I'm currently working on? I'm using this new feature to create Dataset instances.

drewcdisc commented 1 year ago

Closing this as it is being addressed in #550

cdisc-org / cdisc-rules-engine

Updates for Issue 537 #548