This PR implements the acceptance criteria for issue 537. It adds a function that decides whether to use dask or pandas. After some research, I found that pandas DataFrames fail or become slow when a dataset is as large as, or larger than, the available RAM. Instead of using a fixed threshold, the code checks whether the dataset size exceeds 70% of available memory: if it does, dask is used, otherwise pandas.
A config variable can also be set to override the normal criteria for choosing between dask and pandas. If the config variable is set, the function returns the name given in the config.
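
For reviewers, here is a minimal sketch of the decision logic described above. It is not the code in this PR: the names `choose_backend`, `MEMORY_FRACTION`, and `override` are hypothetical, and the dataset size and available memory are passed in as plain byte counts so the sketch stays self-contained. In practice the available memory could come from something like `psutil.virtual_memory().available`, which is an assumption here.

```python
from typing import Optional

# Fraction of available memory above which dask is preferred (70%, per the description).
MEMORY_FRACTION = 0.7  # hypothetical name


def choose_backend(
    dataset_bytes: int,
    available_bytes: int,
    override: Optional[str] = None,
) -> str:
    """Return "dask" or "pandas" for a dataset of the given size.

    override stands in for the config variable: when it is set,
    its value is returned and the memory check is skipped.
    """
    if override is not None:
        return override

    # Use dask when the dataset exceeds 70% of the available memory.
    if dataset_bytes > MEMORY_FRACTION * available_bytes:
        return "dask"
    return "pandas"


if __name__ == "__main__":
    gib = 1024 ** 3
    print(choose_backend(6 * gib, 16 * gib))   # well under 70% of available memory
    print(choose_backend(14 * gib, 16 * gib))  # over 70% of available memory
    print(choose_backend(1 * gib, 16 * gib, override="dask"))  # config wins
```

Using a fraction of available memory rather than a fixed size means the same dataset can be routed to pandas on a large machine and to dask on a small one.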