This cleans up the `read_dataset_as_ddf` code path to use a more modern API for graph generation, with a couple of performance and quality-of-life changes.
Using the delayed construction API is rather slow for larger datasets because the graph has to be iterated multiple times. The new API avoids the repeated iteration and walks the graph at most once.
Using the `DataFrameIOFunction` protocol lets us apply column projection automatically when the user only uses `__getitem__` on a dataframe. Note: this works for some operations but not for arbitrary expression chains. It is still nice for ad-hoc analysis.
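To illustrate the idea behind the projection hook, here is a minimal, dependency-free sketch. It is not the code in this PR and it does not import dask: `ProjectableReader`, `load_partition`, and the dict-based "partition" are hypothetical stand-ins. The shape (a `columns` property, a `project_columns` method, and `__call__`) reflects my understanding of what the `DataFrameIOFunction` protocol asks for, so treat it as an assumption rather than a reference.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A partition is modelled as a plain dict of column name -> values.
# This stands in for a pandas DataFrame so the sketch has no dependencies.
Partition = Dict[str, List[int]]
Loader = Callable[[int, Optional[List[str]]], Partition]


class ProjectableReader:
    """Hypothetical IO function that exposes its columns for projection."""

    def __init__(self, loader: Loader, columns: Optional[Sequence[str]] = None):
        self._loader = loader
        self._columns = None if columns is None else list(columns)

    @property
    def columns(self) -> Optional[List[str]]:
        # None means "all columns".
        return self._columns

    def project_columns(self, columns: Sequence[str]) -> "ProjectableReader":
        # Return a new reader restricted to `columns`; never mutate in place,
        # because the original reader may still be referenced by another graph.
        if list(columns) == self._columns:
            return self
        return ProjectableReader(self._loader, columns)

    def __call__(self, part: int) -> Partition:
        # Called once per partition when the graph is executed.
        return self._loader(part, self._columns)


def load_partition(part: int, columns: Optional[List[str]]) -> Partition:
    """Toy loader: pretend `part` is a file id and only 'read' requested columns."""
    full = {
        "a": [part, part + 1],
        "b": [10 * part, 10 * part + 1],
        "c": [100 * part, 100 * part + 1],
    }
    if columns is None:
        return full
    return {name: full[name] for name in columns}


reader = ProjectableReader(load_partition)
# Roughly what an optimizer could do when it sees ddf[["a"]] directly on the IO layer:
projected = reader.project_columns(["a"])
```

Here `projected(0)` only materializes column `a`, while `reader` itself is left untouched. This also hints at why arbitrary expression chains are not covered: the rewrite can only happen when the column selection is visible right at the IO function.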