Closed by j-bennet 1 year ago.
read_foo / to_foo is the standard terminology in dask. I believe this is true for all IO APIs we're offering; see https://docs.dask.org/en/stable/dataframe-api.html#create-dataframes and https://docs.dask.org/en/stable/dataframe-api.html#store-dataframes
I suggest read_deltalake and to_deltalake.
to_deltalake should be exposed on top level, same as read_delta_table
+1
User shouldn't need to call compute.
We typically offer a compute kwarg to control this behavior. I'm fine adding this to to_deltalake as well.
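A minimal sketch of the compute kwarg convention, assuming nothing about the real to_deltalake internals: with compute=True the write runs inside the call; with compute=False the caller gets back something it can trigger later. In dask itself the deferred result is a lazy dask object computed by the caller; here a plain zero-argument callable stands in for it so the example runs without dask. The names to_store, _write_partition and Table are hypothetical, and the "table" is just an in-memory list.

```python
from typing import Callable, List, Union

# Hypothetical in-memory "table": a list of rows standing in for a Delta Lake table.
Table = List[int]


def _write_partition(table: Table, partition: List[int]) -> int:
    """Append one partition's rows to the table and return the row count."""
    table.extend(partition)
    return len(partition)


def to_store(
    partitions: List[List[int]],
    table: Table,
    compute: bool = True,
) -> Union[List[int], Callable[[], List[int]]]:
    """Hypothetical writer illustrating the `compute` kwarg convention.

    compute=True  -> perform the write now and return per-partition row counts.
    compute=False -> return a zero-argument callable that performs the write
                     when invoked (a stand-in for a lazy dask object).
    """

    def run() -> List[int]:
        return [_write_partition(table, part) for part in partitions]

    if compute:
        return run()
    return run


# Eager: the write happens inside the call.
eager_table: Table = []
counts = to_store([[1, 2], [3]], eager_table)
print(counts, eager_table)  # [2, 1] [1, 2, 3]

# Lazy: nothing is written until the returned callable is invoked.
lazy_table: Table = []
pending = to_store([[4], [5, 6]], lazy_table, compute=False)
print(lazy_table)  # []
print(pending(), lazy_table)  # [1, 2] [4, 5, 6]
```

Defaulting to compute=True means the common case needs no extra step, while compute=False still lets callers batch several writes and trigger them together.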
The initial API for writing Delta Lake is a little bit clunky for the user.
When reading, users have to do something like this:
To write, they need this:
TODO:
- Naming is inconsistent: read_delta_table vs to_deltalake. Either of the following combos would be more consistent:
  - read_delta_table/write_delta_table
  - read_deltalake/write_deltalake
  - read_delta_table/to_delta_table
  - read_deltalake/to_deltalake
- to_deltalake should be exposed on top level, same as read_delta_table.
- User shouldn't need to call compute as an extra step; add a compute: bool kwarg instead.