dask-contrib / dask-deltatable

A Delta Lake reader for Dask
BSD 3-Clause "New" or "Revised" License

Finalize API for writing Delta Tables #42

Closed j-bennet closed 1 year ago

j-bennet commented 1 year ago

The initial API for writing Delta Lake is a little bit clunky for the user.

When reading, users have to do something like this:

from dask_deltatable import read_delta_table
ddf = read_delta_table("path_to_table")

To write, they need this:

from dask_deltatable.write import to_deltalake
out = to_deltalake("path_to_table", ddf)
out.compute()

TODO:

fjetter commented 1 year ago

read_foo / to_foo is the standard terminology in dask. I believe this is true for all IO APIs we're offering, see https://docs.dask.org/en/stable/dataframe-api.html#create-dataframes and https://docs.dask.org/en/stable/dataframe-api.html#store-dataframes

I suggest read_deltalake and to_deltalake

to_deltalake should be exposed on top level, same as read_delta_table

+1

user shouldn't need to call compute.

We typically offer a compute kwarg to control this behavior. I'm fine adding this to to_deltalake as well.
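
For readers unfamiliar with the `compute` keyword convention mentioned above, here is a minimal, library-free sketch of the idea. It does not use dask or dask-deltatable: `LazyWrite` and `to_deltalake_sketch` are hypothetical names invented for illustration, and a plain list stands in for the table location. It only shows the behavior being proposed, where the write happens immediately by default and a lazy object is returned when `compute=False`.

```python
from typing import Callable, List, Union


class LazyWrite:
    """Hypothetical stand-in for a lazy result: nothing runs until .compute()."""

    def __init__(self, func: Callable[[], int]) -> None:
        self._func = func

    def compute(self) -> int:
        # Run the deferred write and return the number of rows written.
        return self._func()


def to_deltalake_sketch(
    sink: List[str], rows: List[str], compute: bool = True
) -> Union[int, LazyWrite]:
    """Hypothetical writer illustrating a `compute` keyword.

    `sink` stands in for the table; a real writer would take a path instead.
    """

    def _write() -> int:
        sink.extend(rows)
        return len(rows)

    lazy = LazyWrite(_write)
    # compute=True (the default) writes right away, so the caller never has
    # to call .compute(); compute=False hands back the lazy object instead.
    return lazy.compute() if compute else lazy


table: List[str] = []
to_deltalake_sketch(table, ["a", "b"])                      # written immediately
pending = to_deltalake_sketch(table, ["c"], compute=False)  # deferred
pending.compute()                                           # written now
```

With a default of `compute=True`, the common case needs no extra call, while callers who want to combine the write with other lazy work can still opt out.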