datafusion-contrib / datafusion-objectstore-s3

S3 as an ObjectStore for DataFusion
Apache License 2.0

To-do list for publishing 0.1 #26

Closed matthewmturner closed 2 years ago

matthewmturner commented 2 years ago

@seddonm1 FYI, in case you have anything else.

matthewmturner commented 2 years ago

As an update, I will be working on adding docs and publishing to crates.io over the coming days.

@seddonm1 @houqp let me know if you think anything else should be done before the 0.1 release.

matthewmturner commented 2 years ago

@seddonm1 are you OK if I add you as an owner on crates.io?

houqp commented 2 years ago

list looks good to me :+1: looking forward to the release :D

seddonm1 commented 2 years ago

@matthewmturner no problem. I think @alamb is lining up DataFusion 7.0.0, which would be a good point to publish.

alamb commented 2 years ago

7.0.0 release is tracked with https://github.com/apache/arrow-datafusion/issues/1587

For your planning purposes, I expect it to take another week or so.

matthewmturner commented 2 years ago

@seddonm1 @houqp

I merged #38 into master. Going to review everything over the coming days and give some time in case either of you have feedback before publishing the crate.

I'm targeting a release around Friday if there are no issues.

matthewmturner commented 2 years ago

Scratch the above where I target Friday. I will release once DataFusion 7.0 has been released, so I can pin to that version and make some small updates here, such as using ListingTableConfig to simplify creating tables.

seddonm1 commented 2 years ago

This one needs to be fixed before publish: https://github.com/apache/arrow-datafusion/pull/1779

seddonm1 commented 2 years ago

The change didn't make it into the DataFusion 7.0 release, but we will need to change the API/docs to comply with it before release. I can do it in the next few days.

matthewmturner commented 2 years ago

@seddonm1 can you clarify how that will work, given that we are expecting to depend on 7.0, which doesn't have it?

seddonm1 commented 2 years ago

@matthewmturner In my head I think there are a few things at play:

  1. We need to identify which object store to use. In future this will be done by get_by_uri, which will return the S3ObjectStore plus the full URI.
  2. The URI then needs to be passed to the ObjectStore implementation to be read (https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/datasource/object_store/local.rs#L41). It is up to the ObjectStore to then request the correct file.
  3. The current implementation relies on the user to do the splitting (https://github.com/datafusion-contrib/datafusion-objectstore-s3/blob/main/src/lib.rs#L193), which does not fit with the future direction of ObjectStore.

It makes sense in my head, but I haven't had a lot of time to think about it yet.

But you are possibly correct that we cannot do anything until the next DataFusion release.
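
For illustration, the two-step flow described in this comment (pick the store from the URI, then let the store interpret the full URI) might look roughly like the sketch below. Everything here is a hypothetical stand-in written against the standard library only: the ObjectStore trait, S3Store, Registry, describe_read and resolve_and_read are invented for this example, and the get_by_uri signature is an assumption, not DataFusion's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an object store; NOT DataFusion's real trait.
trait ObjectStore {
    /// Receives the full URI and decides for itself which object to request.
    fn describe_read(&self, uri: &str) -> Result<String, String>;
}

/// Hypothetical S3-like store that does its own bucket/key splitting.
struct S3Store;

impl ObjectStore for S3Store {
    fn describe_read(&self, uri: &str) -> Result<String, String> {
        // The store, not the caller, splits the URI into bucket and key.
        let rest = uri
            .strip_prefix("s3://")
            .ok_or_else(|| format!("not an s3 uri: {uri}"))?;
        let (bucket, key) = rest
            .split_once('/')
            .ok_or_else(|| format!("missing key in uri: {uri}"))?;
        Ok(format!("bucket={bucket} key={key}"))
    }
}

/// Hypothetical registry keyed by URI scheme.
struct Registry {
    stores: HashMap<String, Box<dyn ObjectStore>>,
}

impl Registry {
    fn new() -> Self {
        Self { stores: HashMap::new() }
    }

    fn register(&mut self, scheme: &str, store: Box<dyn ObjectStore>) {
        self.stores.insert(scheme.to_string(), store);
    }

    /// Step 1: choose the store from the scheme and hand back the untouched URI.
    /// (Assumed signature, for illustration only.)
    fn get_by_uri<'a>(&self, uri: &'a str) -> Result<(&dyn ObjectStore, &'a str), String> {
        let (scheme, _) = uri
            .split_once("://")
            .ok_or_else(|| format!("no scheme in uri: {uri}"))?;
        let store = self
            .stores
            .get(scheme)
            .ok_or_else(|| format!("no store registered for scheme: {scheme}"))?;
        Ok((store.as_ref(), uri))
    }
}

/// Step 2: pass the full URI to the store that was resolved in step 1.
fn resolve_and_read(registry: &Registry, uri: &str) -> Result<String, String> {
    let (store, full_uri) = registry.get_by_uri(uri)?;
    store.describe_read(full_uri)
}
```

With a store registered under the "s3" scheme, calling resolve_and_read with a URI such as s3://my-bucket/data/file.parquet would leave all bucket/key handling inside S3Store, so the user no longer has to do the splitting up front.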

matthewmturner commented 2 years ago

I created a first release candidate tag so I could familiarize myself with the process. I will leave it for a day, then target the actual release and publishing to crates.io tomorrow if no issues are raised.

matthewmturner commented 2 years ago

Closed by https://github.com/datafusion-contrib/datafusion-objectstore-s3/commit/56f439509221d39e5e1de4b91bdd07dd2d50f068