apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Release of DataFusion: 7.0.0 #1587

Closed alamb closed 2 years ago

alamb commented 2 years ago

I wonder if it is time to release a new version of datafusion to crates.io?

It would be great to crowdsource:

  1. Update readme / changelog
  2. Update version
  3. (maybe) a blog post?

I am happy to handle creating a release candidate / doing the official voting process.

xudong963 commented 2 years ago

Yeah, I think we can wait for the arrow2-related tickets to be merged into master? BTW, I can help write the blog post!

alamb commented 2 years ago

arrow2 may be a good driver.

I don't have a good sense of how many projects use datafusion from crates.io (aka what has been released) vs how many use it via a GitHub sha. IOx (my project) uses the sha, but I realized that maybe others are waiting for an actual release.
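
For readers less familiar with the distinction drawn above, the two ways of depending on datafusion look roughly like this in a downstream project's Cargo.toml. This is an illustrative sketch only: the version shown is the previous release mentioned later in this thread, and the `rev` value is a placeholder, not a real commit.

```toml
[dependencies]
# Option 1: a released version from crates.io
datafusion = "6.0.0"

# Option 2 (alternative to the line above): pin to a commit on GitHub.
# "<commit-sha>" is a placeholder; substitute the sha you want to track.
# datafusion = { git = "https://github.com/apache/arrow-datafusion", rev = "<commit-sha>" }
```

Only the first form depends on a new release being published to crates.io.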

alamb commented 2 years ago

There is all sorts of good stuff in DataFusion since we last made a release. Since arrow (C++) just released a 7.0.0 I was thinking to do the same with DataFusion (as @pauldix says the success of a project is predicated on 1. Making sweet sweet software and 2. telling people about it). We have done 1, and now we need to do a bit more of 2. ✍️

matthewmturner commented 2 years ago

Perhaps we can start crowdsourcing points for a post / blog in a Google Doc. I've made one here (sorry, I haven't had a chance to add anything yet, but I'm trying to help as I can):

https://docs.google.com/document/d/17uB1GIN58xOehQP5XpJH8J3qA7KDS1WQJzyHzW7L1kU/edit?usp=sharing

alamb commented 2 years ago

Thank you @matthewmturner

houqp commented 2 years ago

I won't have the time to drive the release this time, but I'm happy to help fix any issues with the existing release automation and guide anyone through the process.

alamb commented 2 years ago

I can drive the release

xudong963 commented 2 years ago

I will finish this issue https://github.com/apache/arrow-datafusion/issues/1400 before release to give our users a clearer readme

matthewmturner commented 2 years ago

@alamb FYI I went through and made some updates to the Google Doc.

Next up I will focus on performance improvements / new features. If you have anything in particular in mind that you would like added, you could just mention it here or in the doc and I'll add it / look up the relevant issue / PR to link.

alamb commented 2 years ago

Thanks @matthewmturner -- I'll try and give the doc another pass through later today

matthewmturner commented 2 years ago

I went through and made more updates and added some git stats.

@alamb let me know if there is anything in particular I can do to help alleviate the burden on your side. I'm happy to provide more assistance on releases in general (this one and future ones).

alamb commented 2 years ago

Thank you very much @matthewmturner

Things that I think we need to do prior to release:

Making the PR to update the version would be sweet 👍

I think it is probably best if I take an initial swag at running the changelog generator, as it requires the ability to edit ticket titles / tags.

alamb commented 2 years ago

Here is a proposed changelog. https://github.com/apache/arrow-datafusion/pull/1807#

I guess it is probably time to cut a release branch. Does anyone know if we are waiting for anything else to be merged? @xudong963 @matthewmturner @Dandandan @houqp @Dandandan @Jimexist @andygrove ?

matthewmturner commented 2 years ago

@HaoYang670 did you get a chance to work on #1741?

@alamb it's probably too late to have any impact for this release, but for my info, can we only do updates to the user guide / website (https://arrow.apache.org/datafusion/) when we have a release? I figure any docs.rs update will have to be linked to a release.

Otherwise, okay for me.

alamb commented 2 years ago

can we only do updates to the user guide / website (https://arrow.apache.org/datafusion/) when we have a release?

No, we can update https://arrow.apache.org/datafusion/ (the hosted version of https://github.com/apache/arrow-datafusion/tree/master/docs) any time we want 👍

matthewmturner commented 2 years ago

@alamb great! thx

xudong963 commented 2 years ago

I guess it is probably time to cut a release branch. Does anyone know if we are waiting for anything else to be merged?

I took a quick look at our unmerged tickets, and I don't think so. Thanks for your nice work! @alamb

HaoYang670 commented 2 years ago

I am working on https://github.com/apache/arrow-datafusion/issues/1741 these days. And I will file a PR today.

On Fri, 11 Feb 2022 at 06:52, Matthew Turner @.***> wrote:

@HaoYang670 did you get a chance to work on #1741 (https://github.com/apache/arrow-datafusion/issues/1741)?

@alamb it's probably too late to have any impact for this release, but for my info, can we only do updates to the user guide / website (https://arrow.apache.org/datafusion/) when we have a release? I figure any docs.rs update will have to be linked to a release.

Otherwise, okay for me.


alamb commented 2 years ago

I am going to do a dry run today of publishing datafusion to crates.io (will use an 0.1.xx version to test)

alamb commented 2 years ago

I tested publishing 7.0.0-alpha to crates.io using the branch https://github.com/alamb/arrow-datafusion/tree/alamb/test_publish and then going into each crate and publishing it like this:

```shell
cd datafusion-common && cargo publish
```

One thing I noticed is that datafusion-cli depends on ballista, and so without a ballista release we can't also do a datafusion-cli release 🤔 But maybe that is ok and we can do a datafusion-cli release later (or maybe even move datafusion-cli into the contrib repo, or perhaps make the ballista dependency optional).

matthewmturner commented 2 years ago

I had thought about moving datafusion-cli to datafusion-contrib as well. This does seem generally aligned with moving things that aren't actually datafusion (e.g. datafusion-python) out of the datafusion repo. It does also seem reasonable to make ballista optional.

jimexist commented 2 years ago

Making the ballista dependency optional seems okay. Having a cli in the repo helps a lot with debugging, in my opinion.

matthewmturner commented 2 years ago

@Jimexist That's a really good point - I have found it helpful as well. But couldn't we do something like the below in Cargo.toml from the datafusion-cli repo to achieve a similar experience? Not quite as convenient but I think it's close.

```toml
[dependencies]
datafusion = { path = "../path/to/datafusion" }
```

alamb commented 2 years ago

Having a cli in repo helps a lot with debugging in my opinion

This is an excellent point @Jimexist -- I use the cli extensively during debugging.

But couldn't we do something like the below in Cargo.toml from the datafusion-cli repo to achieve a similar experience? Not quite as convenient but I think it's close.

I think we could, @matthewmturner, but I think that would effectively require people to have checked out the datafusion-cli repo any time they want to build datafusion. So if it is required to build datafusion, the benefits of putting it in a separate repo seem pretty small 🤔

alamb commented 2 years ago

Filed https://github.com/apache/arrow-datafusion/issues/1814 to see if we can solicit some more help for the doc site

alamb commented 2 years ago

https://github.com/apache/arrow-datafusion/pull/1816 <-- for optional ballista feature in datafusion-cli

alamb commented 2 years ago

🤔 I don't think https://github.com/apache/arrow-datafusion/pull/1816 is sufficient to publish datafusion-cli to crates.io -- cargo still tries to resolve the dependencies of ballista (even though it is an optional dependency)

The only way I could get it to publish was to comment out the ballista dependency altogether (8750db3bea) 🤔
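
The situation described above can be sketched as follows. This is a hypothetical Cargo.toml fragment for datafusion-cli, not the contents of the PR: the paths and version numbers are assumptions chosen for illustration.

```toml
[dependencies]
# Path plus version: the path is used for local builds, and the version
# is what gets recorded when the crate is packaged for crates.io.
# (Path and version here are illustrative assumptions.)
datafusion = { path = "../datafusion", version = "7.0.0" }

# Marking the dependency optional keeps it out of default builds; it is
# enabled with `cargo build --features ballista`.
# (Path and version here are illustrative assumptions.)
ballista = { path = "../ballista/rust/client", version = "0.6.0", optional = true }
```

Even when marked optional, the dependency still takes part in dependency resolution, which is consistent with the behavior reported above: publishing appears to need a matching ballista version on crates.io whether or not the feature is enabled.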

alamb commented 2 years ago

Update here: I would like to wait for the arrow 9.0.0 release to be published (later today or tomorrow) and then update datafusion to use it: https://github.com/apache/arrow-datafusion/pull/1775

Then I'll try and make a release candidate tomorrow or Monday

It's going to be great!

alamb commented 2 years ago

I am actively working to create a release candidate for datafusion 7.0.0

alamb commented 2 years ago

@matthewmturner would you be willing to start a PR for a blog post: https://docs.google.com/document/d/17uB1GIN58xOehQP5XpJH8J3qA7KDS1WQJzyHzW7L1kU/edit?usp=sharing ?

The DataFusion 6.0.0 announcement is here for reference: https://github.com/apache/arrow-site/pull/160

matthewmturner commented 2 years ago

@matthewmturner would you be willing to start a PR for a blog post: https://docs.google.com/document/d/17uB1GIN58xOehQP5XpJH8J3qA7KDS1WQJzyHzW7L1kU/edit?usp=sharing ?

The DataFusion 6.0.0 announcement is here for reference: apache/arrow-site#160

Sure - will do now!

alamb commented 2 years ago

Official mailing list post with all the details is here: https://lists.apache.org/thread/t8381y8x1t452dvqr3y7h85q4dncvwrx

xudong963 commented 2 years ago

Thanks, @alamb @matthewmturner ❤️ I will help review

alamb commented 2 years ago

The release was approved and published 🎉

Mailing list thread is here: https://lists.apache.org/thread/hcpcf3shlt0l3fm2k313tq1tvrczlowf

The release is available here: https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-7.0.0

I have also published it to crates.io here: https://crates.io/crates/datafusion/7.0.0