Closed alamb closed 2 years ago
Yeah, I think we can wait for the arrow2-related tickets to be merged into master? BTW, I can help write a blog!
arrow2 may be a good driver.
I don't have a good sense of how many projects use datafusion from crates.io (aka what has been released) vs how many use it via a github sha. IOx (my project) uses the sha but I realized that maybe others are waiting for an actual release
There is all sorts of good stuff in DataFusion since we last made a release. Since arrow (C++) just released a 7.0.0 I was thinking to do the same with DataFusion (as @pauldix says the success of a project is predicated on 1. Making sweet sweet software and 2. telling people about it). We have done 1, and now we need to do a bit more of 2. ✍️
Perhaps we can start crowdsourcing points for a post / blog on a google doc. I've made one here (sorry, I haven't had a chance to add anything yet, but I'm trying to help as I can):
https://docs.google.com/document/d/17uB1GIN58xOehQP5XpJH8J3qA7KDS1WQJzyHzW7L1kU/edit?usp=sharing
Thank you @matthewmturner
I won't have the time to drive the release this time, but happy to help fixing any issue with the existing release automation and guide anyone through the process.
I can drive the release
I will finish this issue https://github.com/apache/arrow-datafusion/issues/1400 before release to give our users a clearer readme
@alamb FYI i went through and made some updates to the google doc.
Next up I will focus on performance improvements / new features. If you have anything in particular in mind you would like added, you could just mention it here or on the doc and I'll add it / look up the relevant issue / PR to link.
Thanks @matthewmturner -- I'll try and give the doc another pass through later today
I went through and made more updates and added some git stats.
@alamb let me know if there is anything in particular I can do to help alleviate the burden on your side. I'm happy to provide more assistance on releases in general (this one and future).
Thank you very much @matthewmturner
Things that I think we need to do prior to release:
Making the PR to update the version would be sweet 👍
I think it is probably best if I take an initial swag at running the changelog generator thing as it requires the ability to mess with tickets titles / tags.
Here is a proposed changelog. https://github.com/apache/arrow-datafusion/pull/1807#
I guess it is probably time to cut a release branch. Does anyone know if we are waiting for anything else to be merged? @xudong963 @matthewmturner @Dandandan @houqp @Jimexist @andygrove ?
@HaoYang670 did you get a chance to work on #1741?
@alamb it's probably too late to have any impact for this release, but for my info, can we only do updates to the user guide / website (https://arrow.apache.org/datafusion/) when we have a release? I figure any docs.rs update will have to be linked to a release.
Otherwise, okay for me.
can we only do updates to the user guide / website (https://arrow.apache.org/datafusion/) when we have a release?
No, we can update https://arrow.apache.org/datafusion/ (the hosted version of https://github.com/apache/arrow-datafusion/tree/master/docs) any time we want 👍
@alamb great! thx
I guess it is probably time to cut a release branch. Does anyone know if we are waiting for anything else to be merged?
I took a quick look at our unmerged tickets, and I think not. Thanks for your nice work! @alamb
I am working on https://github.com/apache/arrow-datafusion/issues/1741 these days. And I will file a PR today.
I am going to do a dry run today of publishing datafusion to crates.io (will use an 0.1.xx version to test)
I tested publishing 7.0.0-alpha to crates.io using https://github.com/alamb/arrow-datafusion/tree/alamb/test_publish then going into each crate like this:
cd datafusion-common && cargo publish
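For anyone repeating this, the per-crate step could be scripted roughly as below. This is only a sketch: the `CRATES` list and the `publish_all` helper are hypothetical names, the list contains only the two crates mentioned in this thread (a real release would need every workspace crate, in dependency order), and the default "plan" mode just prints the commands rather than publishing anything.

```shell
# Hypothetical crate list, ordered so that dependencies are published first.
# Only crates named in this thread are included; extend it for a real release.
CRATES="datafusion-common datafusion"

# publish_all [plan|run]
#   plan (default): print the command that would be run for each crate
#   run:            actually run `cargo publish` inside each crate directory
publish_all() {
  mode="${1:-plan}"
  for crate in $CRATES; do
    if [ "$mode" = "plan" ]; then
      echo "cd $crate && cargo publish"
    else
      # Stop at the first failure so later crates are not published
      # against a dependency that never made it to crates.io.
      (cd "$crate" && cargo publish) || return 1
    fi
  done
}

publish_all plan
```

Running `publish_all run` from the repository root would perform the real publish, so it is worth reviewing the planned commands first.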
One thing I noticed is that datafusion-cli depends on ballista, and so without a ballista release we can't also do a datafusion-cli release 🤔 But maybe that is ok and we can do a datafusion-cli release later (or maybe even move datafusion-cli into the contrib repo, or perhaps make the ballista dependency optional)
I had thought about moving datafusion-cli to datafusion-contrib as well. This does seem generally aligned with moving things out of the datafusion repo that aren't actually datafusion (i.e. datafusion-python). It does also seem reasonable to make ballista optional.
Making ballista dependency optional seems okay. Having a cli in repo helps a lot with debugging in my opinion
@Jimexist That's a really good point - I have found it helpful as well. But couldn't we do something like the below in cargo.toml from the datafusion-cli repo to achieve a similar experience? Not quite as convenient but I think it's close.
[dependencies]
datafusion = { path = "../path/to/datafusion" }
Having a cli in repo helps a lot with debugging in my opinion
This is an excellent point @Jimexist -- I use the cli extensively during debugging.
But couldn't we do something like the below in cargo.toml from the datafusion-cli repo to achieve a similar experience? Not quite as convenient but I think it's close.
I think we could @matthewmturner but I think that will effectively require people to have checked out the datafusion-cli repo any time they want to build datafusion. So if it is required to build datafusion, the benefits of putting it in a separate repo seem pretty small 🤔
Filed https://github.com/apache/arrow-datafusion/issues/1814 to see if we can solicit some more help for the doc site
https://github.com/apache/arrow-datafusion/pull/1816 <-- for optional ballista feature in datafusion-cli
🤔 I don't think https://github.com/apache/arrow-datafusion/pull/1816 is sufficient to publish datafusion-cli to crates.io -- cargo still tries to resolve the dependencies of ballista (even though it is an optional dependency)
The only way I could get it to publish was to comment out the ballista dependency altogether 8750db3bea 🤔
Update here: I would like to wait for the arrow 9.0.0 release to be published (later today or tomorrow) and then update datafusion to use it: https://github.com/apache/arrow-datafusion/pull/1775
Then I'll try and make a release candidate tomorrow or Monday
It's going to be great!
I am actively working to create a release candidate for datafusion 7.0.0
@matthewmturner would you be willing to start a PR for a blog post: https://docs.google.com/document/d/17uB1GIN58xOehQP5XpJH8J3qA7KDS1WQJzyHzW7L1kU/edit?usp=sharing ?
The DataFusion 6.0.0 announcement is here for reference: https://github.com/apache/arrow-site/pull/160
Sure - will do now!
Official mailing list post with all the details is here: https://lists.apache.org/thread/t8381y8x1t452dvqr3y7h85q4dncvwrx
Thanks, @alamb @matthewmturner ❤️ I will help review
The release was approved and published 🎉
Mailing list thread is here: https://lists.apache.org/thread/hcpcf3shlt0l3fm2k313tq1tvrczlowf
The release is available here: https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-7.0.0
I have also published it to crates.io here: https://crates.io/crates/datafusion/7.0.0
I wonder if it is time to release a new version of datafusion to crates.io?
It would be great to crowdsource:
I am happy to handle creating a release candidate / doing the official voting process.