apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Release Datafusion 6.0.0 #890

Closed houqp closed 2 years ago

houqp commented 3 years ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

We had some oversights in the 5.0.0 release (https://github.com/apache/arrow-datafusion/issues/771) that left us unable to release the python binding and datafusion-cli.

Describe the solution you'd like

Release Datafusion 5.1.0 with an improved process that supports python binding and cli releases.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

See https://github.com/apache/arrow-datafusion/issues/887, https://github.com/apache/arrow-datafusion/issues/883 and https://github.com/apache/arrow-datafusion/issues/837

mmuru commented 3 years ago

@houqp: Do we have an ETA on the python binding release? Thanks.

houqp commented 3 years ago

@mmuru probably 2-3 weeks. We need to get some legal issues resolved in https://github.com/apache/arrow-datafusion/pull/920 before we can cut a release tarball and start the voting process.

mmuru commented 3 years ago

@houqp: Thanks for the update. I am trying to build the python wheel locally, but I noticed that the Cargo.toml file still lists the dependency as datafusion = { git = "https://github.com/apache/arrow-datafusion.git", rev = "4d61196dee8526998aee7e7bb10ea88422e5f9e1" }, so it did not get updated. Will maturin develop pick up the latest datafusion code? Could you please clarify? Thanks again.

houqp commented 3 years ago

@mmuru it will not, I just sent https://github.com/apache/arrow-datafusion/pull/967 to handle the datafusion update.

mmuru commented 3 years ago

@houqp: Thanks for the clarification and quick turnaround fix. I verified your changes and found two issues.

  1. the requirement.txt is locked to Python 3.8 version. I think this file should not be checked into the source tree, since the python version used for development could be different. In my case, it was python 3.7.
  2. Fixed the issue #949. Added support for pyarrow datatypes such as date32, date64 and Timestamp. I would like to submit my changes as part of #967 PR. Please, let me know.

houqp commented 3 years ago

the requirement.txt is locked to Python 3.8 version.

The requirements.txt is supposed to work for all python versions supported by the python binding. If it's broken for python 3.7, could you file a separate issue with the error message? We can continue the troubleshooting there.

I would like to submit my changes as part of update datafusion to 5.1.0 for python binding #967 PR. Please, let me know.

That's great, I recommend you send a separate PR based off the branch in #967 for this, or collaborate on https://github.com/apache/arrow-datafusion/pull/969 to get that issue addressed.

mmuru commented 3 years ago

@houqp: Sure, created #975. Ping me if you need more information.

houqp commented 3 years ago

@andygrove @alamb @Dandandan @jorgecarleitao @nevi-me given that we have had many major breaking changes merged in since the 5.x release, I am thinking maybe it's better to skip 5.1 and go 6.0 after #1010 gets merged. What do you think?

alamb commented 3 years ago

merged in since the 5.x release

I think using 6.0 is a good idea. I also don't think we need to wait for #1010 to be merged for a release if we need to get the python binding / cli out sooner.

houqp commented 3 years ago

sounds good, I think I can try to help push https://github.com/apache/arrow-datafusion/pull/873 to the finish line after you have arrow 6 released.

alamb commented 3 years ago

sounds good, I think I can try to help push #873 to the finish line after you have arrow 6 released.

It sounds like we are aiming to release arrow 6.0 in 2 weeks or so.

tupshin commented 3 years ago

arrow 6 has been released. Any ETA on this one? I'm really looking forward to an up-to-date python API in particular.

houqp commented 3 years ago

@tupshin I pinged #873 again. Once that's merged, we can kick off the release process, which usually takes 3-5 days.

jimexist commented 2 years ago

related https://github.com/Homebrew/homebrew-core/pull/88184

tupshin commented 2 years ago

Not to nag, but I see #873 is merged. How are we doing?

houqp commented 2 years ago

I am working on the changelog and the release PR, should be out this weekend.

alamb commented 2 years ago

FYI see https://github.com/apache/arrow-datafusion/pull/1253

houqp commented 2 years ago

rc0 tag pushed. I am working on automation to package and sign python wheels now. Once that's done, I will send out the request for vote email.

houqp commented 2 years ago

Vote passed and I have pushed the release tags to GitHub. The release steps require PMC member access. @alamb @andygrove @jorgecarleitao @kszucs could one of you follow the steps in https://github.com/apache/arrow-datafusion/tree/master/dev/release#finalize-the-release to complete the release?

The remaining steps are:

houqp commented 2 years ago

@Jimexist we should be able to update datafusion-cli in homebrew as well.

jimexist commented 2 years ago

https://github.com/Homebrew/homebrew-core/pull/89562

alamb commented 2 years ago

Vote passed and I have pushed the release tags to GitHub. The release steps require PMC member access. @alamb @andygrove @jorgecarleitao @kszucs could one of you follow the steps in https://github.com/apache/arrow-datafusion/tree/master/dev/release#finalize-the-release to complete the release?

@houqp I will do so now. Thank you for all the work in this regard

houqp commented 2 years ago

Thanks @alamb ! I just noticed I forgot to add (cd datafusion-cli && cargo publish) in the release doc, could you help run that command as well? I will send a PR to get the doc updated later today.

@andygrove @jorgecarleitao @kou @kszucs @xhochy we will need your help to publish the python binding to PyPI since only you are listed as maintainers of the PyPI package. The steps are documented at https://github.com/apache/arrow-datafusion/tree/master/dev/release#publish-on-pypi

xhochy commented 2 years ago

I'd rather give more people access to PyPI ;)

kou commented 2 years ago

I'm trying.

I found a typo in the document:

diff --git a/dev/release/README.md b/dev/release/README.md
index 2127dc23..73b3eb1a 100644
--- a/dev/release/README.md
+++ b/dev/release/README.md
@@ -304,7 +304,7 @@ PyPI, in order to conform to Apache Software Foundation governance standards.
 First, download all official python release artifacts:

 ```shell
-svn co https://dist.apache.org/repos/dist/release/arrow/apache-arrow-datafusion-5.1.0-rc0/python ./python-artifacts
+svn co https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-5.1.0/python ./python-artifacts

Use twine to perform the upload.

https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-6.0.0/python/ uses 0.4.0 not 6.0.0. Is it OK?

houqp commented 2 years ago

I'd rather give more people access to PyPI ;)

+1 :D

Good catch @kou, I will include that fix in my docs PR. The version difference is expected: we want the two versions decoupled so that we can release a major version change in the python binding without forcing a major version bump in datafusion.

kou commented 2 years ago

OK. I've published them: https://pypi.org/project/datafusion/0.4.0/

I found one more typo:

diff --git a/dev/release/README.md b/dev/release/README.md
index 2127dc23..fcf090e3 100644
--- a/dev/release/README.md
+++ b/dev/release/README.md
@@ -310,7 +310,7 @@ svn co https://dist.apache.org/repos/dist/release/arrow/apache-arrow-datafusion-
 Use [twine](https://pypi.org/project/twine/) to perform the upload.

 ```shell
-twine upload ./python-artifactl/*.{tar.gz,whl}
+twine upload ./python-artifacts/*.{tar.gz,whl}

Call the vote

houqp commented 2 years ago

Thank you @kou ! I will include that fix in my docs PR as well :)
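
For readers following along, the directory typo matters because the pattern `./python-artifacts/*.{tar.gz,whl}` is expanded against the file system before twine sees any file names, and a misspelled directory matches nothing. The following is a minimal Python sketch of that expansion, not part of the release tooling. The helper name `expand_artifacts` and the file names are hypothetical and used only for illustration.

```python
import glob
import os
import tempfile


def expand_artifacts(directory):
    """Mimic DIR/*.{tar.gz,whl}: try each brace alternative, then glob it."""
    matches = []
    for suffix in ("tar.gz", "whl"):
        matches.extend(glob.glob(os.path.join(directory, "*." + suffix)))
    return sorted(matches)


with tempfile.TemporaryDirectory() as root:
    artifacts = os.path.join(root, "python-artifacts")
    os.mkdir(artifacts)
    # Hypothetical file names, not the real release artifacts.
    for name in ("example-0.0.0.tar.gz", "example-0.0.0-py3-none-any.whl", "notes.txt"):
        open(os.path.join(artifacts, name), "w").close()

    # Correct directory name: the sdist and the wheel match, notes.txt does not.
    good = [os.path.basename(path) for path in expand_artifacts(artifacts)]
    # Misspelled directory name, as in the typo above: nothing matches.
    typo = expand_artifacts(os.path.join(root, "python-artifactl"))

print(good)
print(typo)
```

Listing the matched files like this before running the upload is a cheap way to confirm that the artifact directory was checked out where the command expects it.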

alamb commented 2 years ago

Thanks @alamb ! I just noticed I forgot to add (cd datafusion-cli && cargo publish) in the release doc, could you help run that command as well? I will send a PR to get the doc updated later today.

Hi @houqp

I tried to publish datafusion-cli and I got the following error. It looks like datafusion-cli relies on ballista somehow

(arrow_dev) alamb@MacBook-Pro:~/Downloads/apache-arrow-datafusion-6.0.0/datafusion-cli$ cargo publish
    Updating crates.io index
warning: manifest has no description.
See https://doc.rust-lang.org/cargo/reference/manifest.html#package-metadata for more info.
   Packaging datafusion-cli v5.1.0-SNAPSHOT (/Users/alamb/Downloads/apache-arrow-datafusion-6.0.0/datafusion-cli)
error: failed to prepare local package for uploading

Caused by:
  failed to select a version for the requirement `ballista = "^0.6.0"`
  candidate versions found which didn't match: 0.5.0, 0.3.0, 0.2.5, ...
  location searched: crates.io index
  required by package `datafusion-cli v5.1.0-SNAPSHOT (/Users/alamb/Downloads/apache-arrow-datafusion-6.0.0/datafusion-cli)`

houqp commented 2 years ago

oh yeah, it supports ballista as a way to perform remote query execution. @alamb looks like we haven't published the ballista crates yet? could you do that first by following https://github.com/apache/arrow-datafusion/tree/master/dev/release#publish-on-cratesio?
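
As background on the error above: a Cargo requirement such as `^0.6.0` accepts versions from 0.6.0 up to, but not including, the next bump of the left-most non-zero component (here 0.7.0), so none of the candidates listed in the log (0.5.0, 0.3.0, 0.2.5) can satisfy it. Below is a minimal Python sketch of that rule, assuming plain MAJOR.MINOR.PATCH versions without pre-release tags. The function names are hypothetical and this is not Cargo's actual resolver.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release support)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def caret_upper_bound(base):
    """Exclusive upper bound: bump the left-most non-zero component."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def caret_matches(requirement, candidate):
    """Return True if candidate satisfies a caret requirement like '^0.6.0'."""
    base = parse_version(requirement.lstrip("^"))
    version = parse_version(candidate)
    return base <= version < caret_upper_bound(base)


# Candidate versions copied from the cargo error message above.
published = ["0.5.0", "0.3.0", "0.2.5"]
matching = [v for v in published if caret_matches("^0.6.0", v)]
print(matching)  # prints []
```

Once a 0.6.x ballista release exists on crates.io, the same check would accept it, which is why publishing the ballista crates first resolves this particular failure.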

alamb commented 2 years ago

could you do that first by following https://github.com/apache/arrow-datafusion/tree/master/dev/release#publish-on-cratesio?

Done (updated instructions in https://github.com/apache/arrow-datafusion/pull/1331)

Turns out I still can't upload datafusion-cli package:

(arrow_dev) alamb@MacBook-Pro:~/Downloads/apache-arrow-datafusion-6.0.0$ (cd datafusion-cli && cargo publish)
....
    Finished dev [unoptimized + debuginfo] target(s) in 1m 20s
   Uploading datafusion-cli v5.1.0-SNAPSHOT (/Users/alamb/Downloads/apache-arrow-datafusion-6.0.0/datafusion-cli)
error: failed to publish to registry at https://crates.io

Caused by:
  the remote server responded with an error: invalid upload request: invalid length 7, expected at most 5 keywords per crate at line 1 column 3441

~I will file a ticket~ Tracked by https://github.com/apache/arrow-datafusion/issues/1332
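
The second failure comes from a registry-side limit: the error reports 7 keywords where crates.io expects at most 5 per crate. A check like the one sketched below could catch this before `cargo publish` is run. It is a minimal Python sketch under stated assumptions: the function name `keyword_problems` is hypothetical, and the keyword list is a placeholder because the thread does not show the actual datafusion-cli manifest.

```python
MAX_KEYWORDS = 5  # limit quoted in the crates.io error message above


def keyword_problems(keywords, limit=MAX_KEYWORDS):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    if len(keywords) > limit:
        problems.append(f"{len(keywords)} keywords found, expected at most {limit}")
    return problems


# Hypothetical placeholder list of seven keywords, not the real manifest contents.
keywords = ["kw1", "kw2", "kw3", "kw4", "kw5", "kw6", "kw7"]
problems = keyword_problems(keywords)
trimmed_problems = keyword_problems(keywords[:5])
print(problems)
print(trimmed_problems)
```

Trimming the list to five entries is the simplest fix; which keywords to keep is a decision for the maintainers in the tracking issue.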