Closed: andygrove closed this pull request 10 months ago.
Hi @andygrove, after this change the DataFusion functions submodule does not seem to be included in PyBallista.
If I try to import functions from PyBallista:
```python
from pyballista import SessionContext, functions as f

filename = "sample.parquet"
ctx = SessionContext("localhost", 50050)  # Ballista scheduler host and port
df = ctx.read_parquet(path=filename)
uri = df.select(f.col("stream_id"))
uri.limit(10).show()
```
I'm getting the following error:
```
ImportError: cannot import name 'functions' from 'pyballista' (/Users/.../datafusion-ballista/python/pyballista/__init__.py)
```
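The ImportError itself only says that the pyballista package namespace exposes no attribute named functions. A minimal pure-Python sketch of that mechanism follows; fake_pyballista is a hypothetical stand-in module, not the real package, whose __init__.py is not shown in this thread.

```python
import sys
import types

# Hypothetical stand-in for a package whose __init__.py does not expose `functions`.
fake_pkg = types.ModuleType("fake_pyballista")
sys.modules["fake_pyballista"] = fake_pkg

error_message = None
try:
    # Fails: the module has no `functions` attribute and no such submodule is registered.
    from fake_pyballista import functions
except ImportError as exc:
    error_message = str(exc)
    print(error_message)

# Exposing the name at package level (what a re-export in __init__.py would do)
# lets the same import statement succeed.
fake_pkg.functions = types.ModuleType("fake_pyballista.functions")
from fake_pyballista import functions

print(functions.__name__)
```

A re-export alone may not be enough for pyballista, though; the TypeError reported next suggests the expression types themselves may not be interchangeable.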
Or if I change my imports to:
```python
from pyballista import SessionContext
from datafusion import functions as f
```
Then it says:
```
  File "examples/test.py", line 11, in <module>
    uri = df.select(f.col("stream_id"))
TypeError: argument 'args': 'Expr' object cannot be converted to 'Expr'
```
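One plausible reading of this TypeError, not confirmed in this thread, is that pyballista's compiled extension carries its own copy of the DataFusion Python types, so its Expr is a different class from datafusion's Expr even though both are named Expr. The pure-Python analogy below uses hypothetical modules datafusion_like and pyballista_like and a hypothetical select() that accepts only its own module's Expr; it illustrates the name clash, not the real binding code.

```python
import types


def make_bindings_module(name: str) -> types.ModuleType:
    """Build a module holding its own, freshly created Expr class.

    Hypothetical stand-in for an independently compiled extension module.
    """
    module = types.ModuleType(name)

    class Expr:
        def __init__(self, column: str) -> None:
            self.column = column

    Expr.__module__ = name
    module.Expr = Expr
    return module


# Two separately built copies of "the same" bindings: same class name, different classes.
datafusion_like = make_bindings_module("datafusion_like")
pyballista_like = make_bindings_module("pyballista_like")


def select(*args):
    """Hypothetical binding that only accepts pyballista_like.Expr arguments."""
    for arg in args:
        if not isinstance(arg, pyballista_like.Expr):
            raise TypeError(
                f"argument 'args': '{type(arg).__name__}' object "
                "cannot be converted to 'Expr'"
            )
    return [arg.column for arg in args]


# An Expr from the matching module is accepted.
ok = select(pyballista_like.Expr("stream_id"))
print(ok)  # ['stream_id']

# An Expr from the other module has the same name but fails the type check.
message = None
try:
    select(datafusion_like.Expr("stream_id"))
except TypeError as exc:
    message = str(exc)
    print(message)
```

Under that assumption, expressions built with the standalone datafusion package would be rejected by a pyballista DataFrame however the imports are arranged; the expressions would need to come from the same compiled module as the DataFrame.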
Which issue does this PR close?
N/A
Rationale for this change
The Python bindings in https://github.com/apache/arrow-ballista-python were created by cloning the DataFusion Python bindings and then making some modifications. That repository has been unmaintained for about a year now.
This PR adds new Python bindings that depend on the datafusion-python project rather than copying all of its code. I propose that we archive the old repo.
The new bindings will be versioned and released independently from the main project and are not part of the default Cargo workspace, so they will not get in the way of Rust development work.
What changes are included in this PR?
New pyballista folder containing the new Python bindings.
Are there any user-facing changes?
No