datafusion-contrib / datafusion-python

Python binding for DataFusion
https://arrow.apache.org/datafusion/python/index.html
Apache License 2.0

Add support for Ballista #37

Closed andygrove closed 2 years ago

andygrove commented 2 years ago

I would like to be able to execute queries against Ballista from Python.

I think this is just a case of adding a new PyBallistaContext class.

This should be an optional feature, disabled by default.

andygrove commented 2 years ago

@Jimexist Does it make sense to add Ballista support here, or should we have a separate ballista-python repo that somehow reuses parts of datafusion-python?

matthewmturner commented 2 years ago

In my mind it makes sense to create a separate ballista-python, as I view datafusion-python as its own standalone system, just as Ballista is. However, I acknowledge there may be significant overlap. That being said, there has been a lot of work on Ballista lately, which will likely continue, so it could be a good time to decouple the two for the purpose of Python bindings.

andygrove commented 2 years ago

Thanks for the input. I will close this issue, start a new repo, and copy and paste much of this repo into it for now.

nl5887 commented 2 years ago

@andygrove this is my take on the ballista-python crate. Any suggestions? https://github.com/nl5887/ballista-python

andygrove commented 2 years ago

Hi @nl5887, thanks for working on this! I think you could PR this directly into the arrow-ballista repo, in a top-level Python folder. I would love to try this out, but I am not a Python expert so I may need some help. I followed the instructions in "How to develop" and it looks like everything installed, but I am not sure how to import the project in the Python REPL.

$ python
Python 3.9.5 (default, Jun  4 2021, 12:28:51) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import ballista
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'ballista'

>>> import datafusion
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/andy/git/personal/ballista-python/datafusion/__init__.py", line 29, in <module>
    from ._internal import (
ModuleNotFoundError: No module named 'datafusion._internal'

nl5887 commented 2 years ago

It could be that I still had some leftovers from the old Python datafusion project. I'll rename everything to ballista, make sure the datafusion and ballista modules can co-exist, and do some additional cleanup. When it is ready, I will open a PR against the arrow-ballista repo. Thanks!
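
For anyone hitting the same tracebacks, a quick way to see which files Python would actually load is to ask the import system directly. The following is only a minimal diagnostic sketch using the standard library: the helper name `locate` is made up for illustration, and the module names are simply the ones that appear in the tracebacks above. A result of None means the module is not importable from the current environment, for example because the compiled extension has not been built or installed yet.

```python
import importlib.util


def locate(module_name):
    """Return the file a module would be loaded from, or None if it cannot be found.

    `locate` is a hypothetical helper for this thread, not part of any package.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when a parent package of a dotted name is missing or fails
        # to import (ModuleNotFoundError is a subclass of ImportError).
        return None
    if spec is None:
        return None
    return spec.origin


# Module names taken from the tracebacks in this thread.
for name in ("ballista", "datafusion", "datafusion._internal"):
    print(name, "->", locate(name))
```

If `datafusion` resolves to a directory inside the source checkout while `datafusion._internal` comes back as None, the Python package is being picked up from the working directory but the native module is missing, which matches the second traceback.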

nl5887 commented 2 years ago

@andygrove I just pushed the latest code to my repo (https://github.com/nl5887/ballista-python). This should work for you.

andygrove commented 2 years ago

@nl5887 I just tried it and it works beautifully :heart:

I had also missed a step in my previous attempt ... I can't wait to see this in Ballista! Thanks so much for working on this.