apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0

[Python] Add support for `df.show()` on `CREATE EXTERNAL TABLE` statements #166

Open andygrove opened 1 year ago

andygrove commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

>>> df = ctx.sql("CREATE EXTERNAL TABLE orders STORED AS PARQUET LOCATION '/mnt/bigdata/tpch/sf1-parquet/orders'")
>>> df.show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
Exception: Arrow error: External error: Execution error: Job hZh0A3x failed: Error planning job hZh0A3x: DataFusionError(Internal("Unsupported logical plan: CreateExternalTable"))

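Describe the solution you'd like

Calling `df.show()` on the DataFrame returned by a `CREATE EXTERNAL TABLE` statement should work instead of raising an error.

Describe alternatives you've considered

None

Additional context

None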

andygrove commented 1 year ago

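The `CREATE EXTERNAL TABLE` statement itself actually works: the table is registered and can be queried. The error only occurs when we try to show the results of the DataFrame returned by that statement. This is not good UX, so we will need to fix it.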

>>> import ballista
>>> ctx = ballista.BallistaContext("localhost", 50050)
>>> ctx.sql("CREATE EXTERNAL TABLE orders STORED AS PARQUET LOCATION '/mnt/bigdata/tpch/sf1-parquet/orders'")
<ballista.DataFrame object at 0x7f501f088150>
>>> df = ctx.sql("SELECT count(*) FROM orders")
>>> df.show()
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 14999999        |
+-----------------+
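
Until this is fixed, a client-side workaround could be to skip `show()` for DDL statements. The sketch below is only an illustration: `is_ddl` and `run_sql` are hypothetical helpers, not part of the Ballista API, and the only calls assumed on the context and DataFrame are `ctx.sql(...)` and `df.show()` as used in the session above.

```python
# Hypothetical client-side helpers; these are NOT part of the Ballista API.
DDL_PREFIXES = ("CREATE EXTERNAL TABLE",)


def is_ddl(sql: str) -> bool:
    """Return True if the statement looks like DDL that yields no rows to show."""
    # Collapse runs of whitespace and ignore case before comparing prefixes.
    normalized = " ".join(sql.split()).upper()
    return normalized.startswith(DDL_PREFIXES)


def run_sql(ctx, sql: str):
    """Run `sql` on `ctx` and call show() only for non-DDL statements."""
    df = ctx.sql(sql)
    if is_ddl(sql):
        # ctx.sql() has already registered the table; calling show() here
        # would hit the "Unsupported logical plan" error reported above.
        return df
    df.show()
    return df
```

With a context created as in the session above, `run_sql(ctx, "CREATE EXTERNAL TABLE ...")` would register the table without calling `show()`, while `run_sql(ctx, "SELECT count(*) FROM orders")` would print the result as before. This only hides the error on the client; the planner still needs to handle `CreateExternalTable`.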