datafusion-contrib / datafusion-python

Python binding for DataFusion
https://arrow.apache.org/datafusion/python/index.html
Apache License 2.0

Add PyDataFrame.explain #36

andygrove closed this 2 years ago

andygrove commented 2 years ago

Closes https://github.com/datafusion-contrib/datafusion-python/issues/35

This PR adds the explain method to PyDataFrame. Calling it prints the logical and physical plans for the query:

>>> from datafusion import ExecutionContext
>>> ctx = ExecutionContext()
>>> ctx.register_parquet("store_sales", "/mnt/bigdata/tpcds/sf100-parquet/store_sales.dat")
>>> df = ctx.sql("SELECT count(*) FROM store_sales")
>>> df.explain()
+---------------+-------------------------------------------------------------+
| plan_type     | plan                                                        |
+---------------+-------------------------------------------------------------+
| logical_plan  | Projection: #COUNT(UInt8(1))                                |
|               |   Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]         |
|               |     TableScan: store_sales projection=Some([0])             |
| physical_plan | ProjectionExec: expr=[COUNT(UInt8(1))@0 as COUNT(UInt8(1))] |
|               |   ProjectionExec: expr=[287997024 as COUNT(UInt8(1))]       |
|               |     EmptyExec: produce_one_row=true                         |
|               |                                                             |
+---------------+-------------------------------------------------------------+
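
If the plans are needed outside the terminal (for example, to diff them between runs), the printed table can be split back into its two sections. The following is only a minimal sketch using the standard library: `parse_explain_table` and `SAMPLE` are hypothetical names, not part of datafusion-python, and the sketch assumes the table text has already been captured as a string and that plan text contains no `|` character before its own column.

```python
# Hypothetical helper, not part of datafusion-python: group the rows of a
# printed explain table by the value in the plan_type column.

SAMPLE = """\
+---------------+-------------------------------------------------------------+
| plan_type     | plan                                                        |
+---------------+-------------------------------------------------------------+
| logical_plan  | Projection: #COUNT(UInt8(1))                                |
|               |   Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]         |
|               |     TableScan: store_sales projection=Some([0])             |
| physical_plan | ProjectionExec: expr=[COUNT(UInt8(1))@0 as COUNT(UInt8(1))] |
|               |   ProjectionExec: expr=[287997024 as COUNT(UInt8(1))]       |
|               |     EmptyExec: produce_one_row=true                         |
|               |                                                             |
+---------------+-------------------------------------------------------------+
"""


def parse_explain_table(text):
    """Return {plan_type: [plan lines]} with plan indentation preserved."""
    plans = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Border lines start with "+"; only rows starting with "|" carry data.
        if not line.startswith("|") or not line.endswith("|"):
            continue
        # Drop the outer pipes, then split into the two columns once.
        cells = line[1:-1].split("|", 1)
        if len(cells) != 2:
            continue
        plan_type = cells[0].strip()
        plan = cells[1].rstrip()
        # Each cell has one space of padding; keep any indentation after it.
        if plan.startswith(" "):
            plan = plan[1:]
        # Skip the header row.
        if plan_type == "plan_type" and plan == "plan":
            continue
        # A non-empty first column starts a new section.
        if plan_type:
            current = plan_type
            plans[current] = []
        # Continuation rows have an empty first column; blank rows are dropped.
        if current is not None and plan.strip():
            plans[current].append(plan)
    return plans


plans = parse_explain_table(SAMPLE)
for plan_type, lines in plans.items():
    print(plan_type)
    for plan_line in lines:
        print("    " + plan_line)
```

Keeping the leading indentation of each plan line preserves the operator nesting, which is the main thing a reader looks at in these plans.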