Open Jefffrey opened 4 months ago
ctx.sql(..)
returns a PyDataFrame
, and then you are observing the python shell call PyDataFrame::__repr__
to print it out, which applies a LIMIT
and collects the first 10 rows.
If you were to assign it and then collect, you'd see the proper result.
>>> df = ctx.sql("explain select 1")
>>> df.collect()
[pyarrow.RecordBatch
plan_type: string not null
plan: string not null
----
plan_type: ["logical_plan","physical_plan"]
plan: ["Projection: Int64(1)
EmptyRelation","ProjectionExec: expr=[1 as Int64(1)]
PlaceholderRowExec
"]]
And then here's the error reproduced with a little more detail.
>>> df = ctx.sql("explain select 1")
>>> # we can print the (unsupported) logical plan - notice `EXPLAIN` is not the top row of the plan
>>> df.limit(count=10, offset=0).explain()
DataFrame()
+--------------+--------------------------+
| plan_type | plan |
+--------------+--------------------------+
| logical_plan | Limit: skip=0, fetch=10 |
| | Explain |
| | Projection: Int64(1) |
| | EmptyRelation |
+--------------+--------------------------+
>>> # but calling collect reproduces the error
>>> df.limit(count=10, offset=0).collect()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: Internal error: Unsupported logical plan: Explain must be root of the plan.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Describe the bug
Original issue from datafusion repository: https://github.com/apache/arrow-datafusion/issues/9319
Error when doing explain directly in SQL.
To Reproduce
Using latest available version on Pypi (35.0.0)
Expected behavior
The
ctx.sql("explain select 1")
should ideally succeed, as it works in regular DataFusion:Will output:
Additional context
Although the more correct way would be to do
ctx.sql("select 1").explain()
,ctx.sql("explain select 1")
itself is technically still valid, so maybe something weird is going on here?