Hi everyone,
I'm not sure if this is the right place for this. It's more of a question than a real bug.
Describe the bug
I tried to generate the logical plan of a query using Substrait instead of passing the query text to the .sql function. The Substrait round trip fails, while the .sql function executes the same query without any problem.
What is the reason behind this behavior?
To Reproduce
import datafusion
from datafusion.substrait import substrait as ss
import pyarrow as pa
import pyarrow.dataset as pda
from faker import Faker

print(f"DF: {datafusion.__version__}\nPA: {pa.__version__}")  # DF: 32.0.0 PA: 14.0.2

fake = Faker()
N_ROWS = 1_000
dummy_table = pa.Table.from_pydict(
    {
        "id": range(N_ROWS),
        "name": (fake.name() for _ in range(N_ROWS)),
        "country_code": (fake.country_code() for _ in range(N_ROWS)),
    }
)

q = """
SELECT
    "t1".*
    , "t2".*
FROM "table" "t1"
INNER JOIN "table" "t2"
    ON "t1"."id" = CASE WHEN "t2"."id" < 10 THEN "t2"."id" ELSE 10 END
"""

ctx = datafusion.SessionContext()
ctx.register_dataset(name="table", dataset=pda.dataset(dummy_table))

# Path 1: plan the query directly from its SQL text (works).
df = ctx.sql(q)
default_plan = df.logical_plan()

# Path 2: round-trip the same query through Substrait (fails).
plan = ss.serde.serialize_to_plan(q, ctx)
logical_plan = ss.consumer.from_substrait_plan(ctx, plan)  # <- Exception here
df = ctx.create_dataframe_from_logical_plan(plan=logical_plan)
ss_plan = df.logical_plan()
Exception is:
Expected behavior
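As a point of reference for what the query itself should return, independent of DataFusion and Substrait, here is a minimal sketch that runs the same SQL against SQLite from the Python standard library. SQLite is a stand-in engine and is not part of the original report, and the placeholder `name` and `country_code` values replace the Faker data, since only the `id` column affects the join.

```python
import sqlite3

N_ROWS = 1_000  # same row count as the reproduction script

# Assumption: an in-memory SQLite table stands in for the registered dataset.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "table" (id INTEGER, name TEXT, country_code TEXT)')
conn.executemany(
    'INSERT INTO "table" VALUES (?, ?, ?)',
    ((i, f"name_{i}", "XX") for i in range(N_ROWS)),  # placeholder values
)

# Same query text as in the reproduction script.
q = """
SELECT
    "t1".*
    , "t2".*
FROM "table" "t1"
INNER JOIN "table" "t2"
    ON "t1"."id" = CASE WHEN "t2"."id" < 10 THEN "t2"."id" ELSE 10 END
"""
rows = conn.execute(q).fetchall()

row_count = len(rows)
# Column 0 is t1.id; every t2 row with id >= 10 joins to t1.id = 10.
matches_on_ten = sum(1 for r in rows if r[0] == 10)
print(row_count, matches_on_ten)
```

The CASE expression is equivalent to taking the smaller of "t2"."id" and 10, so each "t2" row pairs with exactly one "t1" row and the join returns N_ROWS rows in total. A Substrait round trip that succeeded would be expected to yield a plan producing the same result as the .sql path.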