tgujar opened this issue 7 months ago
It looks like some sort of optimization pass has been applied to Plan 2. Maybe we need to avoid optimizing it somehow 🤔
Plan 1 is the produced logical plan, and Plan 2 is Plan 1 converted to Substrait and then converted back to a logical plan. I think the issue arises from differences between what the Substrait grammar can express and the logical plan generated by DataFusion. For tests where the plans differ, using assert_expected_plan instead of roundtrip_with_ctx should work, once the plans have been inspected manually. I think I want to look into this further, but maybe we could normalize both plans somehow and then check for equality.
This sounds like a good plan to me. Thank you for the investigation!
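One way to read "normalize both plans" is to rewrite each plan into a canonical form and compare the canonical forms instead of the raw strings. The sketch below is only an illustration under stated assumptions. `Plan`, `normalize`, `plans_equivalent`, and `scan` are hypothetical names, not DataFusion's `LogicalPlan` or any DataFusion API. The only normalization step it performs is sorting `partial_filters`. A real normalizer would need more rules, for example for aliases.

```rust
// Hypothetical, simplified stand-in for a logical plan. This is NOT
// DataFusion's LogicalPlan; it only has enough structure to show the idea.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { table: String, partial_filters: Vec<String> },
    Projection { columns: Vec<String>, input: Box<Plan> },
}

// Rewrite a plan into a canonical form so that representation-only
// differences (here, only the order of partial_filters) disappear.
fn normalize(plan: &Plan) -> Plan {
    match plan {
        Plan::TableScan { table, partial_filters } => {
            let mut filters = partial_filters.clone();
            filters.sort();
            Plan::TableScan { table: table.clone(), partial_filters: filters }
        }
        Plan::Projection { columns, input } => Plan::Projection {
            columns: columns.clone(),
            input: Box::new(normalize(input)),
        },
    }
}

// Two plans are treated as equivalent if their normal forms are equal.
fn plans_equivalent(a: &Plan, b: &Plan) -> bool {
    normalize(a) == normalize(b)
}

// Hypothetical helper: builds a scan of "data2" (the table name is taken
// from the query discussed in this thread) with the given filters.
fn scan(filters: &[&str]) -> Plan {
    Plan::TableScan {
        table: "data2".to_string(),
        partial_filters: filters.iter().map(|f| f.to_string()).collect(),
    }
}

fn main() {
    let p1 = Plan::Projection {
        columns: vec!["a".to_string()],
        input: Box::new(scan(&["f != 'a'", "f != 'b'"])),
    };
    let p2 = Plan::Projection {
        columns: vec!["a".to_string()],
        input: Box::new(scan(&["f != 'b'", "f != 'a'"])),
    };
    // The raw structures (and therefore their Debug strings) differ...
    assert_ne!(p1, p2);
    // ...but the normalized forms compare equal.
    assert!(plans_equivalent(&p1, &p2));
    println!("equivalent: {}", plans_equivalent(&p1, &p2));
}
```

The design choice is to normalize first and then use plain `==`. This keeps the comparison itself trivial, and each new representation quirk becomes one more rule inside `normalize`.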
Maybe a simple solution would be to convert the round-tripped plan to Substrait again and then compare the two Substrait plans?
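```rust
use datafusion::error::Result;
use datafusion::prelude::SessionContext;
use datafusion_substrait::logical_plan::consumer::from_substrait_plan;
use datafusion_substrait::logical_plan::producer::to_substrait_plan;

async fn roundtrip_with_ctx(sql: &str, ctx: SessionContext) -> Result<()> {
    // Build and optimize the original logical plan.
    let df = ctx.sql(sql).await?;
    let plan = df.into_optimized_plan()?;
    // LogicalPlan -> Substrait -> LogicalPlan, then optimize again.
    let proto = to_substrait_plan(&plan, &ctx)?;
    let plan2 = from_substrait_plan(&ctx, &proto).await?;
    let plan2 = ctx.state().optimize(&plan2)?;
    // Re-encode the round-tripped plan and compare in the Substrait domain.
    let proto2 = to_substrait_plan(&plan2, &ctx)?;
    println!("{plan:#?}");
    println!("{plan2:#?}");
    assert_eq!(proto, proto2);
    Ok(())
}
```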
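The following self-contained toy model shows why comparing in the Substrait domain can avoid the mismatch. `ToyPlan`, `ToyProto`, `to_proto`, and `from_proto` are made up for illustration. They do not use DataFusion or the Substrait crates. The toy plan can carry an alias that the toy proto format has no field for. This loosely mirrors the alias problem discussed in this thread.

```rust
// Hypothetical toy "logical plan": it can carry an alias.
#[derive(Debug, Clone, PartialEq)]
struct ToyPlan {
    table: String,
    alias: Option<String>,
    filter: String,
}

// Hypothetical toy "proto" standing in for Substrait: it has no alias field.
#[derive(Debug, Clone, PartialEq)]
struct ToyProto {
    table: String,
    filter: String,
}

// Producer: the alias is dropped because ToyProto cannot express it.
fn to_proto(plan: &ToyPlan) -> ToyProto {
    ToyProto { table: plan.table.clone(), filter: plan.filter.clone() }
}

// Consumer: rebuilds a plan, necessarily without the alias.
fn from_proto(proto: &ToyProto) -> ToyPlan {
    ToyPlan { table: proto.table.clone(), alias: None, filter: proto.filter.clone() }
}

fn main() {
    let plan = ToyPlan {
        table: "data2".to_string(),
        alias: Some("sq".to_string()), // made-up alias name
        filter: "f NOT IN ('a', 'b', 'c', 'd')".to_string(),
    };
    let proto = to_proto(&plan);
    let plan2 = from_proto(&proto);
    let proto2 = to_proto(&plan2);

    // Comparing the plans fails: the alias was lost in translation.
    assert_ne!(plan, plan2);
    // Comparing the protos succeeds: both only hold what the format can express.
    assert_eq!(proto, proto2);
    println!("proto round trip is stable");
}
```

This approach only works if the producer is deterministic. If the second conversion emitted, for example, filters in a different order, the proto comparison would fail as well.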
Maybe. I am unsure, though, how to rewrite the query to avoid the alias here:
```sql
SELECT * FROM data WHERE a IN (SELECT a FROM data2 WHERE f NOT IN ('a', 'b', 'c', 'd'))
```
assert_expected_plan works fine for cases like this, where the plans are equivalent but do not have the same string representation. I am not sure whether its use in tests is encouraged, though. If I understand correctly, changes to plan generation might break tests that use assert_expected_plan, because those tests compare for exact string equality.
Describe the bug
The roundtrip_with_ctx function checks for string equality when comparing plans. However, string comparison does not always give correct results. For example, here are two plans that are equivalent but are considered different by the function.
Plan 1:
Plan 2:
It might also give an incorrect comparison result when there is more than one entry in partial_filters but the entries appear in a different order inside the vector.
To Reproduce
Here is an example test case I used to produce the plans in the "Describe the bug" section.
Expected behavior
The two plans should be considered equivalent.
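For the partial_filters case described in the bug report, an order-insensitive comparison is one way to get this behavior. The sketch below uses hypothetical helpers (`counts` and `same_filters` are not DataFusion functions). It compares two filter lists as multisets, so order is ignored but duplicates still count.

```rust
use std::collections::HashMap;

// Count how many times each filter expression occurs.
fn counts(filters: &[String]) -> HashMap<&str, usize> {
    let mut m = HashMap::new();
    for f in filters {
        *m.entry(f.as_str()).or_insert(0) += 1;
    }
    m
}

// Order-insensitive comparison that still respects duplicates.
// The length check is only a cheap early exit before building the maps.
fn same_filters(a: &[String], b: &[String]) -> bool {
    a.len() == b.len() && counts(a) == counts(b)
}

fn main() {
    let f1 = vec!["a > 1".to_string(), "b < 2".to_string()];
    let f2 = vec!["b < 2".to_string(), "a > 1".to_string()];
    let f3 = vec!["a > 1".to_string(), "a > 1".to_string()];
    // Comparing the Debug strings is order-sensitive...
    assert_ne!(format!("{f1:?}"), format!("{f2:?}"));
    // ...while the multiset comparison is not.
    assert!(same_filters(&f1, &f2));
    // Same length, different contents: still unequal.
    assert!(!same_filters(&f1, &f3));
    println!("ok");
}
```

Counting occurrences instead of sorting avoids cloning the vectors. It also only needs `Eq + Hash` on the filter type rather than `Ord`.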
Additional context
No response