Running the TPCH benchmarks for SMJ (https://github.com/apache/arrow-datafusion/pull/10092):
only Q21 is failing.
I narrowed the problem down to this query:
with
t1 as (select 1 a, 2 b)
select * from t1 where exists (select 1 from t1 t2 where t2.a = t1.a and t2.b != t1.b);
UPD: the query above is a simpler reproducer.
The problem is in the LeftSemi/LeftAnti join types when there is an extra join filter. The join sides are built correctly, but the join filter crashes in the notEq (`!=`) case.
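For reference, here is what LeftSemi/LeftAnti with an extra join filter should return for this data. This is only a naive nested-loop sketch of the semantics, not DataFusion's SMJ code; `Row`, `left_semi` and `left_anti` are hypothetical names made up for the illustration.

```rust
// Reference semantics only: a naive nested-loop model, NOT DataFusion's SMJ implementation.
// `Row`, `left_semi` and `left_anti` are hypothetical helpers for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Row {
    a: i32,
    b: i32,
}

/// LeftSemi: keep a left row if at least one right row matches on the
/// equi-join key (`a`) AND passes the extra join filter.
fn left_semi(left: &[Row], right: &[Row], join_filter: impl Fn(&Row, &Row) -> bool) -> Vec<Row> {
    left.iter()
        .filter(|l| right.iter().any(|r| r.a == l.a && join_filter(*l, r)))
        .copied()
        .collect()
}

/// LeftAnti: keep a left row if NO right row matches on the key and passes the filter.
fn left_anti(left: &[Row], right: &[Row], join_filter: impl Fn(&Row, &Row) -> bool) -> Vec<Row> {
    left.iter()
        .filter(|l| !right.iter().any(|r| r.a == l.a && join_filter(*l, r)))
        .copied()
        .collect()
}

fn main() {
    // t1 as (select 1 a, 2 b), joined with itself as t2.
    let t1 = [Row { a: 1, b: 2 }];

    // Filter `t2.b != t1.b`: 2 != 2 is false, so EXISTS finds nothing.
    println!("semi, b != b: {:?}", left_semi(&t1, &t1, |l, r| r.b != l.b));
    println!("anti, b != b: {:?}", left_anti(&t1, &t1, |l, r| r.b != l.b));

    // Filter `t2.b != t1.a`: 2 != 1 is true, so the row is kept by EXISTS.
    println!("semi, b != a: {:?}", left_semi(&t1, &t1, |l, r| r.b != l.a));
    println!("anti, b != a: {:?}", left_anti(&t1, &t1, |l, r| r.b != l.a));
}
```

So with the `t2.b != t1.b` filter the `exists` query should return no rows, and with `t2.b != t1.a` it should return the single row `(1, 2)`. Either way the query should complete rather than crash.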
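Full test to reproduce:
use datafusion::error::Result;
use datafusion::prelude::SessionContext;

#[tokio::test]
async fn test_() -> Result<()> {
    let ctx: SessionContext = SessionContext::new();

    // Disable hash join so the planner picks Sort Merge Join.
    let sql = "set datafusion.optimizer.prefer_hash_join = false;";
    let _ = ctx.sql(sql).await?.collect().await?;

    // The EXISTS subquery becomes a LeftSemi join with an extra `!=` join filter.
    let sql = "
        with
        t1 as (select 1 a, 2 b)
        select * from t1 where exists (select 1 from t1 t2 where t2.a = t1.a and t2.b != t1.a);
    ";
    let _ = ctx.sql(sql).await?.collect().await?;

    Ok(())
}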
I was trying to mark Sort Merge Join as stable and ran the TPCH tests with SMJ enforced. I hit this issue, which we need to fix before returning to the discussion of SMJ's stable status.
To run the benchmarks:
RESULTS_NAME=smj ./benchmarks/bench.sh run tpch_smj
Depends on #10092
Originally posted by @comphead in https://github.com/apache/arrow-datafusion/issues/9846#issuecomment-2057999488