[FRG-182] union all of two tables with differing column name where the second table has a filter fails

dynamobi-build commented 12 years ago

[reporter="elin", created="Thu, 10 Aug 2006 15:12:41 -0500 (GMT-05:00)"] a simple union all for two tables with the same datatypes, but with different column names works fine. The result will have the same column names as the first table. A filter on the 1st table also works fine. But if there is a filter on the 2nd table, an "
Optimizer failed to find a valid physical implementation for relational expression..." error is produced

=================
To reproduce:

create schema temp;
set schema 'temp';
create table tt (a int, b int, c int);
insert into tt values (1, 2, 3), (2,3,4);
create table t2 (a int, bb int, c int);
insert into t2 values (3, 4, 4), ( 2, 3, 3);

-- these work fine, first one gives column names a,b,c the second one a,bb,c
select * from tt union all select * from t2;
select * from t2 union all select * from tt;

-- this works fine too
(select * from tt where a=1) union all select * from t2;

-- gets:
+----+----+----+
| A | B | C |
+----+----+----+
| 1 | 2 | 3 |
| 3 | 4 | 4 |
| 2 | 3 | 3 |
+----+----+----+

-- fails here:
select * from t2 union all (select * from tt where a=1);

================
stack trace:

0: jdbc:luciddb:> select * from t2 union all (select * from tt where a=1);
Error: Optimizer failed to find a valid physical implementation for relational expression rel#7053:UnionRel.NONE(input#0=HepRelVertex#7046,input#1=HepRelVertex#7062,all=true); see trace for partially optimized plan (state=,code=0)
net.sf.farrago.jdbc.FarragoJdbcUtil$FarragoSqlException: Optimizer failed to find a valid physical implementation for relational expression rel#7053:UnionRel.NONE(input#0=HepRelVertex#7046,input#1=HepRelVertex#7062,all=true); see trace for partially optimized plan
        at net.sf.farrago.resource.FarragoResource$Def0.ex(FarragoResource.java:1512)
        at net.sf.farrago.query.FarragoPreparingStmt.optimize(FarragoPreparingStmt.java:706)
        at org.eigenbase.oj.stmt.OJPreparingStmt.prepareSql(OJPreparingStmt.java:299)
        at net.sf.farrago.query.FarragoPreparingStmt.prepare(FarragoPreparingStmt.java:270)
        at net.sf.farrago.db.FarragoDatabase$2.initializeEntry(FarragoDatabase.java:824)
        at net.sf.farrago.util.FarragoObjectCache.pin(FarragoObjectCache.java:153)
        at net.sf.farrago.db.FarragoDatabase.prepareStmtImpl(FarragoDatabase.java:836)
        at net.sf.farrago.db.FarragoDatabase.prepareStmt(FarragoDatabase.java:707)
        at net.sf.farrago.db.FarragoDbSession.prepareImpl(FarragoDbSession.java:935)
        at net.sf.farrago.db.FarragoDbSession.prepare(FarragoDbSession.java:862)
        at net.sf.farrago.db.FarragoDbStmtContext.prepare(FarragoDbStmtContext.java:105)
        at net.sf.farrago.jdbc.engine.FarragoJdbcEngineStatement.execute(FarragoJdbcEngineStatement.java:107)
        at sqlline.SqlLine$Commands.execute(Unknown Source)
        at sqlline.SqlLine$Commands.sql(Unknown Source)
        at sqlline.SqlLine.dispatch(Unknown Source)
        at sqlline.SqlLine.begin(Unknown Source)
        at sqlline.SqlLine.mainWithInputRedirection(Unknown Source)
        at sqlline.SqlLine.main(Unknown Source)

=========
partial optimization plan from the trace:
Aug 10, 2006 3:15:57 PM net.sf.farrago.db.FarragoDbSession prepare
INFO: select * from t2 union all (select * from tt where a=1)
Aug 10, 2006 3:15:57 PM net.sf.farrago.query.FarragoPreparingStmt optimize
SEVERE: Plan without full implementation
UnionRel(all=[true]): rowcount = 125.0, cumulative cost = 225.0
  LcsRowScanRel(table=[[LOCALDB, TEMP, T2]], projection=[], clustered indexes=[[SYS$CLUSTERED_INDEX$TEMP$T2$A, SYS$CLUSTERED_INDEX$TEMP$T2$BB, SYS$CLUSTEREDINDEX$TEMP$T2$C]]): rowcount = 100.0, cumulative cost = 100.0
  FennelCalcRel(expr#0..2=[{inputs}], expr#3=[1], expr#4=[=($t0, $t3)], proj#0..2=[{exprs}], $condition=[$t4]): rowcount = 25.0, cumulative cost = 125.0
    LcsRowScanRel(table=[[LOCALDB, TEMP, TT]], projection=[], clustered indexes=[[SYS$CLUSTERED_INDEX$TEMP$TT$A, SYS$CLUSTERED_INDEX$TEMP$TT$B, SYS$CLUSTERED_INDEX$TEMP$TT$C]]): rowcount = 100.0, cumulative cost = 100.0

dynamobi-build commented 12 years ago

[author="jvs", created="Thu, 10 Aug 2006 23:55:09 -0500 (GMT-05:00)"] Rushan, I guess you inherited UNION along with the other setops :)

Normally CoerceInputsRule takes care of casting the inputs to the union match in type, so that FennelUnionRule doesn't have to worry about adding casts. I wonder if that is somehow not happening? Or did some other rule in between the coerce and the union mess things up?

dynamobi-build commented 12 years ago

[author="rchen", created="Fri, 11 Aug 2006 01:32:57 -0500 (GMT-05:00)"] Yes CoerceInputsRule does add a Project to cast the col names in one of the UNION inputs. The other project(a=1) is interesting here, as the initial plan looks like:

Union
  Project(cast)
    Filter(a=1)
     RowScan

The rule ProjectToCalc and FilterToCalcRule uses Calc to replace Project and Filter, with the result:

Union
Calc1(cast)
  Calc2(a=1)
   RowScan

The last rule applied is MergeCalcRule. After this rule is applied, the union should have a input of the compbined calc that evaluates a = 1 and has typename "B". However, this combined calc(call it calc3) has the same siganiture as calc2 so the input replacement is done incorrectly: calc2 is considered the "newinput" and calc3 is considered the old input, so the resulting tree looks like

Union
Calc2(a=1)
  RowScan

Because Calc2 still projects "BB", the first check which is strict on type name matching

!unionRel.isHomogeneous()

in FennelUnionRule fails and no physical implementation can be found for this Union.

Not sure where the fix should be. The easy one could be to relax the type name check in FennelUnionRule. The other question I have is how does a converted Rel match a subtree that is converted into this new Rel? Should the signature computation take into account the type name as well?

dynamobi-build commented 12 years ago

[author="rchen", created="Fri, 11 Aug 2006 20:28:36 -0500 (GMT-05:00)"] Using the example above, the combined calc(calc3) has the same signature as calc2 because the way digest for calc3 is computed, even though calc3 does have the correct output field name "B". CalcRel member RexProgram contains the detail of the calc, including the inputRowType and outputRowType of the CalcRel. RexProgram::collectExplainTerms() optimizes the number of output fields to print in the explain plan string by checking if these fields are simply projections, even though the proejcted field names are different from the input field names.

One possible fix is to disable this optimization, for example when the input and output row types are different. Another fix is to use different detail levels for explain and digest. Details like coerced field names are interesting for signature computing, but might be too much for explain plan. Ideally the explain plan detail level can be controled by the user. However, that is not related to fixing this bug.

dynamobi-build commented 12 years ago

[author="jvs", created="Sun, 13 Aug 2006 15:55:03 -0500 (GMT-05:00)"] On the subject of controlling EXPLAIN PLAN verbosity, see FRG-8.

dynamobi-build commented 12 years ago

[author="rchen", created="Mon, 14 Aug 2006 23:30:31 -0500 (GMT-05:00)"] Fixed in Eigenbase change 7393. A new plan detail level(DIGEST_ATTRIBUTES) is used for digest computation. Explain plan is changed to use a different level(EXPPLAN_ATTRIBUTES) from digest.

LucidDB / luciddb

[FRG-182] union all of two tables with differing column name where the second table has a filter fails #690