Seriously non-optimal query plan is generated under some circumstances

GoogleCodeExporter commented 9 years ago

I have tried to reproduce a simple test case, however that simple test case 
works correctly… so I can only show my working. Unfortunately, I cannot 
include the real example, as that is full of company confidential information.

The problem is that I have a query which is going over four tables, and is 
joining itself a few times, just for good measure.

The (simplified) tables look something like:

CREATE TABLE TEST1
(
   ID bigint PRIMARY KEY NOT NULL,
   WANTED bigint,
   VERSION bigint,
   T2 bigint,
   T3 bigint,
   NAME varchar(80) DEFAULT ' ' NOT NULL,
   MT1 bigint);
CREATE UNIQUE INDEX TEST11 ON "PUBLIC".TEST1
(
  T3,
  NAME,
  VERSION
);
CREATE INDEX TEST10 ON "PUBLIC".TEST1
(
  T2
);
CREATE TABLE TEST2
(
   ID bigint PRIMARY KEY NOT NULL,
   T3 bigint);
CREATE INDEX TEST20 ON TEST2
(
  T3
);
CREATE TABLE TEST3
(
   ID bigint PRIMARY KEY NOT NULL,
);
CREATE TABLE TEST4
(
   ID bigint PRIMARY KEY NOT NULL,
   T1 bigint,
   T2 bigint);
CREATE UNIQUE INDEX TEST41 ON TEST4
(
  T1
);
CREATE INDEX TEST42 ON TEST4
(
  T2
);

And the query that is causing the problems looks like:

explain select
t1.ID
from TEST1 t1
where
(
   t1.VERSION = -1
   and t1.NAME = N'NAME_TO_SEARCH_FOR'
   and exists
   (
      select
      1
      from TEST4 t4,
      TEST1 ot1,
      TEST1 mt1,
      TEST2 t2,
      TEST3 t3
      where
      (
         t1.t3 = t3.id
         and t4.t1 = ot1.id
         and ot1.mt1 = mt1.id
         and t4.t2 = t2.id
         and t2.t3 = t3.id
         and mt1.version = -1
      )
      group by mt1.id having
      (
         t1.WANTED < COUNT(*)
      )
   )
)

For the purposes of this bug, please ignore the minor optimisations that could 
be done in the query to avoid pulling in a few of the tables, while they would 
help, they are not the main problem.

Using the above I get a query plan of:

SELECT T1.ID
FROM PUBLIC.TEST1 T1 /* PUBLIC.TEST1.tableScan */
WHERE ((T1.VERSION = -1) AND (T1.NAME = 'NAME_TO_SEARCH_FOR')) AND 
EXISTS(SELECT 1
FROM PUBLIC.TEST3 T3 /* PUBLIC.PRIMARY_KEY_4C055: ID = T1.T3 */ /* WHERE T1.T3 
= T3.ID */
INNER JOIN PUBLIC.TEST2 T2 /* PUBLIC.TEST20: T3 = T3.ID */ ON 1=1 /* WHERE 
T2.T3 = T3.ID */
INNER JOIN PUBLIC.TEST4 T4 /* PUBLIC.TEST42: T2 = T2.ID */ ON 1=1 /* WHERE 
T4.T2 = T2.ID */
INNER JOIN PUBLIC.TEST1 OT1 /* PUBLIC.PRIMARY_KEY_4C0: ID = T4.T1 */ ON 1=1 /* 
WHERE T4.T1 = OT1.ID */
INNER JOIN PUBLIC.TEST1 MT1 /* PUBLIC.PRIMARY_KEY_4C0: ID = OT1.MT1 */ ON 1=1
WHERE (MT1.VERSION = -1) AND ((T2.T3 = T3.ID) AND ((T4.T2 = T2.ID) AND 
((OT1.MT1 = MT1.ID) AND ((T1.T3 = T3.ID) AND (T4.T1 = OT1.ID)))))
GROUP BY MT1.ID
HAVING T1.WANTED < COUNT(*))

Which is fine. It goes through every row, and if both NAME and VERSION match, 
then it decides to do the exists clause. However, on my real database I get a 
query plan of:

SELECT T1.ID
FROM PUBLIC.TEST1 T1 /* PUBLIC.TEST1.tableScan */
WHERE (EXISTS(SELECT 1
FROM PUBLIC.TEST3 T3 /* PUBLIC.PRIMARY_KEY_4C055: ID = T1.T3 */ /* WHERE T1.T3 
= T3.ID */
INNER JOIN PUBLIC.TEST2 T2 /* PUBLIC.TEST20: T3 = T3.ID */ ON 1=1 /* WHERE 
T2.T3 = T3.ID */
INNER JOIN PUBLIC.TEST4 T4 /* PUBLIC.TEST42: T2 = T2.ID */ ON 1=1 /* WHERE 
T4.T2 = T2.ID */
INNER JOIN PUBLIC.TEST1 OT1 /* PUBLIC.PRIMARY_KEY_4C0: ID = T4.T1 */ ON 1=1 /* 
WHERE T4.T1 = OT1.ID */
INNER JOIN PUBLIC.TEST1 MT1 /* PUBLIC.PRIMARY_KEY_4C0: ID = OT1.MT1 */ ON 1=1
WHERE (MT1.VERSION = -1) AND ((T2.T3 = T3.ID) AND ((T4.T2 = T2.ID) AND 
((OT1.MT1 = MT1.ID) AND ((T1.T3 = T3.ID) AND (T4.T1 = OT1.ID)))))
GROUP BY MT1.ID
HAVING T1.WANTED < COUNT(*)) AND (T1.VERSION = -1)) AND (T1.NAME = 
'NAME_TO_SEARCH_FOR')

That is, for every row in the TEST1, it first checks if the exist clause 
exists, and then decides to check the two local fields. Given that TEST1 has 
~200 rows this results in a massive number of the exists clause tests being 
run, when in actual fact we only have one row in TEST1 that matches the 
query… this means that the query takes about 200 time longer than it should, 
given that the inner query is taking about 2.5 seconds, this really hurts. Now, 
this inner query can be improved somewhat so that it takes less time, however 
the problem still remains that the exist clause should not be considered 
cheaper to evaluate than a local number or string comparison.

I have run analyze, that has not helped.

I can run diagnositics against my database if that will help, if you let me 
know what they should be.

Original issue reported on code.google.com by pwagl...@gmail.com on 26 Feb 2011 at 1:46

GoogleCodeExporter commented 9 years ago

I should also mention that I am using H2 1.2.147.

Original comment by pwagl...@gmail.com on 26 Feb 2011 at 1:47

GoogleCodeExporter commented 9 years ago

I have also tested with 1.3.151. This has essentially the same result. The 
results that I get from explain analyse, using a simplified form of the query 
to remove a couple of tables from the inner query are:

/*
total: 3849250
TEST4.TEST41 read: 101136 (2%)
TEST4.TEST4_DATA read: 3738316 (97%)
TEST1.TEST1_DATA read: 9603 (0%)
TEST3.TEST3_DATA read: 195 (0%)
*/

Original comment by pwagl...@gmail.com on 26 Feb 2011 at 2:12

GoogleCodeExporter commented 9 years ago

Oddly enough, when I create the index over fields T1 and T4 on table Test4, 
thus making the exists cheaper, the query plan inverts again, and decides to 
check the fields NAME and VERSION before doing the exists clause. Not sure if 
this helps, but I figure all information is good ;-)

Original comment by pwagl...@gmail.com on 26 Feb 2011 at 2:54

GoogleCodeExporter commented 9 years ago

I think I know what the problem is.

AND conditions are re-ordered based on the expected cost (ConditionAndOr). And 
the expected cost of an EXISTS is always at last 8 points higher than the 
expected cost of a simple comparison. However, the cost for EXISTS is converted 
from the cost of the subquery (which is a double) to int. If the expected cost 
is very high, then this conversion will result in a expected cost below 0... So 
the EXISTS is evaluated first.

Original comment by thomas.t...@gmail.com on 26 Feb 2011 at 11:57

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

That would make a lot of sense, and would also explain why adding in the index 
to make the query "faster" moves it lower down in the pecking order. I will 
test the patch when it is available, of if you can let me know where the 
conversion is done I can put in the "too large, peg to MAX_INT' check and 
verify that it works.

Original comment by pwagl...@gmail.com on 26 Feb 2011 at 12:55

GoogleCodeExporter commented 9 years ago

This is now committed in the trunk, see revision 3444.

Original comment by thomas.t...@gmail.com on 26 Feb 2011 at 4:55

GoogleCodeExporter commented 9 years ago

I have tested using revision 3445, which appears to have another fix related to 
this ticket, and it indeed works as advertised. Many thanks. My query went from 
~400seconds execution to 0.01seconds! ;-)

Is there any chance of getting this fix in a stable release soon?

Original comment by pwagl...@gmail.com on 27 Feb 2011 at 11:39

GoogleCodeExporter commented 9 years ago

I should probably also explain the reasoning behind my request… my company 
has a policy of only using released, stable versions of the 3rd party s

Original comment by pwagl...@gmail.com on 27 Feb 2011 at 12:06

GoogleCodeExporter commented 9 years ago

I should probably also explain the reasoning behind my request… my company 
has a policy of only using released, stable versions of the 3rd party software 
that we include. Since we are not using the mvcc options, this slow query has a 
tendency of licking the entire database, and hence most of the ui, for almost 7 
minutes while this query is running. Unfortunately, this query runs every 
fifteen minutes or so…. Fortunately, it only happens when one of the tables 
gets very large, with a large associated cost, hence the reason we have only 
recently seen it.

Original comment by pwagl...@gmail.com on 27 Feb 2011 at 12:11

GoogleCodeExporter commented 9 years ago

My plan is to release a new version (still beta) next week, and then then 
another version (no longer beta) in about two weeks.

Original comment by thomas.t...@gmail.com on 27 Feb 2011 at 1:36

GoogleCodeExporter commented 9 years ago

Fixed in version 1.3.152

Original comment by thomas.t...@gmail.com on 1 Mar 2011 at 8:06

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Thanks again for the fix! Works perfectly.

Original comment by pwagl...@gmail.com on 17 Mar 2011 at 9:26

google-code-export / h2database

Seriously non-optimal query plan is generated under some circumstances #290