Closed GoogleCodeExporter closed 9 years ago
You can use the following script to recreate the almost the same situation that
I see. In particular, it has the spurious table scan. Because of the @LOOP
construct, this needs to be done in the H2 console.
create table OBJECT1 ( ID BIGINT NOT NULL, REFID BIGINT NOT NULL, COL1 BIGINT
NOT NULL, PRIMARY KEY ( ID ) );
create INDEX OBJECT10 on OBJECT1 ( REFID );
create INDEX OBJECT11 on OBJECT1 ( COL1 );
create table OBJECT2 ( ID BIGINT NOT NULL, PRIMARY KEY ( ID ) );
@LOOP 41474 insert into Object2 values (?);
@LOOP 1059 insert into Object1 values (?,?,422);
@LOOP 165 insert into Object1 values (?+1059,?,423);
explain select o1.ID
from OBJECT1 o1
where o1.COL1 = 422
and o1.refID not in (select o2.ID from Object2 o2);
explain select o1.ID
from OBJECT1 o1
where o1.COL1 = 422
and o1.refID not in (select o2.ID from Object2 o2 order by o2.ID);
drop table OBJECT1;
drop table OBJECT2;
Original comment by pwagl...@gmail.com
on 17 Mar 2011 at 9:00
With a bit of experimentation, then what is also interesting is that if 10000
elements are inserted into Object2, then everything works just dandy, in both
cases. It still does a table scan instead of an index scan, but it returns in a
smallish number of ms.
However, go from
@LOOP 10000 insert into Object2 values (?);
to
@LOOP 10001 insert into Object2 values (?);
the the analyze time goes from 26ms to 56823 ms. Talk about going off of a
cliff!
Often when running with 10001 I also get other errors, out of memory and
"unexpected code path" errors.
Original comment by pwagl...@gmail.com
on 17 Mar 2011 at 9:21
Last comment for the day…
The "unexpected code path" errors only turned up for me when doing "explain
analyze", a plain explain did not cause them.
Increasing the amount of heap from default made the OOM exceptions go away
(surprising? not really ;-)
Original comment by pwagl...@gmail.com
on 17 Mar 2011 at 9:36
Unfortunately, this problem is a little more serious for me… the SQL parser
of the product that I am using does not allow the subselect to have an order by
clause… So I have no woraround available.
Original comment by pwagl...@gmail.com
on 18 Mar 2011 at 9:30
Another data point:
Given the two (functionally equivalent) queries:
explain select o1.ID
from OBJECT1 o1
where o1.COL1 = 422
and o1.refID not in (select o2.ID from Object2 o2);
explain select o1.ID
from OBJECT1 o1
where o1.COL1 = 422
and not exists (select 1 from Object2 o2 where o1.refID = o2.ID);
The first generates a plan that does a table scan, and the second generates a
query that uses the index.
Original comment by pwagl...@gmail.com
on 19 Mar 2011 at 11:07
OK. A bit more playing around, and it would appear the "root cause" of the
problem is that it is not recognising the tranformation from:
XX.F1 in (select YY.F2 from YY)
as being equivalent to
exists (select 1 from YY where YY.F2 = XX.F1)
this causes two issues:
1. A sub optimal query is produced, even if it were to use the index, it still does an index scan, and looks to see if the value is in that list, as opposed to doing a lookup.
2. When the YY table gets over 10000 elements in size, the query has a very high chance of simply not working.
Original comment by pwagl...@gmail.com
on 19 Mar 2011 at 11:33
The unexpected code path exception has been spit off to issue 304.
Original comment by pwagl...@gmail.com
on 19 Mar 2011 at 11:57
Please note that the original hypothesis in this ticket was wrong. Adding the
order by doesn't help, it actually makes the whole situation worse. The real
problem is that identified in comment 6… I would change the ticket title, but
do not seem to be able to do this.
Original comment by pwagl...@gmail.com
on 20 Mar 2011 at 1:11
Hi,
The problem with "IN(SELECT ... ORDER BY)" has been fixed.
It is true that "x NOT IN(SELECT ...)" is not converted to "NOT EXISTS(...)",
because that's a quite tricky conversion. I will add a feature request.
However, it currently doesn't have a very high priority for me, sorry.
Original comment by thomas.t...@gmail.com
on 5 Apr 2011 at 5:16
If you can give some hints as to where to start looking to do this, or even
better a similar transformation that is done elsewhere, I can have a look at
this, since this is important to me... I am happy to scratch that itch, but I
need someplace to start poking.
Original comment by pwagl...@gmail.com
on 5 Apr 2011 at 1:42
> If you can give some hints as to where to start looking to do this
Hm, I'm afraid that's not so easy... Maybe ConditionNot / ConditionIn /
ConditionExists optimize(), but it's quite complicated and easy to get wrong...
Original comment by thomas.t...@gmail.com
on 9 Apr 2011 at 1:07
I have a very early preliminary patch… just to make sure that I am on the
road that you would like taken…
This works for my test cases, however I am pretty confident that it won't work
for everything. I propose to keep the patch more or less as is, but add a few
more conditions to the query so that the optimization is only done when we are
sure that it is safe to do. So for example, to only do the remapping when there
is no group by clause.
Does that make sense to you?
Original comment by pwagl...@gmail.com
on 17 Apr 2011 at 9:26
Attachments:
Hi,
I'm sorry, but I will not apply the patch. It's too much magic for me.
Regards,
Thomas
Original comment by thomas.t...@gmail.com
on 11 Jun 2011 at 3:52
I am more than happy to say that there is too much magic involved… how would
you like to see it modified? Where should this be hooked in?
Original comment by pwagl...@gmail.com
on 30 Jun 2011 at 10:31
I don't really know yet... One problem is that cloning an object in this way is
problematic for maintenance (when adding a new field, it's very easy to forget
adding that to the clone constructor). Maybe the subquery could be cloned by
creating the SQL statement, and then parsing it?
Original comment by thomas.t...@gmail.com
on 22 Oct 2011 at 9:05
Original issue reported on code.google.com by
pwagl...@gmail.com
on 17 Mar 2011 at 8:31