google-code-export / h2database

Automatically exported from code.google.com/p/h2database
0 stars 1 forks source link

Slow down in IN (SELECT...) #276

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Before submitting a bug, please check the FAQ:
http://www.h2database.com/html/faq.html

What steps will reproduce the problem?
(simple SQL scripts or simple standalone applications are preferred)
1. Download attach file (db and test program included)
2. Run the test program that execute the query against the db (twice)

What is the expected output? What do you see instead?
The elapse time is 12 seconds for the very simple query.

What version of the product are you using? On what operating system, file
system, and virtual machine?
Tried both h2-1.2.147 and svn trunk at Jan 16, 2011. Ms Windows 7 on AMD x2 
3.0ghz. 

Do you know a workaround?
I profiled it seems have something to do with getValue() in 
ConditionInSelect.java - which is not optimized. E.g for the statement in 
question
Select ... from ..ID in (select distinct ID ...), the ID would be compared to 
every single ID in the subquery. I implement a hack to put the subquery result 
in a hashset and just use contains check, the query sped up a lot (less than 
half sec). 
Strangely if i launch console and execute the query, the query seems to be fast 
without the hack. Not sure what's the difference between console and embedded 
that cause it.

How important/urgent is the problem for you?
Not urgent. Not too important i can switch db engine but embedded h2 is 
certainly nice to have.

In your view, is this a defect or a feature request?
Defect

Please provide any additional information below.

Original issue reported on code.google.com by thlee4...@gmail.com on 17 Jan 2011 at 7:06

Attachments:

GoogleCodeExporter commented 9 years ago
The problem should be fixed in version 1.3.150. At least it doesn't behave 
O(n^2) any more. It's still a bit slow, but I think it's acceptable now.

Original comment by thomas.t...@gmail.com on 28 Jan 2011 at 7:20

GoogleCodeExporter commented 9 years ago
I looked at the fix and tried it. Seems still slow to me. 
I check the code in ConditionInSelect.java again. Here's my thought:

The query is something like 
SELECT * FROM TEST_ATTR WHERE ID IN (SELECT abc...)

And this is the corresponding code in ConditionInSelect

public Value getValue(Session session) {
 query.setSession(session);
        LocalResult rows = query.query(0);
        session.addTemporaryResult(rows);
....
}

Looks like for every ID in outer query (ID in..), it will do the subquery 
(select abc...). Shouldn't the subquery be cached so it won't redo the same 
query again and again? I put a hack in the new ConditionInSelect.java to try 
and with the cached result it is a lot faster. Its something like this:

public Value getValue(Session session) {
        query.setSession(session);
        LocalResult rows = null;
        if (_cachedResult==null){
            _cachedResult = query.query(0);
        } 
        rows = _cachedResult;
        ...
}

Original comment by thlee4...@gmail.com on 29 Jan 2011 at 7:33

GoogleCodeExporter commented 9 years ago
Re-opened

Original comment by thomas.t...@gmail.com on 29 Jan 2011 at 3:58

GoogleCodeExporter commented 9 years ago
Actually, the inner query is only run once. Query.query(int, ResultSetTarget) 
should (and does, in my case) return the cached result set from then on:

http://www.google.com/codesearch/p?hl=en#nNQkx37hnf0/trunk/h2/src/main/org/h2/co
mmand/dml/Query.java&q=Query%20package:http://h2database%5C.googlecode%5C.com&l=
250

However, I agree it's still somewhat slow. The reason is that for each row, a 
lookup is made in the result set from the inner query. This lookup is now O(1) 
or at least O(log n) and no longer O(n), therefore it scales much better. 
However, it's still relatively slow. One solution would be to avoid this check, 
however the optimizer currently can't do that currently.

I'm afraid I will not have time in the near future to improve performance for 
this case, sorry. If you want to give it a try, please go ahead - patches are 
always welcome.

Original comment by thomas.t...@gmail.com on 29 Jan 2011 at 4:09

GoogleCodeExporter commented 9 years ago
The biggest problem has been fixed in version 1.3.151. I will currently leave 
it at that; if the current speed is not sufficient please tell me your exact 
use case (for example, why you can't use an inner join).

Original comment by thomas.t...@gmail.com on 12 Feb 2011 at 2:50