SELECT with simple OR tests uses tableScan when it could use indexes

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. import the attached script using the RUNSCRIPT command (It was created using 
'SCRIPT TO countries.zip COMPRESSION ZIP')

NOTE: It is too big to attach; I can use something like yousendit.com when you 
are ready to examine this issue.  Just post a comment and  I will post the 
script.

2. Run ANALYZE
3. Run 'SELECT name FROM all_countries WHERE featurecode='PCLI' OR 
admincode1='CA';
4. Wait 15 seconds
5. Get result set

What is the expected output? What do you see instead?

The output is fine, the speed is not.  EXPLAIN shows why it is so slow:

SELECT NAME
FROM PUBLIC.ALL_COUNTRIES /* PUBLIC.ALL_COUNTRIES.tableScan */
WHERE (FEATURECODE = 'PCLI') OR (ADMINCODE1 = 'CA')

It defaults to a table scan even though there are indexes available for both 
columns.  It has 2.8m rows.

What version of the product are you using? On what operating system, file
system, and virtual machine?
H2 1.2.142, JDK 5, and 6; Window XP, 2003, and Mac OS X 10.6

Do you know a workaround?

use UNION ALL:

SELECT name FROM all_countries WHERE featurecode='PCLI' 
UNION ALL
SELECT name FROM all_countries WHERE admincode1='CA' 

same 1000 rows, but in 800ms.  This EXPLAIN shows:

(SELECT NAME
FROM PUBLIC.ALL_COUNTRIES /* PUBLIC.IDX_FEATURECODE: FEATURECODE = 'PCLI' */
WHERE FEATURECODE = 'PCLI') UNION ALL (SELECT NAME
FROM PUBLIC.ALL_COUNTRIES /* PUBLIC.IDX_ADMINCODE1: ADMINCODE1 = 'CA' */
WHERE ADMINCODE1 = 'CA')

How important/urgent is the problem for you?

Pretty urgent, as the workaround is ugly, even difficule if one uses an ORM (as 
most people do).

In your view, is this a defect or a feature request?

Defect.

Please provide any additional information below.

I know this is on the roadmap, but I am filing an issue to raise the visibility 
of it and give people a chance to voice their interest.

Original issue reported on code.google.com by mbis...@gmail.com on 10 Sep 2010 at 10:07

GoogleCodeExporter commented 9 years ago

This is a known problem. There is an optimization for the case "columnA=x or 
columnA=y", but it doesn't work in this case because your query uses different 
columns and therefore different indexes. I wouldn't call it 'defect', it's just 
not implemented.

Original comment by thomas.t...@gmail.com on 18 Sep 2010 at 3:15

GoogleCodeExporter commented 9 years ago

This issue is in the roadmap at http://www.h2database.com/html/roadmap.html - 
priority is tracked there.

Original comment by thomas.t...@gmail.com on 28 Jan 2011 at 7:41

Changed state: InRoadmap

GoogleCodeExporter commented 9 years ago

i stumbled over this one too.

also found it mentioned by thomas in 2008: 
http://groups.google.com/group/h2-database/browse_thread/thread/7feced8a47eb688e
/fa39f64cd30a5cf9?lnk=gst&q=union#fa39f64cd30a5cf9

the workaround is documented on http://www.h2database.com/html/performance.html 
section "Using Multiple Indexes"

moving from mysql to h2, this is quite a surprise and culprit for me. quite 
some code to change, and the affected sql statements aren't even easy to find.
"luckily" the performance without union is so bad that the missed cases will 
show up at runtime. 

i would appreciate this feature. i'm surprised there is not more interest in 
this.

Original comment by andrej....@gmail.com on 22 Dec 2011 at 3:10

GoogleCodeExporter commented 9 years ago

here's another duplicate: 
http://code.google.com/p/h2database/issues/detail?id=345

Original comment by andrej....@gmail.com on 22 Dec 2011 at 4:00

google-code-export / h2database

SELECT with simple OR tests uses tableScan when it could use indexes #229