FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/
1.24k stars 212 forks source link

Simple indexed join query slows terrible down with useage of 'like' [CORE2267] #2693

Open firebird-automations opened 15 years ago

firebird-automations commented 15 years ago

Submitted by: Roy Damman (royd)

Problem discovered when trying to migrate a project from Firebird 1.5.x to 2.0 and after that to (the latest) version 2.1.1. Note: All execution times are obtained by executing the involved queries twice and take the second time, so there would not be any caching issues.

2 Tables: Table NUMMER: select count(*) from nummer: > 14000000

Table NAAM: select count(*) from naam: > 3700000

Query A: select count(*) from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE LIKE '81%' and B.naam like 'SEARCHEDNAME%' Result: 11 Execution time: 2.0188 Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

Query B: select count(*) from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE LIKE '81%' and B.naam = 'SEARCHEDNAME' or even: select count(*) from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE LIKE '81%' and B.naam like 'SEARCHEDNAME' Result: 6 Execution time: 0.0031 Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

In both cases the same plan is used. And I think both queries can use the same lookup method of the matching B.naam values because the start of the string ('SEARCHEDNAME') is the same. But the first query is about 650 times slower then the second query and has only a double count of records involved. This problem did not occur with Firebird 1.5.X but this behaviour makes Firebird 2.x not useable for this project. Also I tried * instead of count(*).

Results:

Query A: Execution time: 4.0437 Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

Query B: Execution time: 0.0079 Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

Again a huge difference: Query B is 512 times faster. I also tried the search with different names. All had the same result in execution time difference.

The Tables:

CREATE TABLE "NUMMER" ( "ID" INTEGER NOT NULL, "NUMMER" VARCHAR(12), "NAAMID" INTEGER, "POSTCODE" CHAR(8), "HUISNUMMER_N" INTEGER, PRIMARY KEY ("ID") );

CREATE INDEX "NUMMER_NAAMNUMMER" ON "NUMMER"("NAAMID", "NUMMER"); CREATE INDEX "NUMMER_NUMMER" ON "NUMMER"("NUMMER", "NAAMID"); CREATE INDEX "NUMMER_PCHUISNR" ON "NUMMER"("POSTCODE", "HUISNUMMER_N");

CREATE TABLE "NAAM" ( "NAAMID" INTEGER NOT NULL, "NAAM" VARCHAR(50), PRIMARY KEY ("NAAMID") );

CREATE INDEX "NAAM_NAAM" ON "NAAM"("NAAM");

firebird-automations commented 15 years ago

Commented by: @hvlad

Show us full execution statistics, please If possible, for both FB 1.5 and FB 2.1

firebird-automations commented 15 years ago

Commented by: Mercea Paul (m24paul)

If you always have LIKE '81%' you could use STARTING WITH '81'

select count(*) from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE STARTING WITH '81' and B.naam = 'SEARCHEDNAME'

firebird-automations commented 15 years ago

Commented by: Roy Damman (royd)

@ Vlad

Which execution statistics do you want and how do I obtain them?

Extra statistics for Firebird 2.1.1 with usage of * instead of count(*)

Query A: select * from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE LIKE '81%' and B.naam like 'SEARCHEDNAME%'

Execution time: 00:00:04.0343 Prepare time: 00:00:00.0032 Starting Memory: 86508148 Current Memory: 86695300 Delta Memory: 187152 Number of Buffers: 20000 Reads: 1480 Writes: 31 Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER)) Rows Affected: 11

Query B: select * from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE LIKE '81%' and B.naam = 'SEARCHEDNAME'

Execution time: 00:00:00.0062 Prepare time: 00:00:00.0031 Starting Memory: 86508080 Current Memory: 86695192 Delta Memory: 187112 Number of Buffers: 20000 Reads: 1480 Writes: 39 Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER)) Rows Affected: 6

@Mercea This is not the case. And it shouldn't make any difference if the optimizer was OK.

select * from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE STARTING WITH '81' and B.naam = 'SEARCHEDNAME' Execution time: 00:00:00.0062 Prepare time: 00:00:00.0016 Rows Affected: 6 Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

select * from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE STARTING WITH '81' and B.naam LIKE 'SEARCHEDNAME%' Execution time: 00:00:04.0359 Prepare time: 00:00:00.0031 Rows Affected: 11 Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

select * from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE STARTING WITH '81' and B.naam STARTING WITH 'SEARCHEDNAME' Execution time: 00:00:04.0359 (this is right, exact the same) Prepare time: 00:00:00.0032 Rows Affected: 11 Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

select * from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE LIKE '81%' and B.naam STARTING WITH 'SEARCHEDNAME' Execution time: 00:00:04.0375 Prepare time: 00:00:00.0031 Rows Affected: 11 Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

firebird-automations commented 15 years ago

Commented by: @hvlad

Please, show also results and statistics for the following queries :

SELECT COUNT(*) FROM NAAM WHERE naam LIKE 'SEARCHEDNAME%' SELECT COUNT(*) FROM NAAM WHERE naam = 'SEARCHEDNAME' SELECT COUNT(*) FROM nummer WHERE POSTCODE LIKE '81%' and NAAMID = <any id from the query above>

firebird-automations commented 15 years ago

Commented by: @dyemanov

Number of fetches would also be useful, I suppose.

firebird-automations commented 15 years ago

Commented by: Roy Damman (royd)

SELECT COUNT(*) FROM NAAM WHERE naam LIKE 'SEARCHEDNAME%' COUNT: 91 Execution time: 00:00:00.0000 Prepare time: 00:00:00.0000 Starting Memory: 86328284 Current Memory: 86339364 Delta Memory: 11080 Number of Buffers: 20000 Reads: 313 Writes: 17 Plan: PLAN (NAAM INDEX (NAAM_NAAM)) Rows Affected: 1

SELECT COUNT(*) FROM NAAM WHERE naam = 'SEARCHEDNAME' COUNT: 1 Execution time: 00:00:00.0000 Prepare time: 00:00:00.0000 Starting Memory: 86329632 Current Memory: 86333200 Delta Memory: 3568 Number of Buffers: 20000 Reads: 313 Writes: 29 Plan: PLAN (NAAM INDEX (NAAM_NAAM)) Rows Affected: 1

SELECT COUNT(*) FROM nummer WHERE POSTCODE LIKE '81%' and NAAMID = 11667 (NAAMID FROM QUERY 2) COUNT: 6 Execution time: 00:00:00.0031 Prepare time: 00:00:00.0000 Starting Memory: 86344516 Current Memory: 86511124 Delta Memory: 166608 Number of Buffers: 20000 Reads: 992 Writes: 47 Plan: PLAN (NAAM INDEX (NAAM_NAAM)) Rows Affected: 1

I understand in which direction you are searching the explanation. So I changed SEARCHNAME in naam in UNIQUENAME so that the count of =UNIQUENAME and LIKE UNIQUENAME% would be the same. After that the statistics of QUERY A and QUERY B where in the same range. I will install FB 1.5.x once again and repeat the tests above.