FirebirdSQL / firebird

Firebird server, client and tools
https://firebirdsql.org
1.26k stars 217 forks source link

Avoid index lookup for a NULL key if the condition is known to always be FALSE in this case #8278

Open AnatolyLoy opened 1 month ago

AnatolyLoy commented 1 month ago

The case contains three linked tables (see script below). The simple Query No1 and tricked Query No2 produced the same result, but Query No1 slowed it.

It looks like Firebird compares the NULL value of T1.T1_ID with every index value and spend resources...

CREATE TABLE T1 (T1_ID INTEGER NOT NULL, PRIMARY KEY (T1_ID)); CREATE TABLE T2 (T2_ID INTEGER NOT NULL, T1_ID INTEGER, PRIMARY KEY (T2_ID)); ALTER TABLE T2 ADD FOREIGN KEY (T1_ID) REFERENCES T1 (T1_ID); CREATE TABLE T3 (T3_ID INTEGER NOT NULL, T1_ID INTEGER, PRIMARY KEY (T3_ID)); ALTER TABLE T3 ADD FOREIGN KEY (T1_ID) REFERENCES T1 (T1_ID); -- Fill data: -- T1 - 1 000 recorsa -- T2 - 10 000 records, 1 000 of them referred to T1 -- T3 - 1 000 000 records, 1 000 of them referred to T1 EXECUTE BLOCK AS DECLARE VARIABLE L_T1_ID INTEGER; DECLARE VARIABLE L_T2_ID INTEGER; DECLARE VARIABLE L_T3_ID INTEGER; BEGIN L_T3_ID = 1; WHILE (L_T3_ID <= 1000000) DO BEGIN L_T1_ID = IIF(MOD(L_T3_ID, 1000) = 0, TRUNC(L_T3_ID/1000), NULL); L_T2_ID = IIF(MOD(L_T3_ID, 100) = 0, TRUNC(L_T3_ID/100), NULL); IF (L_T1_ID IS NOT NULL) THEN INSERT INTO T1 (T1_ID) VALUES (:L_T1_ID); INSERT INTO T3 (T3_ID, T1_ID) VALUES (:L_T3_ID, :L_T1_ID); IF (L_T2_ID IS NOT NULL) THEN INSERT INTO T2 (T2_ID, T1_ID) VALUES (:L_T2_ID, :L_T1_ID); L_T3_ID = L_T3_ID + 1; END END;

-- Query No1: Simple query and bad performance SELECT count(*) FROM t2 LEFT OUTER JOIN T1 ON T1.T1_ID = T2.T1_ID LEFT OUTER JOIN T3 ON T3.T1_ID = T1.T1_ID

Plan PLAN JOIN (JOIN (T2 NATURAL, T1 INDEX (RDB$PRIMARY294)), T3 INDEX (RDB$FOREIGN298))

Adapted Plan PLAN JOIN (JOIN (T2 NATURAL, T1 INDEX (INTEG_2830)), T3 INDEX (INTEG_2836))

------ Performance info ------ Prepare time = 0ms Execute time = 44s 797ms Avg fetch time = 44,797.00 ms Current memory = 40,174,016 Max memory = 74,196,768 Memory buffers = 2,048 Reads from disk to cache = 1,251 Writes from cache to disk = 0 Fetches from cache = 4,059,140

-- Query No2: Trick query and expected performance SELECT count(*) FROM T2 LEFT OUTER JOIN T1 on T1.T1_ID = T2.T1_ID LEFT OUTER JOIN T3 on T3.T1_ID = COALESCE(T1.T1_ID, -1)

Plan PLAN JOIN (JOIN (T2 NATURAL, T1 INDEX (RDB$PRIMARY294)), T3 INDEX (RDB$FOREIGN298))

Adapted Plan PLAN JOIN (JOIN (T2 NATURAL, T1 INDEX (INTEG_2830)), T3 INDEX (INTEG_2836))

------ Performance info ------ Prepare time = 0ms Execute time = 31ms Avg fetch time = 31.00 ms Current memory = 37,804,784 Max memory = 38,739,168 Memory buffers = 2,048 Reads from disk to cache = 1,042 Writes from cache to disk = 0 Fetches from cache = 63,145

livius2 commented 1 month ago

This is a typical case, and I suppose the NULL comparison should be removed by the engine here, as this is a simple scenario. In more complex cases, however, it might be harder to detect such situations. Comparisons with NULL should be skipped, as they cannot return any rows. As a result, scanning the index for a NULL value is unnecessary.

AnatolyLoy commented 1 month ago

This is a typical case, and I suppose the NULL comparison should be removed by the engine here, as this is a simple scenario. In more complex cases, however, it might be harder to detect such situations. Comparisons with NULL should be skipped, as they cannot return any rows. As a result, scanning the index for a NULL value is unnecessary.

Exactly. It's unfortunate. The simple query for the case mentioned above produces the same result - The OUTER JOIN compares NULLs 9,000 times with all 1,000,000 index values... So, it's a spend server resource whenever you try to left-join the non-mandatory column with many NULL values, Just a full index scan... Every comparison. Brrrrrrr.....

SELECT COUNT() FROM T2 -- 10 000 records LEFT OUTER JOIN T3 ON T3.T1_ID = NULL -- 1 000 000 index scans

Plan

PLAN JOIN (T2 NATURAL, T3 INDEX (RDB$FOREIGN300))

Adapted Plan

PLAN JOIN (T2 NATURAL, T3 INDEX (INTEG_2846))

------ Performance info ------ Prepare time = 15ms Execute time = 49s 625ms Avg fetch time = 49,625.00 ms Current memory = 56,801,616 Max memory = 57,807,920 Memory buffers = 3,000 Reads from disk to cache = 0 Writes from cache to disk = 0 Fetches from cache = 4,480,148

AnatolyLoy commented 1 month ago

The Firebird 2.5 makes the same result. The INNER JOIN makes the same result. SELECT COUNT() FROM T2 -- 10 000 records JOIN T3 ON T3.T1_ID = NULL -- 1 000 000 index scans

Even do not try to run

SELECT COUNT() FROM T2 -- 10 000 index scans JOIN T3 ON T3.T1_ID = T2.T1_ID -- 1 000 000 records

The server has chosen the T3 -> T2 plan. and we will have about 1 000 000 attempts to make scans for 1000 index values :(

So, it looks like a non-optimal execution in IndexTableScan...