FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/
1.26k stars 216 forks source link

Improve performance of ALL / ANY [CORE3144] #3521

Open firebird-automations opened 14 years ago

firebird-automations commented 14 years ago

Submitted by: Nataniel (natanieljr)

I was discussing this approach with a teacher and we didn't find any references in literature but i think that this is a interesting implementation, hove you find usefull.

When using an ANY or ALL operator:

If the subquery don't have a reference to the external query then the subquery could be processed before the external query with the following rule:

If the operator is ANY then take only the minimum value else if the an ALL the take only the maximum.

After this the external query could be executed comparing it's result with a constant value optimizing the results.

Example:

This query:

select X.PRICE from PRODUCTS X where X>PRICE >= all (select Y.PRICE from PRODUCTS Y);

Could be executed like:

1 - Check if the subquery does not references a external table.

1\.1 \- Check the operator 

  1\.1\.1 \- If the operator is ALL then place a MAX in the projection field  \[select MAX\(        Y\.PRICE         \)  from PRODUCTS Y \]

  1\.1\.2 \- If the operator is ALL then place a MIN in the projection field   \[select MIN\(        Y\.PRICE         \)  from PRODUCTS Y \]

1\.2 \- Execute the query

1\.3 \- Replace the subquery operator for a constant value with the result of the subquery

       select 
           X\.PRICE
       from PRODUCTS X
       where X\>PRICE \>= \[HERE GOES THE VALUE\];

1\.4 \- Execute the external query\.

I've made some tests with a table with 30000 rows and the server executed the query into a single read instead of 30000 read.

firebird-automations commented 14 years ago
Modified by: Sean Leyne (seanleyne) summary: Increasing ALL / ANY performace =\> Improve performance of ALL / ANY
firebird-automations commented 14 years ago

Commented by: @dyemanov

I doubt it will work correctly with NULLs inside the subquery.

firebird-automations commented 14 years ago

Commented by: Nataniel (natanieljr)

If you wish to use the comment text you have typed (shown below), please copy it now. This text will be lost when you leave this screen.

Dmitry. I've made some tests, let me show you the results:

For the tests I created a table, used IBExpert data generator to create 10.000 random records for that table and manually set a record's price to NULL.

CREATE TABLE TEST \(
    ID INTEGER NOT NULL,
    PRICE NUMERIC\(15,2\)
\);
ALTER TABLE TEST ADD CONSTRAINT PK\_TEST PRIMARY KEY \(ID\);
CREATE INDEX TEST\_IDX1 ON TEST \(PRICE\);

I created a simple query to "get the id of the item with the biggest price"

1 \- For the X table was executed a sequential scan, for T was executed an indexed search\.

    select
        <http://T.id>
    from TEST T
    where T\.price \>= \(select max\(X\.PRICE\) from TEST X\)

    The resultset of this query was the id of the highest priced item\.

2 \- For both X and T the server performed a sequential scan\.

    select
        <http://T.id>
    from TEST T
    where T\.price \>= all\(select X\.PRICE from TEST X\)

    The resultset of this query was NULL\.

3 \- My proposed approach:

    3\.1 \- Identify a subquery with an ALL or ANY

        select
            <http://T.id>
        from TEST T
        where T\.price \>= \[ all\(select X\.PRICE from TEST X\) \]

    3\.2 \- Convert a subquery with ALL to a MAX and execute

        select max\(X\.PRICE\) from TEST X

    3\.3 \- Take the result and place as a constant into the external SQL and coninue the processing

        select
            <http://T.id>
        from TEST T
        where T\.price \>= \[ 9999\.99 \]

    The resultset of this query was the id of the highest priced item \(like on the first query\)

I can't say that the aprroach 2 ou 3 would be right, that depends of the point of view.

If you consider a set like { A, B, C, D, E, F } and A < B < C < D < E < F, instead of comparing against all itens you could just compare against the upper or lower bound.

firebird-automations commented 14 years ago

Commented by: @asfernandes

max(x.price) is incorrect formula as Dmitry said.

The correct one is: iif(count(iif(x.price is null, 1, 0)) = 0, max(x.price), null) or something similar.