cohenjo / pg_idx_advisor

A PostgreSQL extension to analyze queries and give indexing advice.
50 stars 8 forks source link

getting index recommendations for target list. #2

Open JerrySievert opened 9 years ago

JerrySievert commented 9 years ago

i'm getting an index creation warning, and no results on what should be an easy index:

WARNING:  Failed to create index advice for: explain select max(unitsales) from measurement where logdate = '2006-03-01';

here are the tables:

test=# \d+ measurement
                       Table "public.measurement"
  Column   |  Type   | Modifiers | Storage | Stats target | Description
-----------+---------+-----------+---------+--------------+-------------
 city_id   | integer | not null  | plain   |              |
 logdate   | date    | not null  | plain   |              |
 peaktemp  | integer |           | plain   |              |
 unitsales | integer |           | plain   |              |
Triggers:
    insert_measurement_trigger BEFORE INSERT ON measurement FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger()
Child tables: measurement_y2006m02,
              measurement_y2006m03
Has OIDs: no

trigger:

CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.logdate >= DATE '2006-02-01' AND
         NEW.logdate < DATE '2006-03-01' ) THEN
        INSERT INTO measurement_y2006m02 VALUES (NEW.*);
    ELSIF ( NEW.logdate >= DATE '2006-03-01' AND
            NEW.logdate < DATE '2006-04-01' ) THEN
        INSERT INTO measurement_y2006m03 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range.  Fix the measurement_insert_trigger() function!';
    END IF;
    RETURN NULL;
END;
$$

tables:

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
);

CREATE TABLE measurement_y2006m02 (
    CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2006m03 (
    CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )
) INHERITS (measurement);

CREATE TRIGGER insert_measurement_trigger
    BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();
explain select max(unitsales) from measurement;
                                     QUERY PLAN

--------------------------------------------------------------------------------
----
 Aggregate  (cost=33.16..33.17 rows=1 width=4)
   ->  Append  (cost=0.00..28.73 rows=1774 width=4)
         ->  Seq Scan on measurement  (cost=0.00..0.00 rows=1 width=4)
         ->  Seq Scan on measurement_y2006m02  (cost=0.00..27.70 rows=1770 width
=4)
         ->  Seq Scan on measurement_y2006m03  (cost=0.00..1.03 rows=3 width=4)

 ** Plan with hypothetical indexes **
(7 rows)

creating indexes:

test=# create index idx_measurement_unitsales_y2006m02 ON measurement_y2006m02 (unitsales);
CREATE INDEX
test=# create index idx_measurement_unitsales_y2006m03 ON measurement_y2006m03 (unitsales);
CREATE INDEX

and finally new plan:

test=# explain select max(unitsales) from measurement;                                                                     QUERY PLAN

--------------------------------------------------------------------------------
---------------------------------------------------------------------
 Result  (cost=0.38..0.39 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.32..0.38 rows=1 width=4)
           ->  Merge Append  (cost=0.32..115.58 rows=1765 width=4)
                 Sort Key: measurement.unitsales
                 ->  Sort  (cost=0.01..0.02 rows=1 width=4)
                       Sort Key: measurement.unitsales
                       ->  Seq Scan on measurement  (cost=0.00..0.00 rows=1 widt
h=4)
                             Filter: (unitsales IS NOT NULL)
                 ->  Index Only Scan Backward using idx_measurement_unitsales_y2
006m02 on measurement_y2006m02  (cost=0.15..74.97 rows=1761 width=4)
                       Index Cond: (unitsales IS NOT NULL)
                 ->  Index Only Scan Backward using idx_measurement_unitsales_y2
006m03 on measurement_y2006m03  (cost=0.13..8.18 rows=3 width=4)
                       Index Cond: (unitsales IS NOT NULL)

 ** Plan with hypothetical indexes **
(15 rows)
JerrySievert commented 9 years ago

(this is the example partition creating query in postgres, so pretty basic)

cohenjo commented 9 years ago

Hi Jerry, There's actually 2 issues that came into play here. The first (and simpler one) is that I didn't consider indexes that only address the target list, this is easily fixed - I've patched it and will push once I finish running some tests.

The 2nd item, which is more complex, is extracting good index candidates from the target list and getting the selectivity and costs right - this is what I'm working now. It did suggest this index internally but decided against it due to high costs (it failed to "understand" that it will only need the first line)

By the way, did you run create extension pg_idx_advisor; ? it should have suggested an index on the m03 partition: create index on measurement_y2006m03(logdate)

Regards,

JerrySievert commented 9 years ago

i did not use create index pg_idx_advisor, as it did not work for me - instead i used load 'pg_idx_advisor'.

it made no suggestions, just the boilerplate.

cohenjo commented 9 years ago

Oh, I think you might be missing the advisory table. create extension pg_idx_advisor; essentially runs https://github.com/cohenjo/pg_idx_advisor/blob/master/sql/pg_idx_advisor--0.1.1--0.1.2.sql

create table index_advisory( reloid  oid,
    attrs       integer[],
    benefit     real,
    index_size  integer,
    backend_pid integer,
    timestamp   timestamptz,    
    indcollation int[], -- oidvector
    indclass    int[],
    indoption   int[],
    indexprs    text,
    indpred     text,
    query       text,
    recommendation text);

create index IA_reloid on index_advisory( reloid );
create index IA_backend_pid on index_advisory( backend_pid );

Could you check that this table exists and create it if not?

JerrySievert commented 9 years ago

the tables were not there. tried create extension again, and they are now there.

JerrySievert commented 9 years ago

unfortunately, with create extension, the actual .so doesn't appear to be loading, and thus the explain doesn't work.

JerrySievert commented 9 years ago

here you can see it in action:

test=# create extension pg_idx_advisor;
CREATE EXTENSION
test=# explain select max(unitsales) from measurement;
                                     QUERY PLAN

--------------------------------------------------------------------------------
----
 Aggregate  (cost=33.16..33.17 rows=1 width=4)
   ->  Append  (cost=0.00..28.73 rows=1774 width=4)
         ->  Seq Scan on measurement  (cost=0.00..0.00 rows=1 width=4)
         ->  Seq Scan on measurement_y2006m02  (cost=0.00..27.70 rows=1770 width
=4)
         ->  Seq Scan on measurement_y2006m03  (cost=0.00..1.03 rows=3 width=4)
(5 rows)

test=# load 'pg_idx_advisor';
NOTICE:  IND ADV: plugin loaded
LOAD
test=# explain select max(unitsales) from measurement;
                                     QUERY PLAN

--------------------------------------------------------------------------------
----
 Aggregate  (cost=33.16..33.17 rows=1 width=4)
   ->  Append  (cost=0.00..28.73 rows=1774 width=4)
         ->  Seq Scan on measurement  (cost=0.00..0.00 rows=1 width=4)
         ->  Seq Scan on measurement_y2006m02  (cost=0.00..27.70 rows=1770 width
=4)
         ->  Seq Scan on measurement_y2006m03  (cost=0.00..1.03 rows=3 width=4)

 ** Plan with hypothetical indexes **
(7 rows)
JerrySievert commented 9 years ago

and for posterity:

test=# \d
               List of relations
 Schema |         Name         | Type  | Owner
--------+----------------------+-------+-------
 public | index_advisory       | table | jerry
 public | measurement          | table | jerry
 public | measurement_y2006m02 | table | jerry
 public | measurement_y2006m03 | table | jerry
 public | test                 | table | jerry
(5 rows)
cohenjo commented 9 years ago

Hi, quick update, I'm now able to get a recommendation for this on both parent and child tables. I'll Add more testing and push shortly.

postgres=# explain select max(unitsales) from measurement;
INFO:
** Plan with Original indexes **

                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=37.98..37.99 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=37.94..37.98 rows=1 width=4)
           ->  Merge Append  (cost=37.94..186.37 rows=3683 width=4)
                 Sort Key: measurement.unitsales DESC
                 ->  Sort  (cost=0.01..0.02 rows=1 width=4)
                       Sort Key: measurement.unitsales DESC
                       ->  Seq Scan on measurement  (cost=0.00..0.00 rows=1 width=4)
                             Filter: (unitsales IS NOT NULL)
                 ->  Index Only Scan Backward using idx_measurement_unitsales_y2006m02 on measurement_y2006m02  (cost=0.15..76.37 rows=1841 width=4)
                       Index Cond: (unitsales IS NOT NULL)
                 ->  Sort  (cost=37.71..42.31 rows=1841 width=4)
                       Sort Key: measurement_y2006m03.unitsales DESC
                       ->  Seq Scan on measurement_y2006m03  (cost=0.00..28.50 rows=1841 width=4)
                             Filter: (unitsales IS NOT NULL)

 ** Plan with hypothetical indexes **
 read only, advice, index: create index on measurement(unitsales)
 read only, advice, index: create index on measurement_y2006m03(unitsales)
 Result  (cost=37.84..37.85 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=37.80..37.84 rows=1 width=4)
           ->  Merge Append  (cost=37.80..194.24 rows=3683 width=4)
                 Sort Key: measurement.unitsales DESC
                 ->  Index Only Scan Backward using <V-Index>:131216 on measurement  (cost=0.00..8.02 rows=1 width=4)
                       Index Cond: (unitsales IS NOT NULL)
                 ->  Sort  (cost=37.71..42.31 rows=1841 width=4)
                       Sort Key: measurement_y2006m02.unitsales DESC
                       ->  Seq Scan on measurement_y2006m02  (cost=0.00..28.50 rows=1841 width=4)
                             Filter: (unitsales IS NOT NULL)
                 ->  Index Only Scan Backward using <V-Index>:131217 on measurement_y2006m03  (cost=0.03..76.25 rows=1841 width=4)
                       Index Cond: (unitsales IS NOT NULL)
(32 rows)
cohenjo commented 9 years ago

Hi Jerry, I pushed a fix for this - I'd appreciate if you could comment if it resolved the issue for you as well.