Unique index 如何證明效能比 non Unique index 佳?

isdaniel commented 2 years ago

理論上 Unique index 會比 non Unique index 效能更加，因為可以提供 query optimizer 此資料不會重複並提供更好的計畫

但經過我一些嘗試並查看執行計畫並無法證明 Unique index 如何證明效能比 non Unique index 佳?

CREATE TABLE T3(
    ID INT NOT NULL,
    val INT NOT NULL,
    col1 UUID NOT NULL,
    col2 UUID NOT NULL,
    col3 UUID NOT NULL,
    col4 UUID NOT NULL,
    col5 UUID NOT NULL,
    col6 UUID NOT NULL
);

CREATE INDEX IX_ID_T3 ON T3 (ID);
CREATE UNIQUE INDEX UIX_ID_T3 ON T3 (ID);

INSERT INTO T3
SELECT i,
       RANDOM() * 1000000,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid
FROM generate_series(1,1000000) i;

vacuum ANALYZE T3;

--drop index ix_id_td 

EXPLAIN (ANALYZE,TIMING ON,BUFFERS ON)
SELECT DISTINCT a1.ID
FROM T3 a1 INNER JOIN T3 a2
ON a1.id = a2.id
WHERE a1.id <= 300000

我嘗試了許多查詢例如 DISTINCT, JOIN (跑 merge join) 照執行結果 Page read 跟執行指標 Unique index 和 non Unique index 都沒有差異

執行計畫結果

-- UIX_ID_T3 
"Unique  (cost=0.85..41457.94 rows=298372 width=4) (actual time=0.030..267.207 rows=300000 loops=1)"
"  Buffers: shared hit=1646"
"  ->  Merge Join  (cost=0.85..40712.01 rows=298372 width=4) (actual time=0.030..200.412 rows=300000 loops=1)"
"        Merge Cond: (a1.id = a2.id)"
"        Buffers: shared hit=1646"
"        ->  Index Only Scan using uix_id_t3 on t3 a1  (cost=0.42..8501.93 rows=298372 width=4) (actual time=0.017..49.237 rows=300000 loops=1)"
"              Index Cond: (id <= 300000)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"        ->  Index Only Scan using uix_id_t3 on t3 a2  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.010..40.170 rows=300000 loops=1)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"Planning Time: 0.171 ms"
"Execution Time: 282.919 ms"

---IX_ID_T3 
"Unique  (cost=0.85..41420.43 rows=297587 width=4) (actual time=0.027..230.256 rows=300000 loops=1)"
"  Buffers: shared hit=1646"
"  ->  Merge Join  (cost=0.85..40676.46 rows=297587 width=4) (actual time=0.027..173.308 rows=300000 loops=1)"
"        Merge Cond: (a1.id = a2.id)"
"        Buffers: shared hit=1646"
"        ->  Index Only Scan using ix_id_t3 on t3 a1  (cost=0.42..8476.20 rows=297587 width=4) (actual time=0.015..41.606 rows=300000 loops=1)"
"              Index Cond: (id <= 300000)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"        ->  Index Only Scan using ix_id_t3 on t3 a2  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.009..34.019 rows=300000 loops=1)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"Planning Time: 0.195 ms"
"Execution Time: 243.711 ms"

如上 UIX_ID_T3 是 unique index 我反覆跑了好幾次 non Unique index 跟 unique index 基本沒有差異

我有查找許多文章都只有說如果資料是 Unique 建立 Unique 對於 query optimizer 會比較好但都沒有相關執行計畫證明 non Unique index 跟 Unique index 執行計畫差異

查詢其中一個連結

https://stackoverflow.com/questions/1293499/are-unique-indexes-better-for-column-search-performance-pgsql-mysql

想請問是否有大大能用語法跟執行計畫證明 Unique index 如何證明效能比 non Unique index 佳?

digoal commented 2 years ago

内部没有本质区别吧, 由于索引内没有版本信息, 无法直接通过索引判断可见性. 需要回表或者通过vm减少回表. 如果都是btree, unique index和normal index, 没啥效率上的区别.

isdaniel commented 2 years ago

是的經過我的測試和查找相關文章 unique Index 在某些情境能幫助 QO 建立較優執行計畫，在 postgres 中並沒有像在 sql-server 那樣會有 many to many 優化的事情感謝德哥回答我先把 issue 關閉了

digoal / blog

Unique index 如何證明效能比 non Unique index 佳? #106