gcpug / dogrun

Apache License 2.0
0 stars 0 forks source link

How should a secondary index be designed? #19

Closed sinmetal closed 6 years ago

sinmetal commented 6 years ago

セカンダリインデックスはどのように設計するのがよいだろうか?

WHAT

Set Shard and perform a full scan if you do a simple query

CREATE INDEX Order1MShardCreatedAtDesc
  ON Order1M (ShardCreatedAt, CreatedAt DESC)

SELECT * 
FROM Order1M@{FORCE_INDEX=Order1MShardCreatedAtDesc} 
WHERE ShardCreatedAt BETWEEN 0 AND 9
ORDER BY CreatedAt DESC 
LIMIT 10

image

It is quick if you acquire LIMIT by shard for each shard and rearrange

SELECT c.*
FROM (
  SELECT 
    ARRAY(
      SELECT AS STRUCT *
      FROM Order1M@{FORCE_INDEX=Order1MShardCreatedAtDesc}
      WHERE ShardCreatedAt = a.ShardCreatedAt
      ORDER BY CreatedAt DESC LIMIT 10
    ) AS ar
  FROM Order1M@{FORCE_INDEX=Order1MShardCreatedAtDesc} a
  WHERE ShardCreatedAt BETWEEN 0 AND 9
  GROUP BY ShardCreatedAt
) b, UNNEST(ar) c
ORDER BY c.CreatedAt DESC
LIMIT 10

image

WHY

Refs

https://cloudplatform.googleblog.com/2018/05/Sharding-of-timestamp-ordered-data-in-Cloud-Spanner.html

sinmetal commented 6 years ago

Q:セカンダリが多いとinsert/delete遅くなる? A:貼る時に時間はかかるけど、多いからと言って遅くなるかというとそういうわけではない。

sinmetal commented 6 years ago

GCPUG ShonanでRobertに聞いたら、以下のように言ってた。

INDEXが0 か 1以上でCPU利用率とPerformanceは変わる。そこから100個ぐらいまでは変わらないだろう。1000ぐらいまでいくと変わるかもしれない。
Paxos Groupの数で変わるので、CPU利用率やPerformanceが変わるのは、階段状になる