LINBIT / linstor-server

High Performance Software-Defined Block Storage for containers, cloud and virtualisation. Fully integrated with Docker, Kubernetes, OpenStack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

etcd key count explosion with continuous rolling-retention snapshots #324

Closed: khanhngobackend closed this issue 1 year ago

khanhngobackend commented 2 years ago

Dear team, first of all, I'm hugely grateful for your wonderful work on the Linstor storage solution. We have been using it in a small production setup for months with a very good experience so far. However, we have been hit with the issue described below, which only appeared after we started using the snapshotting feature on a new, bigger cluster.

Our K8s cluster is on-prem and uses Linstor (deployed via the Piraeus Operator) as the main persistent storage solution. The cluster has 12 nodes, each with a single ZFS pool serving all storage requests.

[Screenshot: graphs of the etcd key count and database size growing over time]

The above graphs show the number of etcd keys (and values, and hence the DB size) growing continuously at roughly the same pace for about a week now. The etcd database in the graph is dedicated to Linstor; no other service has access to it, so Linstor is the sole creator of that data.

We suspect that the large number of CSI snapshot creations/deletions contributes to this phenomenon, as our setup currently has multiple PVCs (around 30), each scheduled for hourly snapshots with rolling retention (with the help of https://github.com/FairwindsOps/gemini). The snapshot schedule is similar for all of those PVCs:

apiVersion: gemini.fairwinds.com/v1beta1
kind: SnapshotGroup
metadata:
  name: &pvc-name data-storage-base-chi-<autofilled>-i-0-0-0
spec:
  template:
    spec:
      volumeSnapshotClassName: k-prod
  persistentVolumeClaim:
    claimName: *pvc-name
  schedule:
  - every: hour
    keep: 4 # not counting the latest (i.e. need to +1)
  - every: 6 hour
    keep: 3 # not counting the latest (i.e. need to +1)
  - every: 1 day
    keep: 6 # not counting the latest (i.e. need to +1)
  - every: 15 day
    keep: 1 # not counting the latest (i.e. need to +1)
  - every: 60 day
    keep: 0 # not counting the latest (i.e. need to +1)

FYI, here is some additional debugging info for our K8s cluster (all gathered at roughly the same time, the latest as of this writing):

Total PVC:

> kubectl get pvc -o name -A | wc -l
33 # some (about 6) are not backed by Linstor

Total PV:

> kubectl get pv -o name | wc -l
33 # some (about 6) are not backed by Linstor

Total VolumeSnapshot:

> kubectl get volumesnapshot -o name -A | wc -l
156

Total VolumeSnapshotContent:

> kubectl get volumesnapshotcontent -o name | wc -l
156

Total Linstor-managed Volumes:

> linstor v l | wc -l
27 # already subtracted 4 lines of headers and table borders

Total Linstor-managed Snapshots:

> linstor s l | wc -l
156  # already subtracted 4 lines of headers and table borders

etcd key counts per prefix (sorted; a sketch of how such a count can be produced follows the listing):

      1 /LINSTOR/DBHISTORY
      1 SPACE_HISTORY/2022-11-02
      1 SPACE_HISTORY/2022-11-03
      1 SPACE_HISTORY/2022-11-04
      1 SPACE_HISTORY/2022-11-05
      1 SPACE_HISTORY/2022-11-06
      1 SPACE_HISTORY/2022-11-07
      1 TRACKING_DATE/
      2 /LINSTOR/VOLUME_GROUPS
      4 /LINSTOR/SEC_CONFIGURATION
      4 /LINSTOR/SEC_DFLT_ROLES
      4 /LINSTOR/SEC_ID_ROLE_MAP
      8 /LINSTOR/SEC_ACCESS_TYPES
      8 /LINSTOR/SEC_IDENTITIES
      8 /LINSTOR/STOR_POOL_DEFINITIONS
     15 /LINSTOR/SEC_TYPES
     16 /LINSTOR/RESOURCE_GROUPS
     20 /LINSTOR/SEC_ROLES
     56 /LINSTOR/NODES
     60 /LINSTOR/SEC_TYPE_RULES
     70 /LINSTOR/NODE_NET_INTERFACES
    120 /LINSTOR/NODE_STOR_POOL
    366 /LINSTOR/VOLUMES
    549 /LINSTOR/LAYER_STORAGE_VOLUMES
    549 /LINSTOR/RESOURCES
    549 /LINSTOR/VOLUME_DEFINITIONS
   1071 /LINSTOR/LAYER_RESOURCE_IDS
   1098 /LINSTOR/RESOURCE_DEFINITIONS
   1695 /LINSTOR/SEC_ACL_MAP
   1863 /LINSTOR/PROPS_CONTAINERS
   4922 /LINSTOR/SEC_OBJECT_PROTECTION
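
For reference, a minimal sketch of how a per-prefix count like the one above can be produced, assuming direct etcdctl (v3 API) access to the Linstor etcd; the endpoint address is a placeholder:

> ETCDCTL_API=3 etcdctl --endpoints=https://linstor-etcd:2379 get /LINSTOR/ --prefix --keys-only \
    | awk -F/ 'NF {print "/" $2 "/" $3}' \
    | sort | uniq -c | sort -n

This lists every key once, reduces it to its table prefix (e.g. /LINSTOR/RESOURCES) and counts the keys per prefix, sorted ascending like the listing above.
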
ghernadi commented 2 years ago

Hello, if you happen to have a Linstor version older than 1.20, I assume your issue is related to https://github.com/LINBIT/linstor-server/issues/311. There is also a non-RC release of the Piraeus Operator available in the meantime: https://github.com/piraeusdatastore/piraeus-operator/releases/tag/v1.10.0
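
To double-check which controller version is actually running, something like the following can be used (a sketch only; the namespace and deployment name are examples and depend on how the Piraeus Operator was deployed):

> kubectl -n piraeus exec deploy/piraeus-op-cs-controller -- linstor controller version
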

khanhngobackend commented 2 years ago

We encountered this issue while using Linstor 1.20 and Piraeus Operator 1.10, which I believe are both the latest versions. We ran into issue #311 before; our team decided it was not worth the risk of dragging the K8s control plane down in cases like that (lots of CRs get generated, increasing the load on the API server), and the overhead of storing data via the API server is also huge compared to writing directly to the DB, so we switched to a standalone etcd DB.

khanhngobackend commented 2 years ago

[Screenshot: updated graph of the etcd key count, one day later]

Another day has passed, and it seems the count will keep increasing by roughly 1k keys per day until we run out of the memory allocated to the etcd instances.
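
For reference, the etcd DB size and the total number of Linstor keys can be checked roughly like this (a sketch assuming direct etcdctl access; the endpoint address is a placeholder):

> ETCDCTL_API=3 etcdctl --endpoints=https://linstor-etcd:2379 endpoint status -w table
> ETCDCTL_API=3 etcdctl --endpoints=https://linstor-etcd:2379 get /LINSTOR/ --prefix --keys-only | grep -c '^/LINSTOR/'

The first command reports the on-disk DB size per endpoint; the second counts all keys under the /LINSTOR/ prefix.
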

ghernadi commented 2 years ago

Would you mind sending me a dump of the database? (mail is in my profile)

If that is not possible, can you tell me more about your setup? Do you, for example, have auto-snapshot or something similar active? Are old snapshots also deleted automatically? etc...

khanhngobackend commented 2 years ago

We have auto-snapshots running every hour, with the retention policy (handled by Gemini) set as follows:

apiVersion: gemini.fairwinds.com/v1beta1
kind: SnapshotGroup
metadata:
  name: &pvc-name data-storage-base-chi-<autofilled>-i-0-0-0
spec:
  template:
    spec:
      volumeSnapshotClassName: k-prod
  persistentVolumeClaim:
    claimName: *pvc-name
  schedule:
  - every: hour
    keep: 4 # not counting the latest (i.e. need to +1)
  - every: 6 hour
    keep: 3 # not counting the latest (i.e. need to +1)
  - every: 1 day
    keep: 6 # not counting the latest (i.e. need to +1)
  - every: 15 day
    keep: 1 # not counting the latest (i.e. need to +1)
  - every: 60 day
    keep: 0 # not counting the latest (i.e. need to +1)

Snapshots past the retention period are auto-deleted. I can confirm that expired snapshots are correctly removed from both K8s and Linstor; only the metadata stored in etcd keeps growing.

khanhngobackend commented 2 years ago

I did some digging into etcd and was able to zoom in on some problematic key prefixes (they keep growing without bound):

   4680 /LINSTOR/SEC_OBJECT_PROTECTION//snapshotdefinitions
    780 /LINSTOR/PROPS_CONTAINERS//snapshotdefinitions
   1584 /LINSTOR/SEC_ACL_MAP//snapshotdefinitions

The number before each key prefix is the count of all keys with that prefix in etcd. Here is the keys-only dump of etcd: https://gist.github.com/khanhngobackend/0f2c6ea1ac9e85a54df03d560874f9ff
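
The individual counts above can be reproduced from such a keys-only dump with a plain grep, repeated for each suspect prefix (the dump file name below is a placeholder):

> grep -c '^/LINSTOR/SEC_OBJECT_PROTECTION//snapshotdefinitions' linstor-etcd-keys.txt
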

khanhngobackend commented 2 years ago

@ghernadi, is the keys-only dump above enough? If not, here is a snapshot of the DB (taken on a different day): linstor-etcd-snapshot.zip

ghernadi commented 2 years ago

I guess it is enough; I think I know what the issue here is. If I am not mistaken, this is not exactly the same as https://github.com/LINBIT/linstor-server/issues/311, but it is quite similar. I am investigating further to see if I can find other similar issues and will update here with more news.

khanhngobackend commented 1 year ago

Hi @ghernadi, any progress on this?

rp- commented 1 year ago

This should be fixed in the latest release, 1.20.2.

khanhngobackend commented 1 year ago

Thanks @rp-, I'll try 1.20.2 to see if the problem is fixed.

khanhngobackend commented 1 year ago

I can confirm that the problem is fixed! Thanks a lot team!