The second test.
Before the test, compact & defrag to bring the cluster back to its original state.
Then send the same requests (same QPS and kv size) and take a snapshot from one etcd member, but this time make sure the snapshot process does not overlap with any compaction. This is the only difference.
Then see the logs (they include the next compaction log after the snapshot operation).
Logs for etcd0:
{"level":"info","ts":"2024-08-27T09:16:00.041086Z","caller":"v3rpc/maintenance.go:126","msg":"sending database snapshot to client","total-bytes":20992471040,"size":"21 GB"}
{"level":"info","ts":"2024-08-27T09:17:51.627652Z","caller":"v3rpc/maintenance.go:166","msg":"sending database sha256 checksum to client","total-bytes":20992471040,"checksum-size":32}
{"level":"info","ts":"2024-08-27T09:17:51.627695Z","caller":"v3rpc/maintenance.go:175","msg":"successfully sent database snapshot to client","total-bytes":20992471040,"size":"21 GB","took":"1 minute ago"}
{"level":"info","ts":"2024-08-27T09:18:18.985499Z","caller":"etcdserver/server.go:1443","msg":"triggering snapshot","local-member-id":"236527662c6439af","local-member-applied-index":2600026,"local-member-snapshot-index":2500025,"local-member-snapshot-count":100000}
{"level":"info","ts":"2024-08-27T09:18:18.998421Z","caller":"etcdserver/server.go:2462","msg":"saved snapshot","snapshot-index":2600026}
{"level":"info","ts":"2024-08-27T09:18:18.998523Z","caller":"etcdserver/server.go:2492","msg":"compacted Raft logs","compact-index":2595026}
{"level":"info","ts":"2024-08-27T09:19:15.782916Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":2512762}
{"level":"info","ts":"2024-08-27T09:19:19.905582Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":2512762,"took":"4.122138217s","hash":2429156478,"current-db-size-bytes":20992471040,"current-db-size":"21 GB","current-db-size-in-use-bytes":10456432640,"current-db-size-in-use":"10 GB"}
{"level":"info","ts":"2024-08-27T09:19:19.905617Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":2429156478,"revision":2512762,"compact-revision":2423174}
Logs for etcd1:
{"level":"info","ts":"2024-08-27T09:19:15.7752Z","caller":"v3compactor/periodic.go:134","msg":"starting auto periodic compaction","revision":2512762,"compact-period":"5m0s"}
{"level":"info","ts":"2024-08-27T09:19:15.781344Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":2512762}
{"level":"info","ts":"2024-08-27T09:19:15.781408Z","caller":"v3compactor/periodic.go:142","msg":"completed auto periodic compaction","revision":2512762,"compact-period":"5m0s","took":"6.179178ms"}
{"level":"info","ts":"2024-08-27T09:19:19.785217Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":2512762,"took":"4.003381407s","hash":2429156478,"current-db-size-bytes":21074321408,"current-db-size":"21 GB","current-db-size-in-use-bytes":10453090304,"current-db-size-in-use":"10 GB"}
{"level":"info","ts":"2024-08-27T09:19:19.785252Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":2429156478,"revision":2512762,"compact-revision":2423174}
Logs for etcd2:
{"level":"info","ts":"2024-08-27T09:19:15.786751Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":2512762}
{"level":"info","ts":"2024-08-27T09:19:20.210784Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":2512762,"took":"4.423434677s","hash":2429156478,"current-db-size-bytes":20979462144,"current-db-size":"21 GB","current-db-size-in-use-bytes":10466041856,"current-db-size-in-use":"10 GB"}
{"level":"info","ts":"2024-08-27T09:19:20.210829Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":2429156478,"revision":2512762,"compact-revision":2423174}
As we can see, nothing looks strange, and the db sizes are all the same.
endpoint status -w table
+---------------------------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://etcd0.etcd.svc:2379 | 236527662c6439af | 3.5.13 | 21 GB | false | false | 8 | 2798807 | 2798806 | |
| https://etcd1.etcd.svc:2379 | fdd1098ba1bb471 | 3.5.13 | 21 GB | true | false | 8 | 2798812 | 2798812 | |
| https://etcd2.etcd.svc:2379 | 25b4ff1e9b1f0b1b | 3.5.13 | 21 GB | false | false | 8 | 2798815 | 2798814 | |
+---------------------------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
endpoint status -w json
[{"Endpoint":"https://etcd0.etcd.svc:2379","Status":{"header":{"cluster_id":8776373056027032411,"member_id":2550488083760167343,"revision":2803148,"raft_term":8},"version":"3.5.13","dbSize":20992471040,"leader":1143088128565425265,"raftIndex":2808853,"raftTerm":8,"raftAppliedIndex":2808853,"dbSizeInUse":11611930624}},{"Endpoint":"https://etcd1.etcd.svc:2379","Status":{"header":{"cluster_id":8776373056027032411,"member_id":1143088128565425265,"revision":2803151,"raft_term":8},"version":"3.5.13","dbSize":21074321408,"leader":1143088128565425265,"raftIndex":2808856,"raftTerm":8,"raftAppliedIndex":2808856,"dbSizeInUse":11611930624}},{"Endpoint":"https://etcd2.etcd.svc:2379","Status":{"header":{"cluster_id":8776373056027032411,"member_id":2717076982127856411,"revision":2803153,"raft_term":8},"version":"3.5.13","dbSize":20979462144,"leader":1143088128565425265,"raftIndex":2808861,"raftTerm":8,"raftAppliedIndex":2808858,"dbSizeInUse":11612975104}}]
It seems that only when the snapshot operation overlaps with a compaction does the db of that member grow.
/assign
Are you able to reproduce this issue? @ArkaSaha30 @ivanvc @henrybear327
After adding the log in #18516, the logs are:
Logs for etcd0:
{"level":"info","ts":"2024-08-30T10:02:00.039084Z","caller":"v3rpc/maintenance.go:126","msg":"sending database snapshot to client","total-bytes":21776793600,"size":"22 GB"}
{"level":"info","ts":"2024-08-30T10:02:04.432765Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":3819633}
{"level":"info","ts":"2024-08-30T10:02:08.041394Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":3819633,"took":"3.608062921s","key-compactions":85268,"hash":2662401697,"current-db-size-bytes":21776793600,"current-db-size":"22 GB","current-db-size-in-use-bytes":19293433856,"current-db-size-in-use":"19 GB"}
{"level":"info","ts":"2024-08-30T10:02:08.041441Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":2662401697,"revision":3819633,"compact-revision":3734365}
{"level":"info","ts":"2024-08-30T10:03:58.659047Z","caller":"v3rpc/maintenance.go:166","msg":"sending database sha256 checksum to client","total-bytes":21776793600,"checksum-size":32}
{"level":"info","ts":"2024-08-30T10:03:58.659129Z","caller":"v3rpc/maintenance.go:175","msg":"successfully sent database snapshot to client","total-bytes":21776793600,"size":"22 GB","took":"1 minute ago"}
Logs for etcd1:
{"level":"info","ts":"2024-08-30T10:02:04.423119Z","caller":"v3compactor/periodic.go:134","msg":"starting auto periodic compaction","revision":3819633,"compact-period":"5m0s"}
{"level":"info","ts":"2024-08-30T10:02:04.431864Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":3819633}
{"level":"info","ts":"2024-08-30T10:02:04.431923Z","caller":"v3compactor/periodic.go:142","msg":"completed auto periodic compaction","revision":3819633,"compact-period":"5m0s","took":"8.7761ms"}
{"level":"info","ts":"2024-08-30T10:02:08.319213Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":3819633,"took":"3.886921185s","key-compactions":85268,"hash":2662401697,"current-db-size-bytes":20865847296,"current-db-size":"21 GB","current-db-size-in-use-bytes":10421878784,"current-db-size-in-use":"10 GB"}
{"level":"info","ts":"2024-08-30T10:02:08.319249Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":2662401697,"revision":3819633,"compact-revision":3734365}
Logs for etcd2:
{"level":"info","ts":"2024-08-30T10:02:04.429785Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":3819633}
{"level":"info","ts":"2024-08-30T10:02:08.37252Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":3819633,"took":"3.942233381s","key-compactions":85268,"hash":2662401697,"current-db-size-bytes":20844969984,"current-db-size":"21 GB","current-db-size-in-use-bytes":10423861248,"current-db-size-in-use":"10 GB"}
{"level":"info","ts":"2024-08-30T10:02:08.372556Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":2662401697,"revision":3819633,"compact-revision":3734365}
The key-compactions count is the same for the three etcd members, but the db size in use for etcd0 is very different from the others.
Could you run the commands below? Please build the bbolt command from https://github.com/etcd-io/bbolt/tree/main/cmd/bbolt
$ ./bbolt stats ./new.db
Aggregate statistics for 10 buckets
Page count statistics
Number of logical branch pages: 24
Number of physical branch overflow pages: 0
Number of logical leaf pages: 1450
Number of physical leaf overflow pages: 2467
Tree statistics
Number of keys/value pairs: 5085
Number of levels in B+tree: 3
Page size utilization
Bytes allocated for physical branch pages: 98304
Bytes actually used for branch data: 48960 (49%)
Bytes allocated for physical leaf pages: 16044032
Bytes actually used for leaf data: 13768079 (85%)
Bucket statistics
Total number of buckets: 10
Total number on inlined buckets: 8 (80%)
Bytes used for inlined buckets: 582 (0%)
$ ./bbolt inspect ./new.db
{
"name": "root",
"keyN": 0,
"buckets": [
{
"name": "alarm",
"keyN": 0
},
{
"name": "auth",
"keyN": 1
},
{
"name": "authRoles",
"keyN": 0
},
{
"name": "authUsers",
"keyN": 0
},
{
"name": "cluster",
"keyN": 1
},
{
"name": "key",
"keyN": 5042
},
{
"name": "lease",
"keyN": 35
},
{
"name": "members",
"keyN": 1
},
{
"name": "members_removed",
"keyN": 0
},
{
"name": "meta",
"keyN": 5
}
]
}
Thanks for the reply. Please let me know if you want more information. @ahrtr
I executed the commands after the snapshot (taken from etcd0) was done and stopped the etcd cluster. The results are:
etcd0:
sh-4.2# ls -al db
-rw------- 1 root root 22825078784 Sep 5 02:48 db
sh-4.2# bbolt stats ./db
Aggregate statistics for 10 buckets
Page count statistics
Number of logical branch pages: 614
Number of physical branch overflow pages: 0
Number of logical leaf pages: 67282
Number of physical leaf overflow pages: 3363950
Tree statistics
Number of keys/value pairs: 134576
Number of levels in B+tree: 4
Page size utilization
Bytes allocated for physical branch pages: 2514944
Bytes actually used for branch data: 2250360 (89%)
Bytes allocated for physical leaf pages: 14054326272
Bytes actually used for leaf data: 13786544128 (98%)
Bucket statistics
Total number of buckets: 10
Total number on inlined buckets: 9 (90%)
Bytes used for inlined buckets: 959 (0%)
sh-4.2# bbolt inspect ./db
{
"name": "root",
"keyN": 0,
"buckets": [
{
"name": "alarm",
"keyN": 0
},
{
"name": "auth",
"keyN": 1
},
{
"name": "authRoles",
"keyN": 0
},
{
"name": "authUsers",
"keyN": 0
},
{
"name": "cluster",
"keyN": 1
},
{
"name": "key",
"keyN": 134566
},
{
"name": "lease",
"keyN": 0
},
{
"name": "members",
"keyN": 3
},
{
"name": "members_removed",
"keyN": 0
},
{
"name": "meta",
"keyN": 5
}
]
}
etcd1:
sh-4.2# ls -al db
-rw------- 1 root root 21282004992 Sep 5 02:47 db
sh-4.2# bbolt stats ./db
Aggregate statistics for 10 buckets
Page count statistics
Number of logical branch pages: 614
Number of physical branch overflow pages: 0
Number of logical leaf pages: 67282
Number of physical leaf overflow pages: 3363950
Tree statistics
Number of keys/value pairs: 134576
Number of levels in B+tree: 4
Page size utilization
Bytes allocated for physical branch pages: 2514944
Bytes actually used for branch data: 2250360 (89%)
Bytes allocated for physical leaf pages: 14054326272
Bytes actually used for leaf data: 13786544128 (98%)
Bucket statistics
Total number of buckets: 10
Total number on inlined buckets: 9 (90%)
Bytes used for inlined buckets: 959 (0%)
sh-4.2# bbolt inspect ./db
{
"name": "root",
"keyN": 0,
"buckets": [
{
"name": "alarm",
"keyN": 0
},
{
"name": "auth",
"keyN": 1
},
{
"name": "authRoles",
"keyN": 0
},
{
"name": "authUsers",
"keyN": 0
},
{
"name": "cluster",
"keyN": 1
},
{
"name": "key",
"keyN": 134566
},
{
"name": "lease",
"keyN": 0
},
{
"name": "members",
"keyN": 3
},
{
"name": "members_removed",
"keyN": 0
},
{
"name": "meta",
"keyN": 5
}
]
}
etcd2:
sh-4.2# ls -al db
-rw------- 1 root root 21267476480 Sep 5 02:47 db
sh-4.2# bbolt stats ./db
Aggregate statistics for 10 buckets
Page count statistics
Number of logical branch pages: 614
Number of physical branch overflow pages: 0
Number of logical leaf pages: 67282
Number of physical leaf overflow pages: 3363950
Tree statistics
Number of keys/value pairs: 134576
Number of levels in B+tree: 4
Page size utilization
Bytes allocated for physical branch pages: 2514944
Bytes actually used for branch data: 2250360 (89%)
Bytes allocated for physical leaf pages: 14054326272
Bytes actually used for leaf data: 13786544128 (98%)
Bucket statistics
Total number of buckets: 10
Total number on inlined buckets: 9 (90%)
Bytes used for inlined buckets: 959 (0%)
sh-4.2# bbolt inspect ./db
{
"name": "root",
"keyN": 0,
"buckets": [
{
"name": "alarm",
"keyN": 0
},
{
"name": "auth",
"keyN": 1
},
{
"name": "authRoles",
"keyN": 0
},
{
"name": "authUsers",
"keyN": 0
},
{
"name": "cluster",
"keyN": 1
},
{
"name": "key",
"keyN": 134566
},
{
"name": "lease",
"keyN": 0
},
{
"name": "members",
"keyN": 3
},
{
"name": "members_removed",
"keyN": 0
},
{
"name": "meta",
"keyN": 5
}
]
}
Would you still like some help reproducing this issue? I'd like to help, but I'm not sure how to simulate or get the database to the 10GB size.
Thanks for your help. @ivanvc
I can give more details for reproducing this issue.
Auto compaction flags: --auto-compaction-mode=periodic --auto-compaction-retention=5m
Run the benchmark put command built from this repo: https://github.com/qsyqian/etcd/commit/97fae23a16abcbe7f7f0c3014e77dd1eefde7e66
./benchmark --cacert ./ca.pem --cert cert.pem --key key.pem --endpoints=https://etcd0:2379,https://etcd1:2379,https://etcd2:2379 --rate 300 --total 1000000 --val-size 102400 --clients 6 --conns 20 --insecure-skip-verify put
Then take a snapshot from one member:
./etcdctl snapshot --cacert ./ca.pem --cert cert.pem --key key.pem --endpoints=https://$etcd1_IP:2379 --insecure-skip-tls-verify save ./db
Logs for etcd0:
{"level":"info","ts":"2024-09-05T06:17:55.351126Z","caller":"v3compactor/periodic.go:134","msg":"starting auto periodic compaction","revision":1040875,"compact-period":"5m0s"}
{"level":"info","ts":"2024-09-05T06:17:55.358592Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":1040875}
{"level":"info","ts":"2024-09-05T06:17:55.358641Z","caller":"v3compactor/periodic.go:142","msg":"completed auto periodic compaction","revision":1040875,"compact-period":"5m0s","took":"7.483919ms"}
{"level":"info","ts":"2024-09-05T06:17:59.927419Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":1040875,"took":"4.568328034s","key-compactions":89635,"hash":1615407175,"current-db-size-bytes":20875321344,"current-db-size":"21 GB","current-db-size-in-use-bytes":10473357312,"current-db-size-in-use":"10 GB"}
{"level":"info","ts":"2024-09-05T06:17:59.927459Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":1615407175,"revision":1040875,"compact-revision":951240}
Logs for etcd1:
{"level":"info","ts":"2024-09-05T06:17:29.342478Z","caller":"v3rpc/maintenance.go:126","msg":"sending database snapshot to client","total-bytes":20874055680,"size":"21 GB"}
{"level":"info","ts":"2024-09-05T06:17:55.361292Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":1040875}
{"level":"info","ts":"2024-09-05T06:17:59.09344Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":1040875,"took":"3.731648292s","key-compactions":89635,"hash":1615407175,"current-db-size-bytes":20874055680,"current-db-size":"21 GB","current-db-size-in-use-bytes":19745542144,"current-db-size-in-use":"20 GB"}
{"level":"info","ts":"2024-09-05T06:17:59.093478Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":1615407175,"revision":1040875,"compact-revision":951240}
{"level":"info","ts":"2024-09-05T06:19:23.207856Z","caller":"v3rpc/maintenance.go:166","msg":"sending database sha256 checksum to client","total-bytes":20874055680,"checksum-size":32}
{"level":"info","ts":"2024-09-05T06:19:23.207983Z","caller":"v3rpc/maintenance.go:175","msg":"successfully sent database snapshot to client","total-bytes":20874055680,"size":"21 GB","took":"1 minute ago"}
Logs for etcd2:
{"level":"info","ts":"2024-09-05T06:17:55.364262Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":1040875}
{"level":"info","ts":"2024-09-05T06:17:59.695769Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":1040875,"took":"4.331001622s","key-compactions":89635,"hash":1615407175,"current-db-size-bytes":20951465984,"current-db-size":"21 GB","current-db-size-in-use-bytes":10465939456,"current-db-size-in-use":"10 GB"}
{"level":"info","ts":"2024-09-05T06:17:59.695805Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":1615407175,"revision":1040875,"compact-revision":951240}
endpoint status -w table
+------------------------+------------------+---------+-----------------+---------+--------+-----------------------+-------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | STORAGE VERSION | DB SIZE | IN USE | PERCENTAGE NOT IN USE | QUOTA | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+-----------------+---------+--------+-----------------------+-------+-----------+------------+-----------+------------+--------------------+--------+
| https://etcd2:2379 | c202fbb15816e8a3 | 3.5.13 | | 21 GB | 15 GB | 30% | 0 B | false | false | 8 | 1361716 | 1361716 | |
| https://etcd0:2379 | f5062531d505f9e | 3.5.13 | | 21 GB | 15 GB | 30% | 0 B | true | false | 8 | 1361720 | 1361719 | |
| https://etcd1:2379 | eb74fbe807419692 | 3.5.13 | | 24 GB | 15 GB | 38% | 0 B | false | false | 8 | 1361724 | 1361724 | |
+------------------------+------------------+---------+-----------------+---------+--------+-----------------------+-------+-----------+------------+-----------+------------+--------------------+--------+
endpoint status -w json
[{"Endpoint":"https://etcd2:2379","Status":{"header":{"cluster_id":15796809493248337086,"member_id":13980012932417120419,"revision":1384214,"raft_term":8},"version":"3.5.13","dbSize":20951465984,"leader":1103490017819385758,"raftIndex":1384260,"raftTerm":8,"raftAppliedIndex":1384260,"dbSizeInUse":17142501376}},{"Endpoint":"https://etcd0:2379","Status":{"header":{"cluster_id":15796809493248337086,"member_id":1103490017819385758,"revision":1384218,"raft_term":8},"version":"3.5.13","dbSize":20875321344,"leader":1103490017819385758,"raftIndex":1384264,"raftTerm":8,"raftAppliedIndex":1384264,"dbSizeInUse":17143234560}},{"Endpoint":"https://etcd1:2379","Status":{"header":{"cluster_id":15796809493248337086,"member_id":16966462670184224402,"revision":1384220,"raft_term":8},"version":"3.5.13","dbSize":23535874048,"leader":1103490017819385758,"raftIndex":1384266,"raftTerm":8,"raftAppliedIndex":1384266,"dbSizeInUse":17143652352}}]
This comment says:
A long running read transaction can cause the database to quickly grow.
And this comment also shows:
Long running read transactions can be problematic in Bolt because they can cause the data file to grow since their dependent pages can't be reused.
The etcd snapshot process holds one long-running read transaction until the whole db has been sent to the client. So is this an issue with BoltDB that has persisted for a very, very long time? :(
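For illustration, below is a minimal, standalone bbolt sketch of this effect (my own example; the file name, key names and sizes are arbitrary assumptions, and it is not etcd code). While a read-only transaction stays open, pages freed by concurrent writes cannot be reused, so the data file keeps growing; once the read transaction closes, the freed pages become reusable again.

```go
package main

import (
	"fmt"
	"os"

	bolt "go.etcd.io/bbolt"
)

func fileSize(path string) int64 {
	fi, err := os.Stat(path)
	if err != nil {
		panic(err)
	}
	return fi.Size()
}

func main() {
	// A large initial mmap keeps write transactions from blocking on the open
	// read transaction when the file grows (etcd also configures a large
	// initial mmap size for the same reason).
	db, err := bolt.Open("demo.db", 0600, &bolt.Options{InitialMmapSize: 1 << 30})
	if err != nil {
		panic(err)
	}
	defer db.Close()

	val := make([]byte, 100*1024) // 100 KiB values, similar to the benchmark above

	// Repeatedly overwrite the same 50 keys; each overwrite frees the old pages.
	put := func(n int) {
		for i := 0; i < n; i++ {
			if err := db.Update(func(tx *bolt.Tx) error {
				b, err := tx.CreateBucketIfNotExists([]byte("key"))
				if err != nil {
					return err
				}
				return b.Put([]byte(fmt.Sprintf("k-%03d", i%50)), val)
			}); err != nil {
				panic(err)
			}
		}
	}

	put(500)
	fmt.Println("size before read tx:", fileSize("demo.db"))

	// Long-running read-only tx, analogous to the snapshot stream (which uses Tx.WriteTo).
	rtx, err := db.Begin(false)
	if err != nil {
		panic(err)
	}

	put(500) // pages freed now stay pending while rtx is open, so the file grows
	fmt.Println("size with read tx open:", fileSize("demo.db"),
		"free pages:", db.Stats().FreePageN, "pending pages:", db.Stats().PendingPageN)

	_ = rtx.Rollback() // close the read tx; pending pages can be released and reused

	put(500) // these writes reuse freed pages instead of growing the file further
	fmt.Println("size after read tx closed:", fileSize("demo.db"),
		"free pages:", db.Stats().FreePageN)
}
```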
I executed the commands after the snapshot (taken from etcd0) was done and stopped the etcd cluster. The results are:
Just to double confirm: when you executed the commands, had you already reproduced the issue again (i.e. etcd0 had a much bigger value for current-db-size-in-use-bytes)?
Yes, but unfortunately I didn't save the logs. :(
If you find something strange, please tell me. I can test it again.
Thanks for the feedback. Please restart the problematic etcd instance after reproducing this "issue", and afterwards please verify whether the "size-in-use" recovers back to normal automatically. Search the log for "recovered v3 backend from snapshot" and check the value of backend-size-in-use-bytes, or just execute etcdctl endpoint status -w json and check the field dbSizeInUse.
Please restart the problematic etcd instance after reproducing this "issue", and afterwards please verify whether the "size-in-use" recovers back to normal automatically.
Actually you don't have to restart the etcd instance. Just executing etcdctl endpoint status -w json (and checking the field dbSizeInUse) after the snapshot operation is done should be fine. So the rough steps are below:
- Reproduce the issue, then execute etcdctl endpoint status -w json and check the field dbSizeInUse.
- After the snapshot operation is done, execute etcdctl endpoint status -w json and check the field dbSizeInUse again.
Please let me know the result.
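For reference, the same check can also be done programmatically. Below is a minimal sketch using the clientv3 Maintenance API (not from this thread; the endpoints are illustrative and TLS configuration is omitted), which returns the same dbSize / dbSizeInUse figures as etcdctl endpoint status -w json:

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	endpoints := []string{"https://etcd0:2379", "https://etcd1:2379", "https://etcd2:2379"}

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
		// TLS omitted for brevity; pass a tls.Config here for https endpoints.
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	for _, ep := range endpoints {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		st, err := cli.Status(ctx, ep) // per-endpoint status, like `etcdctl endpoint status`
		cancel()
		if err != nil {
			fmt.Printf("%s: %v\n", ep, err)
			continue
		}
		// DbSize is the backend file size; DbSizeInUse excludes free pages.
		fmt.Printf("%s dbSize=%d dbSizeInUse=%d\n", ep, st.DbSize, st.DbSizeInUse)
	}
}
```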
The result is below. Please note that the log time plus 8 hours is the command run time.
Take a snapshot from etcd1:
date; ./etcdctl snapshot --cacert ./ca.pem --cert cert.pem --key key.pem --endpoints=https://etcd1:2379 --insecure-skip-tls-verify save ./db
Fri Sep 6 08:57:59 CST 2024
{"level":"info","ts":"2024-09-06T08:57:59.266841+0800","caller":"snapshot/v3_snapshot.go:67","msg":"created temporary db file","path":"./db.part"}
{"level":"info","ts":"2024-09-06T08:57:59.273356+0800","logger":"client","caller":"v3/maintenance.go:237","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2024-09-06T08:57:59.283075+0800","caller":"snapshot/v3_snapshot.go:75","msg":"fetching snapshot","endpoint":"https://7.8.70.3:2379"}
{"level":"info","ts":"2024-09-06T08:59:51.635947+0800","logger":"client","caller":"v3/maintenance.go:303","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2024-09-06T09:00:26.803606+0800","caller":"snapshot/v3_snapshot.go:90","msg":"fetched snapshot","endpoint":"https://7.8.70.3:2379","size":"21 GB","took":"2m27.53669875s","etcd-version":""}
{"level":"info","ts":"2024-09-06T09:00:26.803711+0800","caller":"snapshot/v3_snapshot.go:100","msg":"saved","path":"./db"}
Snapshot saved at ./db
Logs for etcd0:
{"level":"info","ts":"2024-09-06T00:58:29.653151Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":2039307}
{"level":"info","ts":"2024-09-06T00:58:33.62825Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":2039307,"took":"3.974669443s","key-compactions":89650,"hash":4293915956,"current-db-size-bytes":20805419008,"current-db-size":"21 GB","current-db-size-in-use-bytes":10428334080,"current-db-size-in-use":"10 GB"}
Logs for etcd1 (the snapshot member):
{"level":"info","ts":"2024-09-06T00:57:59.278771Z","caller":"v3rpc/maintenance.go:126","msg":"sending database snapshot to client","total-bytes":20933140480,"size":"21 GB"}
{"level":"info","ts":"2024-09-06T00:58:29.649351Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":2039307}
{"level":"info","ts":"2024-09-06T00:58:32.990668Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":2039307,"took":"3.340904508s","key-compactions":89650,"hash":4293915956,"current-db-size-bytes":20933140480,"current-db-size":"21 GB","current-db-size-in-use-bytes":19706777600,"current-db-size-in-use":"20 GB"}
{"level":"info","ts":"2024-09-06T00:58:32.990715Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":4293915956,"revision":2039307,"compact-revision":1949657}
{"level":"info","ts":"2024-09-06T00:59:51.636236Z","caller":"v3rpc/maintenance.go:166","msg":"sending database sha256 checksum to client","total-bytes":20933140480,"checksum-size":32}
{"level":"info","ts":"2024-09-06T00:59:51.636324Z","caller":"v3rpc/maintenance.go:175","msg":"successfully sent database snapshot to client","total-bytes":20933140480,"size":"21 GB","took":"1 minute ago"}
Logs for etcd2:
{"level":"info","ts":"2024-09-06T00:58:29.643483Z","caller":"v3compactor/periodic.go:134","msg":"starting auto periodic compaction","revision":2039307,"compact-period":"5m0s"}
{"level":"info","ts":"2024-09-06T00:58:29.648981Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":2039307}
{"level":"info","ts":"2024-09-06T00:58:29.649036Z","caller":"v3compactor/periodic.go:142","msg":"completed auto periodic compaction","revision":2039307,"compact-period":"5m0s","took":"5.517805ms"}
{"level":"info","ts":"2024-09-06T00:58:34.370113Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":2039307,"took":"4.720662221s","key-compactions":89650,"hash":4293915956,"current-db-size-bytes":20828606464,"current-db-size":"21 GB","current-db-size-in-use-bytes":10451628032,"current-db-size-in-use":"10 GB"}
{"level":"info","ts":"2024-09-06T00:58:34.370151Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":4293915956,"revision":2039307,"compact-revision":1949657}
After the snapshot is done:
date; endpoint status -w json | python -m json.tool
Fri Sep 6 09:00:45 CST 2024
[
{
"Endpoint": "https://etcd2:2379",
"Status": {
"dbSize": 20828606464,
"dbSizeInUse": 14477979648,
"header": {
"cluster_id": 15796809493248337086,
"member_id": 13980012932417120419,
"raft_term": 9,
"revision": 2177902
},
"leader": 13980012932417120419,
"raftAppliedIndex": 2178172,
"raftIndex": 2178172,
"raftTerm": 9,
"version": "3.5.13"
}
},
{
"Endpoint": "https://etcd0:2379",
"Status": {
"dbSize": 20805419008,
"dbSizeInUse": 14477979648,
"header": {
"cluster_id": 15796809493248337086,
"member_id": 1103490017819385758,
"raft_term": 9,
"revision": 2177907
},
"leader": 13980012932417120419,
"raftAppliedIndex": 2178177,
"raftIndex": 2178177,
"raftTerm": 9,
"version": "3.5.13"
}
},
{
"Endpoint": "https://etcd1:2379",
"Status": {
"dbSize": 23295725568,
"dbSizeInUse": 14478397440,
"header": {
"cluster_id": 15796809493248337086,
"member_id": 16966462670184224402,
"raft_term": 9,
"revision": 2177910
},
"leader": 13980012932417120419,
"raftAppliedIndex": 2178180,
"raftIndex": 2178180,
"raftTerm": 9,
"version": "3.5.13"
}
}
]
Then restart etcd1. (Before the restart I didn't stop the benchmark client, and the restart took a long time, about 15 minutes; etcd1 didn't come up until I stopped the benchmark client. I haven't had time to dig into that.)
date; endpoint status -w json | python -m json.tool
Fri Sep 6 09:19:26 CST 2024
[
{
"Endpoint": "https://etcd2:2379",
"Status": {
"dbSize": 21817737216,
"dbSizeInUse": 3679567872,
"header": {
"cluster_id": 15796809493248337086,
"member_id": 13980012932417120419,
"raft_term": 9,
"revision": 2432069
},
"leader": 13980012932417120419,
"raftAppliedIndex": 2432419,
"raftIndex": 2432419,
"raftTerm": 9,
"version": "3.5.13"
}
},
{
"Endpoint": "https://etcd0:2379",
"Status": {
"dbSize": 20805419008,
"dbSizeInUse": 3679567872,
"header": {
"cluster_id": 15796809493248337086,
"member_id": 1103490017819385758,
"raft_term": 9,
"revision": 2432069
},
"leader": 13980012932417120419,
"raftAppliedIndex": 2432419,
"raftIndex": 2432419,
"raftTerm": 9,
"version": "3.5.13"
}
},
{
"Endpoint": "https://etcd1:2379",
"Status": {
"dbSize": 21817737216,
"dbSizeInUse": 3712581632,
"header": {
"cluster_id": 15796809493248337086,
"member_id": 16966462670184224402,
"raft_term": 9,
"revision": 2432069
},
"leader": 13980012932417120419,
"raftAppliedIndex": 2432419,
"raftIndex": 2432419,
"raftTerm": 9,
"version": "3.5.13"
}
}
]
Let me try to summarize this issue. Please tell me if I am wrong. cc @ahrtr
1. The logs show that the current-db-size-in-use of the snapshot member is different from the others. From this, the compaction does actually execute and delete the revisions not in keep (the log added in #18516 also proves it), but the long-running snapshot read transaction holds the pages and prevents them from being returned to the free list. After the snapshot is done, the read tx releases the pages and the dbSizeInUse becomes the same as the others. Note that the dbSizeInUse calculation is below, and it depends directly on the free-page count (a small standalone sketch of this calculation follows this summary):
atomic.StoreInt64(&b.sizeInUse, size-(int64(stats.FreePageN)*int64(db.Info().PageSize)))
2. After the snapshot, the etcd1 member has a bigger dbSize than the others. This follows from the same reason as in 1: the benchmark client keeps putting and updating keys and values during the snapshot, boltdb does not have enough free pages to allocate, so it has to grow the db file.
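As a small standalone sketch of that calculation (my own illustration, assuming an offline copy of the backend file at ./db; it mirrors the formula quoted above but is not etcd code):

```go
package main

import (
	"fmt"
	"os"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Open an offline copy of the backend file (etcd must not hold the file lock).
	db, err := bolt.Open("./db", 0600, nil)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	fi, err := os.Stat(db.Path())
	if err != nil {
		panic(err)
	}

	// size in use = file size - (free pages * page size), as in the formula above.
	free := int64(db.Stats().FreePageN) * int64(db.Info().PageSize)
	fmt.Printf("size=%d free=%d sizeInUse=%d\n", fi.Size(), free, fi.Size()-free)
}
```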
Thanks for all the feedback, which confirmed that this isn't an issue at all. Instead, it's expected behavior.
Your feedback in https://github.com/etcd-io/etcd/issues/18509#issuecomment-2330519475 confirmed that there is no issue with the compaction, because all etcd members had exactly the same k/v pairs after the compactions.
The root cause is as you already pointed out in https://github.com/etcd-io/etcd/issues/18509#issuecomment-2331075925. To be clearer, there are two things to clarify:
Why is the db-size-in-use different if the etcd member concurrently executes a snapshot?
boltdb guarantees repeatable read. As long as there is a read transaction created before a write transaction, any pages freed by that write transaction can't be released & reused immediately. Once there isn't any open read transaction, the pending free pages are completely released; this is exactly why the db-size-in-use returns back to normal automatically after the snapshot operation is done. References:
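To make the repeatable-read point above concrete, here is a minimal standalone bbolt sketch (my own illustration with arbitrary file, bucket and key names; not code from etcd or the linked references): the read transaction keeps seeing the data as of when it started, and the pages freed by the concurrent write stay pending until the read transaction closes.

```go
package main

import (
	"fmt"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// A generous initial mmap so the concurrent write below never needs to
	// remap while the read transaction is open.
	db, err := bolt.Open("repeatable.db", 0600, &bolt.Options{InitialMmapSize: 1 << 26})
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Seed one key.
	if err := db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("key"))
		if err != nil {
			return err
		}
		return b.Put([]byte("a"), []byte("old"))
	}); err != nil {
		panic(err)
	}

	// Long-lived read-only tx, like the one backing the snapshot stream.
	rtx, err := db.Begin(false)
	if err != nil {
		panic(err)
	}

	// A write commits while the read tx is still open.
	if err := db.Update(func(tx *bolt.Tx) error {
		return tx.Bucket([]byte("key")).Put([]byte("a"), []byte("new"))
	}); err != nil {
		panic(err)
	}

	// The read tx still sees "old" (repeatable read), and the pages holding the
	// old value cannot be reused until rtx is closed.
	fmt.Printf("read tx sees: %s, pending pages: %d\n",
		rtx.Bucket([]byte("key")).Get([]byte("a")), db.Stats().PendingPageN)

	_ = rtx.Rollback()
}
```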
Thanks. @ahrtr
Could you take a look at #18545 and #18516? If they are not needed, I will close them.
Bug report criteria
What happened?
When taking a snapshot of one etcd cluster, I found that the compaction does not take effect.
Metrics for db and db_in_use:
The snapshot was taken from the etcd0 member, and we can see that etcd0's compaction takes effect with a delay.
Logs for etcd0: (Please note that the log time plus 8 hours is the metrics time)
We can see that, after the compaction at 29:13, the db_inuse is about "current-db-size-in-use-bytes":19744907264,"current-db-size-in-use":"20 GB".
Logs for etcd1:
Logs for etcd2:
For etcd1 and etcd2, after the compaction at 29:13, the db_inuse is about "current-db-size-in-use-bytes":10467917824,"current-db-size-in-use":"10 GB".
More importantly, after this, the total db size for etcd0 grows. Please ignore the db_inuse size; it is because the client stopped updating and putting keys.
What did you expect to happen?
The compaction should take effect during the snapshot.
How can we reproduce it (as minimally and precisely as possible)?
Anything else we need to know?
auto compaction params:
Etcd version (please run commands below)
v3.5.13
Etcd configuration (command line flags or environment variables)
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output
No response