etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

High memory usage (memory leak?) with 1 node cluster and small DB (but a lot of updates) #18382

Open vitalif opened 1 month ago

vitalif commented 1 month ago

Bug report criteria

What happened?

My etcd 3.5.8 ate 12 GB of RAM... The database size is < 2 MB (etcdctl get --write-out=json --prefix / gives me a 1966139-byte dump). The database takes 1.5 GB on disk (/var/lib/etcd0.etcd). There are 6 active streams. Client traffic shows 2.2 MB/s in and 2.5 MB/s out in the monitoring dashboard. There are 128+5 keys which are bumped continuously through leases. There are another 139 keys which are updated every 30 seconds.

[screenshot: etcd memory usage (etcd_obozhralsya)]

Since I've already experienced this kind of memory usage with this etcd before, it was started with --enable-pprof=true; pprof shows the following:

# go tool pprof http://192.168.1.144:2379/debug/pprof/heap
Fetching profile over HTTP from http://192.168.1.144:2379/debug/pprof/heap
Saved profile in /root/pprof/pprof.etcd.alloc_objects.alloc_space.inuse_objects.inuse_space.012.pb.gz
File: etcd
Type: inuse_space
Time: Jul 31, 2024 at 1:45am (MSK)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 10.88GB, 99.59% of 10.92GB total
Dropped 179 nodes (cum <= 0.05GB)
Showing top 10 nodes out of 17
      flat  flat%   sum%        cum   cum%
   10.88GB 99.59% 99.59%    10.88GB 99.59%  go.etcd.io/etcd/api/v3/etcdserverpb.(*InternalRaftRequest).Marshal
         0     0% 99.59%    10.88GB 99.59%  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
         0     0% 99.59%    10.88GB 99.59%  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
         0     0% 99.59%    10.88GB 99.59%  github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1
         0     0% 99.59%    10.88GB 99.59%  go.etcd.io/etcd/api/v3/etcdserverpb._KV_Txn_Handler
         0     0% 99.59%    10.88GB 99.59%  go.etcd.io/etcd/api/v3/etcdserverpb._KV_Txn_Handler.func1
         0     0% 99.59%    10.88GB 99.59%  go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).Txn
         0     0% 99.59%    10.88GB 99.59%  go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).processInternalRaftRequestOnce
         0     0% 99.59%    10.88GB 99.59%  go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).raftRequest (inline)
         0     0% 99.59%    10.88GB 99.59%  go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).raftRequestOnce

Is it a memory leak or what?...

What did you expect to happen?

I expected etcd to consume at most 1 GB of memory with a 2 MB database.

How can we reproduce it (as minimally and precisely as possible)?

Install Vitastor on 1 node. :-) But I'm not sure it will be easy to reproduce.

Anything else we need to know?

It's still running; I can collect other information or connect with a debugger if you tell me what I should collect. :-)

Etcd version (please run commands below)

```console
$ etcd --version
etcd Version: 3.5.8
Git SHA: 217d183e5
Go Version: go1.19.5
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.5.8
API version: 3.5
```

Etcd configuration (command line flags or environment variables)

Startup options: `etcd -name etcd0 --data-dir /var/lib/etcd0.etcd --enable-pprof=true --snapshot-count=1000 --advertise-client-urls http://192.168.1.144:2379 --listen-client-urls http://192.168.1.144:2379 --initial-advertise-peer-urls http://192.168.1.144:2380 --listen-peer-urls http://192.168.1.144:2380 --initial-cluster-token vitastor-etcd-1 --initial-cluster etcd0=http://192.168.1.144:2380 --initial-cluster-state new --max-txn-ops=100000 --max-request-bytes=104857600 --auto-compaction-retention=10 --auto-compaction-mode=revision`

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

```console
$ etcdctl member list -w table
+------------------+---------+-------+---------------------------+---------------------------+------------+
|        ID        | STATUS  | NAME  |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+-------+---------------------------+---------------------------+------------+
| 720141fbb7123c6f | started | etcd0 | http://192.168.1.144:2380 | http://192.168.1.144:2379 | false      |
+------------------+---------+-------+---------------------------+---------------------------+------------+

$ etcdctl --endpoints= endpoint status -w table
+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|      ENDPOINT      |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 192.168.1.144:2379 | 720141fbb7123c6f | 3.5.8   | 1.2 GB  | true      | false      |        51 |   16960478 |           16960478 |        |
+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```

Relevant log output

No response

serathius commented 1 month ago

This looks like https://github.com/etcd-io/etcd/issues/17098 to me, but to confirm, can you provide:

ahrtr commented 1 month ago

Could you also attach the generated pprof.etcd.alloc_objects.alloc_space.inuse_objects.inuse_space.*.pb.gz file?
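
For reference, one way to capture a fresh heap profile for attaching, or to dump the top consumers of an already-saved one (endpoint and file path taken from the report above; adjust as needed):

```console
# grab a fresh heap profile from the pprof endpoint (requires --enable-pprof=true)
$ curl -sS -o heap.pb.gz http://192.168.1.144:2379/debug/pprof/heap
# print the top consumers of a saved profile without entering interactive mode
$ go tool pprof -top /root/pprof/pprof.etcd.alloc_objects.alloc_space.inuse_objects.inuse_space.012.pb.gz
```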

ahrtr commented 1 month ago

> The database size is < 2 MB (etcdctl get --write-out=json --prefix / gives me a 1966139-byte dump). The database takes 1.5 GB on disk (/var/lib/etcd0.etcd).

Please also perform a compaction + defragmentation and retry, then let us know the result. Thanks.
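
For reference, a sketch of the suggested compaction + defragmentation against the single endpoint from the report (the jq expression is just one way to pull out the current revision):

```console
$ rev=$(etcdctl --endpoints=http://192.168.1.144:2379 endpoint status -w json | jq '.[0].Status.header.revision')
$ etcdctl --endpoints=http://192.168.1.144:2379 compaction "$rev"
$ etcdctl --endpoints=http://192.168.1.144:2379 defrag
$ etcdctl --endpoints=http://192.168.1.144:2379 endpoint status -w table   # DB SIZE should shrink after defrag
```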

socketpair commented 1 month ago

@ahrtr The topic starter is concerned about RAM consumption, not disk usage. If they are connected somehow, then yes, it might help, but disk usage itself is not the issue. Note that RAM usage is about 10 times bigger than the disk usage.

ahrtr commented 1 month ago

> If they are connected somehow

They are related, bbolt mmaps the file into memory directly.
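
That mapping is easy to see from the outside; a quick sketch, assuming the data dir from the report (the bbolt file lives at member/snap/db inside it):

```console
$ ls -lh /var/lib/etcd0.etcd/member/snap/db
$ grep 'member/snap/db' /proc/$(pidof etcd)/maps    # the file-backed mapping of the bbolt db
```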

vitalif commented 1 month ago

Here's the profile pprof.etcd.alloc_objects.alloc_space.inuse_objects.inuse_space.012.pb.gz

The percentage of PUT requests was, I think, close to 100% )). There were puts + 6 watchers, that's all. The RPC rate graph shows 0.85 rps and 2.5 MByte/s of traffic.

Memory usage actually dropped the next day, along with update traffic, when I restarted my software and it stopped updating... something. I think it was old statistics, but I'm not sure. 2.5 MB/s is quite a lot given that the total amount of data in etcd is 2 MB :-). My software updates statistics every 5 seconds in that installation, so 2.5 MB/s is 12.5 MB every 5 seconds... still more than 2 MB. So I'm not sure what it was. Maybe these were some failed compares and retries... I'll keep monitoring this instance.

Here's the screenshot from Grafana: [grafana screenshot]

socketpair commented 1 month ago

@ahrtr Okay, but:

  1. Does it map the file multiple times to different places?
  2. If yes, that does not increase memory usage (RSS), because the mapping is shared.
  3. If no, why do the sizes differ so much? It seems it allocates huge amounts of anonymous memory (copies?).

socketpair commented 1 month ago

I'm not an expert in Go. Is there a memory mapper, i.e. a tool that shows what each memory block is used for?

clement2026 commented 1 week ago

@socketpair I get that you’re concerned about memory size. You might want to try setting the --experimental-snapshot-catchup-entries flag. The default value is 5000, but you can lower it to, say, 500, which should bring your memory usage down to around 3GB.

The downside of setting a low value for --experimental-snapshot-catchup-entries is that slow followers might need to catch up with the leader using a snapshot, which is less efficient. But if you’re running a single instance of etcd, you can even set it to 1 and keep your memory usage under 2GB.

Based on your posts, it seems the average size of a put request is around 2MB.

Your etcd instance’s heap size is roughly between (--experimental-snapshot-catchup-entries) * 2MB and (--experimental-snapshot-catchup-entries + --snapshot-count) * 2MB. In your case, that’s about 10GB to 12GB.
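
For reference, plugging the flags from the original report into that estimate (the ~2 MB figure is the approximate put size inferred above):

```
lower bound:  5000 entries × 2 MB ≈ 10 GB            (--experimental-snapshot-catchup-entries)
upper bound:  (5000 + 1000) entries × 2 MB ≈ 12 GB   (plus --snapshot-count=1000)
```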

I’m sharing some tests here for your reference.

Test 1

--experimental-snapshot-catchup-entries 5000

Setup:

rm -rf etcd0.etcd; etcd -name etcd0 \
--data-dir etcd0.etcd \
--enable-pprof=true \
--snapshot-count=1000 \
--advertise-client-urls http://localhost:2379 \
--listen-client-urls http://localhost:2379 \
--initial-advertise-peer-urls http://localhost:2380 \
--listen-peer-urls http://localhost:2380 \
--initial-cluster-token vitastor-etcd-1 \
--initial-cluster etcd0=http://localhost:2380 \
--initial-cluster-state new \
--max-txn-ops=100000 \
--max-request-bytes=104857600 \
--auto-compaction-mode=periodic \
--auto-compaction-retention=5s \
--experimental-snapshot-catchup-entries 5000

Benchmark script:

./bin/tools/benchmark txn-mixed --total=50000000 --val-size=2000000 --key-space-size=1 --rw-ratio=1

[screenshot: Test 1 result (Xnip2024-09-06_14-55-54)]

Test 2

The setup and benchmark script are the same, except with --experimental-snapshot-catchup-entries 500.

[screenshot: Test 2 result (Xnip2024-09-06_14-39-17)]

Test 3

--experimental-snapshot-catchup-entries 1

[screenshot: Test 3 result]

Hope it helps.

serathius commented 4 days ago

Awesome analysis @clement2026; it makes me very excited about your work in https://github.com/etcd-io/etcd/issues/17098. It would be awesome to paste those findings there too.

clement2026 commented 4 days ago

I’m glad you found the analysis helpful. I’ll include them in #17098.