docker-library / rabbitmq

Docker Official Image packaging for RabbitMQ
http://www.rabbitmq.com/
MIT License

Possible Memory Leak? #677

Closed: Kraego closed this issue 10 months ago

Kraego commented 10 months ago

## Setup

## Memory Consumption

The strange thing is that there is no queue and no traffic, yet memory consumption keeps rising until the pod is OOM-killed (see the attached memory-usage graph).

OS PID: 13
OS: Linux
Uptime (seconds): 68724
Is under maintenance?: false
RabbitMQ version: 3.12.9
RabbitMQ release series support status: supported
Node name: rmq-host@rmq-host
Erlang configuration: Erlang/OTP 25 [erts-13.2.2.4] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit]
Crypto library: OpenSSL 3.1.4 24 Oct 2023
Erlang processes: 446 used, 1048576 limit
Scheduler run queue: 1
Cluster heartbeat timeout (net_ticktime): 60

Plugins

Enabled plugin file: /etc/rabbitmq/enabled_plugins
Enabled plugins:

Data directory

Node data directory: /var/lib/rabbitmq/mnesia/rmq-host@rmq-host
Raft data directory: /var/lib/rabbitmq/mnesia/rmq-host@rmq-host/quorum/rmq-host@rmq-host

Config files

Log file(s)

Alarms

(none)

Memory

Total memory used: 0.1512 gb
Calculation strategy: rss
Memory high watermark setting: 0.2684 gb, computed to: 0.2684 gb

reserved_unallocated: 0.0774 gb (51.23 %)
code: 0.0389 gb (25.76 %)
other_system: 0.0222 gb (14.71 %)
other_proc: 0.0153 gb (10.1 %)
metrics: 0.0017 gb (1.15 %)
other_ets: 0.0016 gb (1.06 %)
atom: 0.0015 gb (0.98 %)
plugins: 0.001 gb (0.66 %)
binary: 0.0005 gb (0.31 %)
msg_index: 0.0004 gb (0.24 %)
mgmt_db: 0.0003 gb (0.21 %)
mnesia: 0.0001 gb (0.05 %)
connection_other: 0.0 gb (0.03 %)
quorum_ets: 0.0 gb (0.02 %)
quorum_queue_dlx_procs: 0.0 gb (0.0 %)
quorum_queue_procs: 0.0 gb (0.0 %)
stream_queue_procs: 0.0 gb (0.0 %)
stream_queue_replica_reader_procs: 0.0 gb (0.0 %)
allocated_unused: 0.0 gb (0.0 %)
connection_channels: 0.0 gb (0.0 %)
connection_readers: 0.0 gb (0.0 %)
connection_writers: 0.0 gb (0.0 %)
queue_procs: 0.0 gb (0.0 %)
queue_slave_procs: 0.0 gb (0.0 %)
stream_queue_coordinator_procs: 0.0 gb (0.0 %)

File Descriptors

Total: 2, limit: 1048479
Sockets: 0, limit: 943629

Free Disk Space

Low free disk space watermark: 0.05 gb
Free disk space: 2.1475 gb

Totals

Connection count: 0
Queue count: 0
Virtual host count: 1

Listeners

Interface: [::], port: 15672, protocol: http, purpose: HTTP API
Interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: 0.0.0.0, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
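
For reference, the report above matches the output of `rabbitmq-diagnostics status`. Below is a minimal shell sketch for capturing it, together with the per-category memory breakdown, while the leak is being reproduced; the container name `rabbitmq` and the 5-minute interval are assumptions, not taken from the original report:

```sh
# Capture the node status and the memory breakdown every 5 minutes
# while memory grows; adjust the container name to your deployment.
while true; do
  date
  docker exec rabbitmq rabbitmq-diagnostics status
  docker exec rabbitmq rabbitmq-diagnostics memory_breakdown --unit "MB"
  sleep 300
done
```

Comparing successive snapshots makes it clear whether a specific category keeps growing or whether only the process RSS does.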



## Research Done
* Calling `rabbitmqctl force_gc` releases approximately 6MB 
* There was also a mailing-list thread (https://groups.google.com/g/rabbitmq-users/c/v630G6OCxuU) discussing a memory leak (regarding lazy queues, limiting the index, paging, ...), but since we don't have any queues I think that's irrelevant here.
* There was a possible leak in Erlang (https://groups.google.com/g/rabbitmq-users/c/UE-wxXerJl8), but that was fixed in OTP 24.2.1 (https://github.com/erlang/otp/releases/tag/OTP-24.2.1), and the image in use ships Erlang 25.3.2.1.
* What I've noticed so far is that RabbitMQ had **problems detecting the actually available memory** in the pod: `Memory high watermark setting: 0.4 of available memory, computed to: 54.0295 gb` (the real limit is 1 GiB, so I configured an absolute watermark; a configuration sketch follows after this list). Maybe Erlang has a similar problem and that's why the GC doesn't kick in, but that's just a wild guess since my Erlang knowledge is zero.
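
A minimal sketch of working around the detection problem with the documented `total_memory_available_override_value` key, so that the broker's idea of available memory matches the pod limit. The 1 GiB value mirrors the limit mentioned above; the drop-in path `/etc/rabbitmq/conf.d/` and the image tag are assumptions based on recent official images:

```sh
# Tell RabbitMQ how much memory is really available (1 GiB here),
# instead of letting it derive the figure from the host's total RAM.
cat > 90-memory.conf <<'EOF'
total_memory_available_override_value = 1GB
EOF

# Mount the drop-in into the official image (names and paths are examples).
docker run -d --name rabbitmq --memory 1g \
  -v "$PWD/90-memory.conf:/etc/rabbitmq/conf.d/90-memory.conf:ro" \
  rabbitmq:3.12.9-management
```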
lukebakken commented 10 months ago

Hello, thanks for using RabbitMQ.

This repository is probably the wrong place to report this issue.

However, before I redirect you to the correct one (rabbitmq/rabbitmq-server), I would like you to try setting the memory high watermark to an absolute value:

https://www.rabbitmq.com/memory.html

vm_memory_high_watermark.absolute = 536870912

See if that resolves the issue in your environment. Thanks.
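
For the official image, one way to apply that setting is sketched below, assuming the usual approach of mounting a `rabbitmq.conf` into the container (536870912 bytes is 512 MiB; the container name, tag, and paths are examples):

```sh
# Write the suggested absolute watermark to a config file...
cat > rabbitmq.conf <<'EOF'
vm_memory_high_watermark.absolute = 536870912
EOF

# ...and mount it where the official image expects the main config file.
docker run -d --name rabbitmq \
  -v "$PWD/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf:ro" \
  rabbitmq:3.12.9-management
```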

Kraego commented 10 months ago

@lukebakken - thanks for the fast feedback. I've already set the watermark to 256 MiB (`vm_memory_high_watermark.absolute = 256MiB`). Can you move the issue?
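
A quick sketch of confirming which limits the running node actually applies; the container name is an assumption, and `vm_memory_monitor:get_total_memory/0` is a broker-internal helper, so treat it as a diagnostic rather than a stable API:

```sh
# Show the watermark the node reports (with 256 MiB set it should
# read "0.2684 gb", matching the status output above).
docker exec rabbitmq rabbitmq-diagnostics status | grep -i watermark

# What the node believes the total available memory is, in bytes.
docker exec rabbitmq rabbitmqctl eval 'vm_memory_monitor:get_total_memory().'
```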

lukebakken commented 10 months ago

Oh, I didn't notice that, thanks for pointing it out. GitHub is not allowing me to move this issue, probably because you can't do that across orgs, or I don't have the appropriate permissions.

Please start a discussion here:

https://github.com/rabbitmq/rabbitmq-server/discussions/

Team RabbitMQ always starts with a discussion until actionable work can be found. Thanks again.

LaurentGoderre commented 10 months ago

I can transfer the issue but not to the rabbitmq org

lukebakken commented 10 months ago

No one person has all of the power!