dragonflydb / dragonfly

A modern replacement for Redis and Memcached
https://www.dragonflydb.io/
Other
24.54k stars 887 forks source link

Dragonfly sometimes fails with OOM even when `--cache_mode=true` #3155

Open bogdanp05 opened 1 month ago

bogdanp05 commented 1 month ago

Describe the bug Dragonfly fails with OOM even when --cache_mode=true. This happens after multiple hours/days of intense load.

To Reproduce We noticed this on several services that were processing ~200k commands/s and which had between 11k and 15k clients. Also, at the time of the OOM restart, the RSS memory was 162GB, while the used memory was 128GB.

Jun 08 13:56:10 dragonfly-1-4 systemd[1]: dragonfly.service: A process of this unit has been killed by the OOM killer.
Jun 08 13:56:38 dragonfly-1-4 systemd[1]: dragonfly.service: Main process exited, code=killed, status=9/KILL
Jun 08 13:56:38 dragonfly-1-4 systemd[1]: dragonfly.service: Failed with result 'oom-kill'.
Jun 08 13:56:38 dragonfly-1-4 systemd[1]: dragonfly.service: Consumed 2w 4d 4h 34min 1.671s CPU time.
Jun 08 13:56:53 dragonfly-1-4 systemd[1]: dragonfly.service: Scheduled restart job, restart counter is at 1.
Jun 08 13:56:53 dragonfly-1-4 systemd[1]: Stopped dragonfly.service - Aiven dragonfly in container.
Jun 08 13:56:53 dragonfly-1-4 systemd[1]: dragonfly.service: Consumed 2w 4d 4h 34min 1.671s CPU time.
Jun 08 13:56:53 dragonfly-1-4 systemd[1]: Started dragonfly.service - Aiven dragonfly in container.

Expected behavior Dragonfly shouldn't crash.

Environment (please complete the following information):

kostasrim commented 4 weeks ago

Hi @bogdanp05, thank you for reporting this.

@chakaz I see RSS here, is that something we are aware of or shall we investigate?

romange commented 4 weeks ago

We are taking care of this.