gnocchixyz / gnocchi

Timeseries database
Apache License 2.0

gnocchi-metricd service RAM usage increasing #1294

Open aleksei-mv opened 1 year ago

aleksei-mv commented 1 year ago

Before reporting an issue on Gnocchi, please be sure to provide all necessary information.

Which version of Gnocchi are you using

Affected Gnocchi versions: 4.4.1-4.4.2, on a Kolla-based OpenStack Yoga installation. The gnocchi_api and gnocchi_metricd containers are built from source.

How to reproduce your problem

  1. Container-based (Kolla) OpenStack Yoga installation
  2. Gnocchi version: 4.4.1-4.4.2
  3. Redis as incoming storage
  4. S3 as persistent storage
  5. Ceilometer as metric collector

What is the result that you get

  1. ~800MB RAM usage per metricd worker and growing.
  2. ~2GB of additional RAM usage per 24 hours for a 12-worker configuration

What is result that you expected

~150MB RAM usage per metricd worker as mentioned in this issue #606

Additional info

Hi, everyone!

I ran into a problem: my Gnocchi installation continuously increases its RAM usage after the service starts. Specifically, I found that the problem is in the metricd module. A 12-worker installation uses ~2GB more RAM each day. The RAM is freed after a service restart, but fills up again afterwards.

Here is the cAdvisor telemetry for the gnocchi-metricd container:

  1. RAM usage a few hours after service start: image
  2. RAM usage after the service has been running for a few days: image

Enabling debug mode and inspecting the logs did not reveal anything; no errors were logged. Metricd is processing all metrics from the incoming storage, so no metrics are stuck.

Here are the gnocchi.conf sections I'm using:

...
[metricd]
workers = 12
metric_processing_delay = 60
metric_reporting_delay = -1
metric_cleanup_delay = 60
processing_replicas = 3
cleanup_batch_size = 10000
...
[storage]
driver = s3
s3_endpoint_url = <url>
s3_access_key_id = <id>
s3_secret_access_key = <key>
s3_bucket_prefix = gnocchi
s3_check_consistency_timeout = 30
s3_max_pool_connections = 100
...

Here is the ps_mem output for the gnocchi user on the host system:

ps_mem -p $(pgrep -d, -u 42416)
 Private  +   Shared  =  RAM used   Program

 72.0 KiB +  20.5 KiB =  92.5 KiB   dumb-init
 35.6 MiB +   4.7 MiB =  40.3 MiB   gnocchi-metricd
607.0 MiB +  20.0 MiB = 627.0 MiB   apache2 (5)
 11.6 GiB +  30.0 MiB =  11.7 GiB   python3.8 (13)
---------------------------------
                         12.3 GiB
=================================

Unfortunately, my Python/programming skills are quite limited, so I'm not able to debug such a large app on my own. :(

tobias-urdin commented 1 year ago

Hello 👋 I don't have any great ideas. The only known issue we've had recently is the memory bug in the ujson library, so I suggest checking which version of it is in use; see https://github.com/gnocchixyz/gnocchi/pull/1136
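For anyone who wants to check this from inside the container, here is a minimal sketch that prints the installed ujson version using only the standard library. The helper name installed_version is made up for illustration, and it has to be run with the same Python interpreter that gnocchi-metricd uses; which ujson versions are affected is not stated here, so compare the result against the linked pull request.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the ujson version visible to this interpreter.
print("ujson:", installed_version("ujson") or "not installed")
```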

Other than that, you would need to troubleshoot further to get more information. We are not using the S3 storage, so I don't know whether the issue lies there or somewhere else.
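One generic way to start troubleshooting is Python's built-in tracemalloc module, which can compare two heap snapshots and list the source lines whose allocations grew the most. The sketch below is a standalone demonstration, not something wired into Gnocchi: the helper top_growth and the simulated leak are assumptions for illustration. Note that tracemalloc only sees memory allocated through Python's own allocators, so a leak inside a C library that calls malloc directly may not show up.

```python
import tracemalloc


def top_growth(before, after, limit=10):
    """Return the source lines whose allocated size changed most between snapshots."""
    return after.compare_to(before, "lineno")[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Simulated leak: roughly 1 MB of objects that stay referenced.
leak = [bytes(1000) for _ in range(1000)]

after = tracemalloc.take_snapshot()
for stat in top_growth(before, after, limit=3):
    print(stat)
```

In a long-running worker, the two snapshots would be taken some hours apart rather than back to back.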

daydrim commented 1 year ago

Hello, we've found that a large number of file descriptors (300K+) are opened and never closed. The RAM usage is probably increasing because the number of open file descriptors keeps growing.

The gnocchi-metricd process opens eventpoll descriptors and does not close them.

At first we thought this was a problem with the boto3 library, but we did not find any bugs directly related to it.
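To see which kind of descriptor is piling up in a given worker, the entries under /proc/<pid>/fd can be grouped by their link target; on Linux, epoll descriptors show up as anon_inode:[eventpoll]. A minimal sketch, assuming a Linux host with procfs and permission to read the target process; the function name count_fd_targets and the proc_root parameter are made up for illustration:

```python
import os
from collections import Counter


def count_fd_targets(pid, proc_root="/proc"):
    """Count the open file descriptors of a process, grouped by kind."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target.startswith(("socket:", "pipe:")):
            kind = target.split(":", 1)[0]  # "socket:[12345]" -> "socket"
        elif target.startswith("anon_inode:"):
            kind = target  # e.g. "anon_inode:[eventpoll]"
        else:
            kind = "file"
        counts[kind] += 1
    return counts
```

Calling count_fd_targets(pid).most_common() for one metricd worker every few hours would show whether the eventpoll count is the one that keeps rising.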