facette / facette

Time series data visualization software
https://facette.io/
BSD 3-Clause "New" or "Revised" License

out of memory #395

Open akohlbecker opened 2 years ago

akohlbecker commented 2 years ago

After starting the service, Facette eats up about 19 GB of RAM and then terminates with fatal error: runtime: out of memory

Please find the corresponding system log attached to this ticket: facette-outofmem.log

vbatoufflet commented 2 years ago

It clearly looks like a nasty memory leak.

Just to try to pinpoint what's going on: you're using the RRD provider, right? And did you just let the service run with a refresh interval (if so, what was the value)?

akohlbecker commented 2 years ago

Hi Vincent,

Yes, I am using the RRD provider. The refresh interval for the UI is set to 10s, which I guess is the default. As far as I know there is no other refresh interval setting, or did I miss something?

The memory leak occurs purely on startup of the server, even if it is running completely without any client interaction. I assume it initially scans the RRD sources and the memory leak occurs in that initialization phase.

Can I provide you with more information to help pin down the cause of the memory leak?

Best, Andreas

vbatoufflet commented 2 years ago

Sadly, there is no pprof endpoint in Facette to help gather information on this leak.

I'll try to reproduce it and build a custom version with such an endpoint.

Regarding the refresh interval, I was talking about the one from the provider definition (default is 0, i.e. no refresh):

[Screenshot, 2021-09-27: "New provider" page, Administration panel, Facette]

vbatoufflet commented 2 years ago

hey @akohlbecker,

Quick additional questions:

  1. Do you have any symbolic links in your RRD folders, and is there any possible symlink loop (this case is indeed not handled)?
  2. Did you try running Facette in debug mode (see https://github.com/facette/facette/blob/master/docs/examples/facette.yaml#L6) to see if there is something weird in the logs?

Regards, Vincent

akohlbecker commented 2 years ago

The refresh interval for the RRD provider was set to 10. After setting this to 500, Facette seems to behave normally. These are milliseconds, not seconds as I was assuming originally, right?

I will report long-term results from this settings change tomorrow.

vbatoufflet commented 2 years ago

"These are milliseconds, not seconds as I was assuming originally, right?"

The unit for this setting is seconds: https://docs.facette.io/latest/api/providers/#create-a-provider

It would be surprising if raising this interval fixed the issue. It might just take longer to trigger.

akohlbecker commented 2 years ago

Hi Vincent,

Your expectation was correct: the memory consumption has increased overnight and is now at about 10 GB.

I checked the RRD folders for symlinks and found none.

The debug log contains many entries like these (elided here):

2021/09/28 09:59:51.058919 DEBUG: poller[collectd]: inserted record {Origin: "collectd", Source: .... in "collectd" catalog
2021/09/28 09:59:51.058926 DEBUG: poller[collectd]: does not match "/average$" sieve pattern, discarding: .... 

Apart from these lines (588313 of them after running Facette for 10 minutes with an RRD provider refresh interval of 500), the log only has these entries:

2021/09/28 09:59:49.834419 INFO: http: started
2021/09/28 09:59:49.834420 INFO: poller: started
2021/09/28 09:59:49.834630 INFO: http: listening on "127.0.0.1:12003"
2021/09/28 09:59:49.835399 DEBUG: poller[collectd]: started
2021/09/28 09:59:50.867371 DEBUG: poller[collectd]: restored previous catalog state in 1.031827415s
2021/09/28 09:59:50.867445 DEBUG: poller[collectd]: refreshing "collectd" provider
2021/09/28 10:08:10.867708 DEBUG: poller[collectd]: refreshing "collectd" provider

vbatoufflet commented 2 years ago

Which version are you running, the latest release or a build from master?

What's your platform/architecture, linux/amd64?

I built a custom version yesterday with a pprof HTTP endpoint that should allow us to visualize heap usage while the service is running. I'll try to push it to a dedicated branch tonight, but I can build the binary for you to test if you want.

akohlbecker commented 2 years ago

I am running version 0.5.1 on linux/amd64 (4.9.0-15-amd64 #1 SMP Debian 4.9.258-1 (2021-03-08) x86_64 GNU/Linux)

It would be great if you could build the binary for me.

TNX Andreas

vbatoufflet commented 2 years ago

Hi @akohlbecker,

Sorry for the delay here.

I just pushed changes to a dedicated branch that register debugging pprof endpoints on the web server, see 593ce3f78281a0e2c6606873abae32322ae8e050.

Here is a .deb file embedding those changes (note: had to gzip it to make GitHub accept it 🤷 ): facette_0.6.0-0~git20211005.593ce3f7_amd64.deb.gz

Once installed and the issue triggered, you should be able to visualize heap information from the running service using:

go tool pprof -http=:8080 http://your-facette-instance:12003/debug/pprof/heap

If you could extract it for me, it would be great too:

curl -s http://your-facette-instance:12003/debug/pprof/heap >facette-heap.out

akohlbecker commented 2 years ago

Hi Vincent,

thank you for the binary.

BTW: Since I've set the refresh interval for the rrd provider to 500 I no longer have problems.

I installed the debug build anyway, and here is the pprof output: facette-heap.out.gz

Cheers Andreas