VictorRobellini / pfSense-Dashboard

A functional and useful dashboard for pfSense that utilizes influxdb, grafana and telegraf

Question about memory usage calculation #57

Open faandg opened 2 years ago

faandg commented 2 years ago

Hi,

I recently ran into this pfSense bug after 148 days of uptime, and my memory was full (great fun). The pfSense main screen showed memory usage steady at 99% and swap at 100%. This dashboard showed memory usage steady at 20% and swap at 100%.

Can you think of any reason why the dashboard would not report the same total memory usage as pfSense? I read a bit about memory 'ballooning', but it seemed to me that would only apply to a virtualized pfSense (mine is a hardware device).

wrightsonm commented 2 years ago

Here are the numbers from my system before and after stopping pcscd:

image

The stats currently shown on the pfSense page (90 minutes later) look consistent with the mem and swap used_percent values:

image

Could it be to do with the difference between used, free and laundry?

faandg commented 2 years ago

I think you might be on to something. I'm seeing the same thing. Before: image After mitigation: image

We should investigate whether we can find a proper way to track laundry (and maybe other types?) in the dashboard.

faandg commented 2 years ago

Seeing high numbers of type laundry in the other bug reports as well: image I guess we want to show it separately OR include it in the calculation for total memory usage.

wrightsonm commented 2 years ago

I was trying to work out how the memory data is sent from pfSense to InfluxDB and failed. Any ideas where these values actually come from? I didn't see them in the telegraf config or any of the plugins.

wrightsonm commented 2 years ago

@faandg the new version of pfsense was released today which should resolve the bug with pcscd.

faandg commented 2 years ago

@wrightsonm thanks, I saw it on my news feed as well. I might wait just a bit longer though because I have IPSEC tunnels and there appear to be many related changes. Maybe a couple of weeks.

VictorRobellini commented 2 years ago

Sorry for the delay. I believe memory stats get inserted from this input

Does 2.6 resolve this for you?

faandg commented 2 years ago

@VictorRobellini 2.6 resolves the bug. It's up to you to decide whether or not you want to include laundry (integer, FreeBSD) in the dashboard query. I'd open a pull request but I have zero experience writing influxql | flux queries.

In this particular case, adding laundry would show base usage (20%) + laundry (80%) = 100%, i.e. memory full. Without laundry, the dashboard shows only base usage (20%), which looks fine when in fact it is not.

Laundry is considered 'reclaimable space'. However, if it cannot be reclaimed at that time, it keeps adding up until it hits the maximum (which is the case when you have a memory leak).

wrightsonm commented 2 years ago

image

This flux query should do the trick:

// pfSense < v2.6 had a bug where a memory leak in pcscd could use all of the laundry space
from(bucket: "${bucket}")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "mem")
  |> filter(fn: (r) => r["host"] =~ /^${Host:pipe}$/)
  |> filter(fn: (r) => r["_field"] == "laundry" or r["_field"] == "total")
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> map(fn: (r) => ({r with laundry_percent:float(v:r.laundry) / float(v:r.total) * float(v: 100) }))
  |> keep(columns:["_time","laundry_percent"])
  |> rename(columns: {laundry_percent:"Laundry Ram Used"})
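
For the other option raised earlier in the thread (including laundry in the total memory usage instead of plotting it separately), a variant of the query above might look like the sketch below. It assumes the `mem` measurement also carries a `used` field, and that `used` does not already count laundry pages. The 20% vs 99% readings in this thread suggest that, but it is not verified here. The output column name is made up for illustration.

```flux
// Sketch only: combined "used + laundry" percentage.
// Assumption: the "used" field exists in the "mem" measurement and excludes laundry.
from(bucket: "${bucket}")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "mem")
  |> filter(fn: (r) => r["host"] =~ /^${Host:pipe}$/)
  |> filter(fn: (r) => r["_field"] == "used" or r["_field"] == "laundry" or r["_field"] == "total")
  // Put used, laundry and total side by side on one row per timestamp
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> map(fn: (r) => ({r with used_plus_laundry_percent: (float(v: r.used) + float(v: r.laundry)) / float(v: r.total) * 100.0}))
  |> keep(columns: ["_time", "used_plus_laundry_percent"])
  // Hypothetical series name
  |> rename(columns: {used_plus_laundry_percent: "RAM Used incl. Laundry"})
```

With the numbers reported above (20% base usage, 80% laundry), this series would read about 100% rather than 20%.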

wrightsonm commented 2 years ago

@faandg updated my repo with the above: https://github.com/wrightsonm/pfSense-Dashboard