Closed: whitslack closed this issue 2 years ago
I am also experiencing memory-leak-ish instability after upgrading to 0.10.1, which has led me to disable the plugin I am using (clboss) for now.
I think this is a false positive: you're seeing the mmap of the gossip store?
If it grows significantly over time, that's an issue...
@rustyrussell: Memory-mapped gossip store pages wouldn't be accounted in VmSwap. Since they're file-backed, they'd simply be evicted from RAM and re-fetched from the file as needed. Only anonymous pages (and COW'd private copies of file-backed pages) go into VmSwap.
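You can confirm which mappings are file-backed versus anonymous from /proc/<pid>/smaps. A rough sketch, assuming the plugin process is named topology and the gossip store mapping carries its file path:
$ pid=$(pidof topology)
$ # The gossip_store mapping (file-backed) shows up with its path; dump its per-mapping counters.
$ grep -A 15 gossip_store "/proc/$pid/smaps"
$ # Total anonymous and swapped pages across all mappings -- these are what can end up in VmSwap.
$ awk '/^Anonymous:/ {a += $2} /^Swap:/ {s += $2} END {print a " kB anonymous, " s " kB swapped"}' "/proc/$pid/smaps"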
Hmm, usage here is much lighter, but it jumped 200 MB the first time I called (lightning-cli listnodes; lightning-cli listchannels; lightning-cli listincoming) > /dev/null. (Then nothing moved it again.) I think I know what it must be; let me see if I'm right...
A possibly helpful piece of information: I was able to ramp topology up to 2.5 GB with lots of calls to listchannels specifying an SCID. In fact, at one point, due to the performance regression, I had several processes all hammering on listchannels <scid> concurrently and continuously for many hours. (I rewrote that script too, so it doesn't call listchannels at all anymore.)
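In case it helps with reproduction, this is roughly the shape of the load I was generating (a sketch, assuming jq is available and the plugin process is topology as above):
$ # Grab all known SCIDs once, then hammer per-SCID queries from four concurrent loops.
$ lightning-cli listchannels | jq -r '.channels[].short_channel_id' > scids.txt
$ for i in 1 2 3 4; do
>     ( while read -r scid; do lightning-cli listchannels "$scid" > /dev/null; done < scids.txt ) &
> done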
Thanks for the great report! Indeed, I found one (not where I was expecting, in fact). I'm testing it on my node now...
This seems to help; however, I still get significant growth:
$ grep -E '^(Vm|Rss)' "/proc/$(pidof topology)/status"
VmPeak: 166344 kB
VmSize: 124544 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 150084 kB
VmRSS: 70600 kB
RssAnon: 59944 kB
RssFile: 10656 kB
RssShmem: 0 kB
VmData: 60316 kB
VmStk: 132 kB
VmExe: 1100 kB
VmLib: 3864 kB
VmPTE: 292 kB
VmSwap: 0 kB
Then I run listchannels: lightning-cli listchannels > /dev/null
And now:
$ grep -E '^(Vm|Rss)' "/proc/$(pidof topology)/status"
VmPeak: 418140 kB
VmSize: 345944 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 390976 kB
VmRSS: 336624 kB
RssAnon: 281524 kB
RssFile: 55100 kB
RssShmem: 0 kB
VmData: 281684 kB
VmStk: 132 kB
VmExe: 1100 kB
VmLib: 3864 kB
VmPTE: 724 kB
VmSwap: 0 kB
Running it multiple times doesn't make it worse, but running malloc_trim(0) does make it return an awful lot of RAM to the system (until I run listchannels again):
$ grep -E '^(Vm|Rss)' "/proc/$(pidof topology)/status"
VmPeak: 418140 kB
VmSize: 123820 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 390976 kB
VmRSS: 96120 kB
RssAnon: 59512 kB
RssFile: 36608 kB
RssShmem: 0 kB
VmData: 59552 kB
VmStk: 132 kB
VmExe: 1100 kB
VmLib: 3864 kB
VmPTE: 288 kB
VmSwap: 0 kB
rusty@ubuntu-1gb-sgp1-01:~/lightning$ lightning-cli listchannels > /dev/null
rusty@ubuntu-1gb-sgp1-01:~/lightning$ grep -E '^(Vm|Rss)' "/proc/$(pidof topology)/status"
VmPeak: 418152 kB
VmSize: 345956 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 390980 kB
VmRSS: 336644 kB
RssAnon: 281536 kB
RssFile: 55108 kB
RssShmem: 0 kB
VmData: 281684 kB
VmStk: 132 kB
VmExe: 1100 kB
VmLib: 3864 kB
VmPTE: 724 kB
VmSwap: 0 kB
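(In case anyone wants to repeat this: a one-shot gdb attach is one way to call malloc_trim(0) inside the live plugin -- a sketch, assuming glibc and ptrace access to the process:)
$ # Attach, force glibc to return free heap pages to the kernel, detach.
$ gdb -p "$(pidof topology)" -batch -ex 'call (int) malloc_trim(0)' -ex detach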
@rustyrussell: What in the world are you allocating that's so huge? Is your JSON parser that inefficient? :grimacing:
It's 77MB of JSON. But let me run massif and see what the rest of the RAM is for!
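(Roughly the plan -- a sketch, assuming valgrind is installed; since lightningd spawns the plugin, massif has to be told to follow child processes:)
$ # Each process writes massif.out.<pid>; find the one belonging to topology.
$ valgrind --tool=massif --trace-children=yes lightningd
$ ms_print massif.out.<pid> | head -n 40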
Huh, weird. On my laptop, topology after listchannels (on regtest, developer mode, importing gossip_store) gives a much more expected result:
VmPeak: 195772 kB
VmSize: 123576 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 175864 kB
VmRSS: 121528 kB
RssAnon: 70628 kB
RssFile: 50900 kB
RssShmem: 0 kB
VmData: 70656 kB
VmStk: 136 kB
VmExe: 720 kB
VmLib: 2664 kB
VmPTE: 276 kB
VmSwap: 0 kB
And massif shows nothing surprising. OK, let me try running massif on my actual live machine...
Nope, massif on my actual machine shows the same thing: we peak at 130MB, as expected. glibc's allocator hates us?
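If it really is glibc hanging on to freed arena memory, one knob that might be worth trying (an assumption on my part, not something verified here) is tightening the malloc tunables via environment variables before starting lightningd:
$ # Cap glibc at a single arena and trim back to the kernel more eagerly.
$ export MALLOC_ARENA_MAX=1
$ export MALLOC_TRIM_THRESHOLD_=65536
$ lightningd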
Issue and Steps to Reproduce
Is it expected that the topology plugin process should be using multiple gigabytes of RAM? Almost all of the process's memory is swapped out, suggesting that the process is not actively referencing those pages.
Immediately after a call to lightning-cli listnodes; lightning-cli listchannels; lightning-cli listincoming, the process has swapped back in only a small portion (61 MiB) of its huge (2583 MiB) memory footprint. I suspect a memory leak. What would be the best way of determining whether that is indeed occurring?
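In the meantime, I can watch for monotonic growth with something like this (a rough sketch, assuming the plugin process is named topology):
$ # Log the resident, anonymous, and swapped footprint once a minute; a leak shows as steady growth.
$ while sleep 60; do
>     date +%T
>     grep -E '^(VmRSS|RssAnon|VmSwap):' "/proc/$(pidof topology)/status"
> done | tee topology-mem.log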
getinfo output
This is the release version 0.10.1.