buildbarn / bb-clientd

Buildbarn client-side FUSE/NFSv4 daemon
Apache License 2.0

bb-clientd uses too much memory and gets killed #8

Closed jmmv closed 3 months ago

jmmv commented 1 year ago

I have configured Bazel to use bb-clientd to manage the output tree. With that setup, I ran a bazel test command with --jobs=200 over about 2000 test targets, the vast majority of which currently fail. The fact that they mostly fail may be relevant here, but I'm not sure.

Along the way, bb-clientd was killed by the Linux OOM killer, which caused the test run to fail (with "Transport endpoint not connected" messages and the like). I noticed that bb-clientd would grow to about 13 GB of RAM before being killed, and it consumed 20-50% CPU throughout. I grabbed a heap profile of the process when it was at about 10 GB of RAM and attached it below. I never saw bb-clientd's memory consumption go down, so this might be a memory leak or a failure of the daemon or FUSE to reclaim vnodes.

Could you take a look? Thanks!

profile001.pdf

EdSchouten commented 1 year ago

What happens if you pass in --nobuild_runfile_links?

https://bazel.build/reference/command-line-reference#flag--build_runfile_links

The issue is that bb_clientd currently just keeps track of its contents in memory. This is generally fine, as long as you don’t instruct Bazel to create runfiles directories for each target.

Passing that flag is recommended anyway, as having Bazel create a runfiles directory for each target causes a quadratic explosion in inodes.
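As a rough illustration of why the inode count can grow quadratically, here is a back-of-the-envelope sketch. It assumes (this is an interpretation, not something stated in the thread) that every target gets its own runfiles tree and that the number of files each tree links to grows in step with the number of targets. The function name and the figures are hypothetical.

```python
def runfiles_symlinks(num_targets, runfiles_per_target):
    """Symlinks created if every target gets its own runfiles tree."""
    return num_targets * runfiles_per_target


# Hypothetical figures: when the per-target runfiles count scales with the
# number of targets, doubling the targets quadruples the symlink count.
for n in (500, 1000, 2000):
    print(n, runfiles_symlinks(n, runfiles_per_target=n))
```

Since bb_clientd keeps track of its contents in memory, each of those entries costs memory in the daemon, which is consistent with the growth described above.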

jmmv commented 1 year ago

Interesting. Thanks for the suggestion. I tried with the flag and it did prevent the OOMs, but I still saw bb-clientd growing to about 8 GB... not sure if that's expected as well.

EdSchouten commented 1 year ago

Just for the record: how did you measure the 8 GB memory usage? Through go tool pprof or the like? Do keep in mind that bb_clientd also mmaps its cache, so that can contribute to a large virtual size.

jmmv commented 1 year ago

I was just glancing at the RSS column of top (on Linux) while the tests were running.

EdSchouten commented 1 year ago

Yeah, that tends to over-approximate actual memory use. It's best to take a look at the memory stats shown in pprof.
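To see how much of the RSS figure reported by top comes from file-backed mappings (such as an mmapped cache) rather than from the heap, one option on Linux is to read the breakdown in /proc/<pid>/status. The sketch below is not part of bb_clientd; the helper names are hypothetical, and it assumes a kernel recent enough (4.5 or later) to expose the RssAnon, RssFile and RssShmem fields.

```python
def parse_rss_breakdown(status_text):
    """Return the RSS fields (in kB) found in the text of /proc/<pid>/status."""
    wanted = {"VmRSS", "RssAnon", "RssFile", "RssShmem"}
    fields = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            # Values look like "   8123456 kB"; keep only the integer part.
            fields[key] = int(rest.split()[0])
    return fields


def rss_breakdown(pid):
    """Read and parse the RSS breakdown for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_rss_breakdown(f.read())
```

Calling rss_breakdown with the daemon's PID returns a dictionary of the four fields. RssAnon roughly corresponds to heap and stack memory, while RssFile counts resident pages of file-backed mappings, so a large RssFile next to a modest RssAnon would suggest that the mmapped cache, not a leak, accounts for most of the number shown by top.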