MystenLabs / sui

Sui, a next-generation smart contract platform with high throughput, low latency, and an asset-oriented programming model powered by the Move programming language
https://sui.io
Apache License 2.0

Able to do Memory profiling in production #2974

Closed · lxfind closed this issue 2 years ago

velvia commented 2 years ago

I'd like to give an update on all the work I put into this over the past week.

  1. Added Narwhal memory profiling, with an option to enable Jemalloc profiling in Narwhal's node and hooks to enable profiling in Narwhal's benchmark. PR: https://github.com/MystenLabs/narwhal/pull/448 (almost ready to be merged).

  2. Merged a PR to use Jemalloc as the memory allocator in Sui-node, which allows us to use jeprof memory profiling in production (see the sketch at the end of this comment).

  3. Worked with @tharbert to test the PR in devnet-staging:

    • Enabling the profiling did work: it produced local profile files.
    • There is an issue with opening the profiles: they can't be downloaded and opened on a Mac, because macOS can't parse Linux executables.
    • Somehow the Sui-node executable ballooned to 1.6 GB after enabling debug symbols, which are needed for profiling. This is unexpected.

Thus there are two issues left over from the initial jeprof deployment.
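As an aside, here is a minimal sketch of what "use Jemalloc as the memory allocator" amounts to in a Rust binary like Sui-node. The tikv-jemallocator crate and the example MALLOC_CONF settings are assumptions chosen for illustration, not necessarily what the merged PR uses:

```rust
// Sketch only: wiring jemalloc in as the global allocator so that jeprof heap
// profiling becomes possible. Assumes the tikv-jemallocator crate, built with
// jemalloc's profiling support enabled; the actual Sui PR may differ.
use tikv_jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // With jemalloc as the allocator, heap profiling is controlled via the
    // MALLOC_CONF environment variable (or _RJEM_MALLOC_CONF if the crate is
    // built with prefixed symbols), e.g.:
    //   MALLOC_CONF=prof:true,lg_prof_interval:30 ./sui-node ...
    // The resulting .heap dumps are then analyzed with `jeprof`.
    println!("node running with jemalloc as the global allocator");
}
```

The important part is that once jemalloc is the global allocator and profiling support is compiled in, heap profiles can be produced in production and fed to jeprof for analysis.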

velvia commented 2 years ago

There is also the issue that jeprof isn't the easiest profiling tool to use: there's no nice UI, and no easy graph or ordered tree view to help analyze the data. One can create an SVG graph, but unless one works hard to simplify the data, the graph is too big and unreadable.

I am also trying out @huitseeker's recommendation of Bytehound: https://github.com/koute/bytehound

Bytehound looks really good. It only runs on Linux, but it can track allocations over time and generate flame graphs of memory allocations that are super easy to interpret, including flame graphs of just the leaked allocations. Here is one:

[Screenshot (2022-07-08): Bytehound flame graph of memory allocations]

Note that one can easily see the crucial item in the flame graph: sui_core::ConsensusListener::....

We still need to evaluate whether this is a tool that can really be put in production. That should probably be split out into a separate issue.

velvia commented 2 years ago

/cc a few others who have expressed interest... @laura-makdah @todd-mystenlabs

I just realized what would be the perfect intermediate solution; it would only take a few days to write and deploy.

Jemalloc has a mode where you can manually activate profiling and trigger profile dumps. This is the key to unlocking more useful functionality. See this paragraph:

It is possible to start an application with profiling enabled but inactive, by specifying MALLOC_CONF=prof_active:false. This is only useful if the application manually activates/deactivates profiling via the "prof.active" mallctl during execution. Use cases include:

  • Activate profiling after initialization is complete, so that profiles only show objects allocated during steady-state execution.
  • Dump a profile, activate profiling for 30 seconds, wait 30 seconds after deactivating profiling, then dump another profile and use jeprof to compare the two dumps. This will focus on objects that were allocated during steady-state execution but are long-lived. These objects are prime candidates for explaining memory growth over time.

(from https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Heap-Profiling)

Basically, we write a custom plugin (maybe it should be a separate crate) which spins up a background thread. We could use web routes to activate it, or automatic triggers that we control, such as when total memory is rising above a certain rate or hits a new high. A rough sketch of the idea is below.
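Here is a minimal sketch of what such a background profiling controller could look like, assuming the process is started with MALLOC_CONF=prof:true,prof_active:false and that the tikv-jemalloc-ctl crate is linked in. The crate choice, trigger condition, paths, and thresholds are all illustrative assumptions rather than a final design:

```rust
// Sketch: a background thread that watches allocated memory and, when it hits
// a new high, briefly activates jemalloc profiling and writes a profile dump.
// Assumes the process was started with MALLOC_CONF=prof:true,prof_active:false
// and that the tikv-jemalloc-ctl crate is available; these calls will fail if
// jemalloc was not built/started with profiling enabled.
use std::ffi::CString;
use std::thread;
use std::time::Duration;

use tikv_jemalloc_ctl::{epoch, raw, stats};

fn set_prof_active(active: bool) {
    // "prof.active" is the jemalloc mallctl that toggles sampling on/off.
    unsafe { raw::write(b"prof.active\0", active) }.expect("prof.active");
}

fn dump_profile(path: &str) {
    // "prof.dump" takes a C string path and writes a heap profile there.
    let cpath = CString::new(path).expect("path");
    unsafe { raw::write(b"prof.dump\0", cpath.as_ptr()) }.expect("prof.dump");
}

pub fn spawn_profiling_controller() {
    thread::spawn(|| {
        let mut high_water = 0u64;
        let mut dump_id = 0u32;
        loop {
            // jemalloc stats are cached; advance the epoch to refresh them.
            epoch::advance().expect("epoch");
            let allocated = stats::allocated::read().expect("stats") as u64;

            // Illustrative trigger: total allocated bytes hit a new high.
            if allocated > high_water {
                high_water = allocated;

                // Profile a 30-second window, as suggested in the jemalloc
                // wiki excerpt above, then dump so profiles can be compared.
                set_prof_active(true);
                thread::sleep(Duration::from_secs(30));
                set_prof_active(false);

                dump_id += 1;
                dump_profile(&format!("/tmp/sui-node.{dump_id}.heap"));
            }
            thread::sleep(Duration::from_secs(10));
        }
    });
}
```

Two dumps produced around a growth event could then be diffed with jeprof, per the wiki excerpt above.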

It would take some time to test out, especially to reproduce situations when memory growth keeps happening unabated.

lxfind commented 2 years ago

Debug info is making the Docker image very large, so it has been taken out for now; we need to figure out a way to re-enable it. We may also want to be able to turn it on and off dynamically.

One approach: only ship debug info to one validator. Alternatively: strip the debug info and symbolicate traces afterwards.

velvia commented 2 years ago

PR out to enable two different types of memory profiling via env vars at runtime:

https://github.com/MystenLabs/sui-operations/pull/159

This is part 1 of 2. Part 2 involves building debug images, or a Docker image with tools, that allow for easy analysis of the profiling data.
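For context on the two mechanisms (the PR itself lives in sui-operations, so presumably it is deployment configuration rather than Rust code): jemalloc heap profiling is normally controlled through the MALLOC_CONF environment variable, while Bytehound is injected via an LD_PRELOAD of its libbytehound.so. Below is a purely illustrative sketch of how a node could log which of the two is in effect at startup; the exact variable names used in the deployment are an assumption:

```rust
// Sketch: report which memory profiler, if any, the current environment
// enables. MALLOC_CONF is jemalloc's standard configuration variable and
// LD_PRELOAD is how Bytehound's libbytehound.so is injected; whether the
// deployment uses exactly these names is an assumption.
use std::env;

fn report_memory_profiling() {
    if let Ok(conf) = env::var("MALLOC_CONF") {
        if conf.contains("prof:true") {
            println!("jemalloc heap profiling enabled: MALLOC_CONF={conf}");
        }
    }
    if let Ok(preload) = env::var("LD_PRELOAD") {
        if preload.contains("libbytehound") {
            println!("Bytehound profiling enabled via LD_PRELOAD={preload}");
        }
    }
}

fn main() {
    report_memory_profiling();
}
```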

velvia commented 2 years ago

The PR is merged. The next part is building an image with a debug build so that, once profiling is enabled, the profiling data can be analyzed.

velvia commented 2 years ago

Update:

https://github.com/MystenLabs/sui-operations/pull/182

Also, I have verified that using env vars to enable profiling does work. We need to decide whether we want to enable one of these types of profiling by default.

The next step is to analyze the profiling output using the debug image, which unfortunately needs a redeploy.

velvia commented 2 years ago

This is more or less done now. Validator images deployed to any environment can enable profiling via environment variables. Right now there are two options, Jemalloc and Bytehound, and neither one's overhead should be that bad at current load levels.

The profiling data needs to be downloaded and viewed using a separate image which contains a debug version of Sui-node, plus viewers/tools for the profiling data. Right now, one has to opt in to building this profiling image; it is a bit of a manual process. We could automate this by deploying a profiling image alongside the regular image in each environment, which would let users point their browsers at something in the cluster to get profiling information. That can be a separate ticket.

Next steps are reproducing memory issues using load generators, CPU profiling, and usability work.