MystenLabs / sui

Sui, a next-generation smart contract platform with high throughput, low latency, and an asset-oriented programming model powered by the Move programming language
https://sui.io
Apache License 2.0

Able to do Memory profiling in production #2974

Closed · lxfind closed this issue 2 years ago

velvia commented 2 years ago

I'd like to give an update on all the work I put into this over the past week.

  1. Added Narwhal memory profiling, with an option to enable Jemalloc profiling in Narwhal's node and hooks to enable profiling in Narwhal's benchmark. PR: https://github.com/MystenLabs/narwhal/pull/448 (almost ready to be merged).

  2. Merged a PR to use Jemalloc as the memory allocator in Sui-node, which allows us to use jeprof memory profiling in production (see the sketch at the end of this comment).

  3. Worked with @tharbert to test the PR in devnet-staging:

    • Enabling the profiling did work: it produced local profile files.
    • There is an issue with opening the profiles: they can't be downloaded and opened on a Mac, because macOS can't parse Linux executables.
    • Somehow the Sui-node executable ballooned to 1.6 GB after enabling debug symbols, which are needed for profiling. This is unexpected.

Thus there are two issues left over from the initial jeprof deployment.
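As an aside, here is a minimal sketch of what "use Jemalloc as the memory allocator" amounts to in a Rust binary like Sui-node. The tikv-jemallocator crate and the example MALLOC_CONF settings are assumptions chosen for illustration, not necessarily what the merged PR uses:

```rust
// Sketch only: wiring jemalloc in as the global allocator so that jeprof heap
// profiling becomes possible. Assumes the tikv-jemallocator crate, built with
// jemalloc's profiling support enabled; the actual Sui PR may differ.
use tikv_jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // With jemalloc as the allocator, heap profiling is controlled via the
    // MALLOC_CONF environment variable (or _RJEM_MALLOC_CONF if the crate is
    // built with prefixed symbols), e.g.:
    //   MALLOC_CONF=prof:true,lg_prof_interval:30 ./sui-node ...
    // The resulting .heap dumps are then analyzed with `jeprof`.
    println!("node running with jemalloc as the global allocator");
}
```

The important part is that once jemalloc is the global allocator and profiling support is compiled in, heap profiles can be produced in production and fed to jeprof for analysis.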

velvia commented 2 years ago

There is also the issue that jeprof isn't the easiest profiling tool to use: there's no nice UI, and no easy graph or ordered tree view to help analyze the data. One can create an SVG graph, but unless one works hard to simplify the data, the graph is too big and unreadable.

I am also trying out @huitseeker's recommendation of Bytehound: https://github.com/koute/bytehound

Bytehound looks really good. It only runs on Linux, but it can track allocations over time and generate flame graphs of memory allocations that are super easy to interpret, including flame graphs of just the leaked allocations. Here is one:

[Screenshot (2022-07-08): Bytehound flame graph of memory allocations]

Note that one can easily see the crucial item in the flame graph: sui_core::ConsensusListener::....

We still need to evaluate whether this is a tool that can really be put in production. That should probably be split out into a separate issue.

velvia commented 2 years ago

/cc a few others who have expressed interest... @laura-makdah @todd-mystenlabs

I just realized what would be the perfect intermediate solution; it would only take a few days to write and deploy.

Jemalloc has a mode where you can manually activate profiling and trigger profile dumps. This is the key to unlocking more useful functionality. See this paragraph:

It is possible to start an application with profiling enabled but inactive, by specifying MALLOC_CONF=prof_active:false. This is only useful if the application manually activates/deactivates profiling via the "prof.active" mallctl during execution. Use cases include:

  • Activate profiling after initialization is complete, so that profiles only show objects allocated during steady-state execution.
  • Dump a profile, activate profiling for 30 seconds, wait 30 seconds after deactivating profiling, then dump another profile and use jeprof to compare the two dumps. This will focus on objects that were allocated during steady-state execution but are long-lived. These objects are prime candidates for explaining memory growth over time.

(from https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Heap-Profiling)

Basically, we write a custom plugin (maybe it should be a separate crate) which spins up a background thread. We could use web routes to activate it, or automatic triggers that we control, such as when total memory is rising above a certain rate or hits a new high. A rough sketch of the idea is below.
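Here is a minimal sketch of what such a background profiling controller could look like, assuming the process is started with MALLOC_CONF=prof:true,prof_active:false and that the tikv-jemalloc-ctl crate is linked in. The crate choice, trigger condition, paths, and thresholds are all illustrative assumptions rather than a final design:

```rust
// Sketch: a background thread that watches allocated memory and, when it hits
// a new high, briefly activates jemalloc profiling and writes a profile dump.
// Assumes the process was started with MALLOC_CONF=prof:true,prof_active:false
// and that the tikv-jemalloc-ctl crate is available; these calls will fail if
// jemalloc was not built/started with profiling enabled.
use std::ffi::CString;
use std::thread;
use std::time::Duration;

use tikv_jemalloc_ctl::{epoch, raw, stats};

fn set_prof_active(active: bool) {
    // "prof.active" is the jemalloc mallctl that toggles sampling on/off.
    unsafe { raw::write(b"prof.active\0", active) }.expect("prof.active");
}

fn dump_profile(path: &str) {
    // "prof.dump" takes a C string path and writes a heap profile there.
    let cpath = CString::new(path).expect("path");
    unsafe { raw::write(b"prof.dump\0", cpath.as_ptr()) }.expect("prof.dump");
}

pub fn spawn_profiling_controller() {
    thread::spawn(|| {
        let mut high_water = 0u64;
        let mut dump_id = 0u32;
        loop {
            // jemalloc stats are cached; advance the epoch to refresh them.
            epoch::advance().expect("epoch");
            let allocated = stats::allocated::read().expect("stats") as u64;

            // Illustrative trigger: total allocated bytes hit a new high.
            if allocated > high_water {
                high_water = allocated;

                // Profile a 30-second window, as suggested in the jemalloc
                // wiki excerpt above, then dump so profiles can be compared.
                set_prof_active(true);
                thread::sleep(Duration::from_secs(30));
                set_prof_active(false);

                dump_id += 1;
                dump_profile(&format!("/tmp/sui-node.{dump_id}.heap"));
            }
            thread::sleep(Duration::from_secs(10));
        }
    });
}
```

Two dumps produced around a growth event could then be diffed with jeprof, per the wiki excerpt above.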

It would take some time to test out, especially to reproduce situations when memory growth keeps happening unabated.

lxfind commented 2 years ago

Debug info is making the Docker image very large, so it has been taken out for now; we need to figure out a way to re-enable it. We may also want to be able to turn it on and off dynamically.

One approach: only ship debug info to one validator. Alternatively: strip the debug info and symbolicate traces afterwards.

velvia commented 2 years ago

PR out to enable two different types of memory profiling via env vars at runtime:

https://github.com/MystenLabs/sui-operations/pull/159

This is part 1 of 2. Part 2 involves building debug images, or a Docker image with tools, that allow for easy analysis of the profiling data.
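For context on the two mechanisms (the PR itself lives in sui-operations, so presumably it is deployment configuration rather than Rust code): jemalloc heap profiling is normally controlled through the MALLOC_CONF environment variable, while Bytehound is injected via an LD_PRELOAD of its libbytehound.so. Below is a purely illustrative sketch of how a node could log which of the two is in effect at startup; the exact variable names used in the deployment are an assumption:

```rust
// Sketch: report which memory profiler, if any, the current environment
// enables. MALLOC_CONF is jemalloc's standard configuration variable and
// LD_PRELOAD is how Bytehound's libbytehound.so is injected; whether the
// deployment uses exactly these names is an assumption.
use std::env;

fn report_memory_profiling() {
    if let Ok(conf) = env::var("MALLOC_CONF") {
        if conf.contains("prof:true") {
            println!("jemalloc heap profiling enabled: MALLOC_CONF={conf}");
        }
    }
    if let Ok(preload) = env::var("LD_PRELOAD") {
        if preload.contains("libbytehound") {
            println!("Bytehound profiling enabled via LD_PRELOAD={preload}");
        }
    }
}

fn main() {
    report_memory_profiling();
}
```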

velvia commented 2 years ago

The PR is merged. The next part is building an image with a debug build so that, once profiling is enabled, the profiling data can be analyzed.

velvia commented 2 years ago

Update:

https://github.com/MystenLabs/sui-operations/pull/182

Also, I have verified that using env vars to enable profiling does work. We need to decide whether we want to enable one of these types of profiling by default.

The next step is to analyze the profiling output using the debug image, which unfortunately needs a redeploy.

velvia commented 2 years ago

This is more or less done now. Validator images deployed to any environment can enable profiling via environment variables. Right now there are two options, Jemalloc and Bytehound, and neither one's overhead should be that bad at current load levels.

The profiling data needs to be downloaded and viewed using a separate image which contains a debug version of Sui-node, plus viewers/tools for the profiling data. Right now, one has to opt in to building this profiling image; it is a bit of a manual process. We could automate this by deploying a profiling image alongside the regular image in each environment, which would let users point their browsers at something in the cluster to get profiling information. That can be a separate ticket.

Next steps are reproducing memory issues using load generators, CPU profiling, and usability work.