could give node-clinic a try, specifically node-clinic-flame
flame graph looks like this
Can it attach to a live process after N time? The docs I've seen appear to capture flamecharts over the entire process lifetime.
Based on the `clinic flame` flags I don't see an option to do that.
It can be used programmatically as well, so we could just call `flame.collect` after N time; see the node-clinic-flame docs.
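A minimal sketch of what that could look like, assuming node-clinic-flame follows the same collect/visualize programmatic API documented for the other clinic tools (the target script path is a placeholder):

```ts
// Sketch only: assumes node-clinic-flame exposes the clinic collect/visualize pattern.
const ClinicFlame = require("node-clinic-flame");

const flame = new ClinicFlame();

// collect() spawns the target process, samples it until it exits,
// and calls back with the path of the raw data it gathered.
flame.collect(["node", "./path-to-script.js"], (err: Error | null, filepath: string) => {
  if (err) throw err;
  // visualize() renders that data into a standalone HTML flamegraph.
  flame.visualize(filepath, `${filepath}.html`, (err: Error | null) => {
    if (err) throw err;
  });
});
```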
We might have to use 0x directly to get more fine-grained control, as it seems to have a `--collect-delay` flag ("Specify a delay (ms) before collecting data").
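0x also has a programmatic API per its README, so a hedged sketch of driving it from a script might look like the following; whether `--collect-delay` is exposed as a `collectDelay` option there is an assumption I have not verified:

```ts
// Sketch only: drives 0x programmatically instead of via its CLI.
const zeroEks = require("0x");

async function captureFlamegraph(): Promise<void> {
  const opts = {
    argv: ["./path-to-script.js"], // script to profile (placeholder)
    workingDir: process.cwd(),
    // Assumption: the CLI's --collect-delay (ms before collecting data)
    // presumably maps to an option like this; not verified.
    collectDelay: 30_000,
  };
  // Resolves with the path of the generated output
  const out = await zeroEks(opts);
  console.log(`flamegraph written to ${out}`);
}

captureFlamegraph().catch(console.error);
```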
All of the libraries I found seem to want to run the node process directly and cannot be turned on and then off again programmatically. After a bit of googling I found this blog and noted the flags. I looked at the 0x code and found a few things to better understand how that library works.
It looks like they are running the process here and here, depending on whether they are profiling via v8 or linux.
For v8 they output the isolate data to a log, then parse that data into ticks here to build the graph data.
For linux they launch the process via a system-level `perf` command, passing `--perf-basic-prof`, and then turn the trace into ticks here.
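For reference, the stock v8 tooling that writes and consumes that same isolate/tick log (which 0x's v8 mode appears to build on) can be driven like this; script and file names are placeholders:

```ts
// Sketch of the built-in v8 tick workflow: --prof writes an isolate-*.log,
// and --prof-process aggregates the ticks into readable stacks.
import {execSync} from "node:child_process";

// 1. Run the target with the v8 profiler enabled; writes isolate-<addr>-<pid>-v8.log
execSync("node --prof ./path-to-script.js", {stdio: "inherit"});

// 2. Post-process the tick log into an aggregated profile
execSync("node --prof-process isolate-*.log > profile.txt", {stdio: "inherit"});
```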
Sadly, in both situations they use a flag to output perf data. I am doing a bit more digging into how it is implemented to try to understand what the performance implications will be, if any.
There are 4 perf flags available, and the node flamegraph docs talk about a few of the details.
I am reading the Chrome DevTools Protocol docs that were referred to in the debugger section of the node docs to see how the protocol works, so we can potentially leverage the flag.
Updated. I am guessing there will be some degradation, so it might not be ideal for prod, but I will add another comment below when I get further.
I researched the performance implications of the `--inspect` flag and there are none when a debugger is not attached. However, when one is attached the slowdown is 100x to 300x according to this thread on SO.
This post elaborates a bit on the security risks, but it also mentions that debugging can be flipped on with `kill -usr1 ${PID}`, though I have not tested that.
During my journey I found an interesting video... https://www.youtube.com/watch?v=Xb_0awoShR8&t=570s
Updated. The speaker talks about `process._debugProcess(pid)` to turn on debugging from outside of a running node instance. Incidentally, it is what node uses under the hood to turn on debugging. Both it and `_debugEnd` are available.
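A minimal sketch of what that looks like in practice (these are internal, undocumented APIs, so they may change between node versions):

```ts
// Run from a separate node process: ask an already-running node process (by pid)
// to start its inspector, the same thing `kill -USR1 <pid>` does on linux.
const targetPid = Number(process.argv[2]);
(process as any)._debugProcess(targetPid);

// Inside the target process itself, the inspector can be shut down again with:
//   (process as any)._debugEnd();
```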
This is another highlight https://youtu.be/Xb_0awoShR8?t=682
Updated. The speaker talks about the core debugging protocol and using it for profiling. See the links below for more detail:
https://nodejs.org/dist/latest-v18.x/docs/api/inspector.html#cpu-profiler
https://chromedevtools.github.io/devtools-protocol/v8/Profiler/
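Adapted from the CPU profiler example in the node inspector docs linked above, turning a profile on and off from inside the process would look roughly like this (the output path and duration are placeholders):

```ts
// Start and stop a CPU profile on demand via the inspector module / CDP Profiler domain.
import * as inspector from "node:inspector";
import {writeFileSync} from "node:fs";

const session = new inspector.Session();
session.connect();

session.post("Profiler.enable", () => {
  session.post("Profiler.start", () => {
    // ... let the process run under load for a while, then stop and dump the profile
    setTimeout(() => {
      session.post("Profiler.stop", (err, result) => {
        if (!err) {
          // .cpuprofile files can be loaded directly into chrome dev tools
          writeFileSync("./lodestar.cpuprofile", JSON.stringify(result.profile));
        }
        session.disconnect();
      });
    }, 30_000);
  });
});
```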
starting to get somewhere notable I think 😄
@dapplion I also found the reference that I mentioned on standup; it is a youtube video. It wasn't a blog, which is why I couldn't find it with a google search. It is a Netflix engineer talking about flame graphs on a running node process in prod.
The speaker talks about the `--perf` flags mentioned above in this video here, and in particular the usage of `--perf-basic-prof-only-functions` to generate the flamegraph. He mentions that it has very low impact on the running process.
They are using the linux tool `perf` and describe how they implement it here. It is the same tool that 0x uses for linux.
Netflix is using brendangregg/FlameGraph to generate the flamegraphs. 0x uses a custom implementation that renders and bundles an html page that is in its source.
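For reference, a sketch of that perf + FlameGraph workflow driven from a node script; the sample rate, duration, and FlameGraph checkout path are assumptions, not what Netflix actually runs:

```ts
// Sketch of the perf + brendangregg/FlameGraph workflow described in the video.
// Assumes the target node process was started with --perf-basic-prof-only-functions
// so perf can map JIT addresses to JS function names, and that the FlameGraph
// scripts are checked out locally (paths below are placeholders).
import {execSync} from "node:child_process";

const pid = Number(process.argv[2]);

// Sample the running process at 99 Hz with call stacks for 30 seconds
execSync(`perf record -F 99 -p ${pid} -g -- sleep 30`, {stdio: "inherit"});

// Collapse the stacks and render an SVG flamegraph
execSync(
  "perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > lodestar-flame.svg",
  {stdio: "inherit"}
);
```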
As a note there was some GREAT other stuff in that video about post mortem debugging with core-dumps that was very interesting. The whole video is def worth watching.
@dapplion check out a branch diff here that has an idea for how to generate the stack traces. I talked with @Faithtosin about strategies to collect the flamegraph data when not running locally. Please tell me what you think.
@Faithtosin was asking me about the scope for where and when you would like to run this. Is it just something contributors will want to run on the cloud nodes, or should it run locally? I have had challenges running on mac, though I have only tried to capture OS-level (not v8-level) profiles there, to mimic what happens on linux.
@tuyennhv captures CPU profiles regularly to ensure Lodestar's performance profile is good.
So we always run on demand on specific machines. The goal here is to make it easier so we do it more often. But I'm not sure we should bake it into production code; that sounds more like a job for external tooling.
It should be run on our test nodes in the cloud
@dapplion I added as many references as I could here. It has most of the breadcrumbs that I found and the libraries that I seriously considered (and looked through the code of). The rest that came up by searching were either not widely used (fewer than 100 weekly installs) or were very old and hadn't had a commit since 2017.
To understand what's affecting Lodestar performance, our current strategy is to attach a chromium dev tools instance to a node running with `node --inspect`. Those dev tools can render a stack chart by time, but not a regular flamechart by pure stack occurrences. The information exposed by `node --inspect` should be enough to produce a flamechart.

CC: @matthewkeil @nflaig @tuyennhv
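To back that up, here is a minimal, hypothetical sketch of turning a .cpuprofile (what `Profiler.stop` returns and what dev tools export) into folded stacks that brendangregg/FlameGraph can render by pure stack occurrences; field names follow the CDP Profiler docs, everything else is illustrative:

```ts
// Convert a CDP Profiler.Profile (.cpuprofile) into folded stacks for flamegraph.pl
import {readFileSync} from "node:fs";

type ProfileNode = {id: number; callFrame: {functionName: string}; children?: number[]};
type Profile = {nodes: ProfileNode[]; samples: number[]};

// Path to a saved CPU profile, e.g. one exported from dev tools
const profile: Profile = JSON.parse(readFileSync(process.argv[2], "utf8"));

// Index nodes and record each node's parent so a sample can be walked back to the root
const nodeById = new Map<number, ProfileNode>();
const parentOf = new Map<number, number>();
for (const node of profile.nodes) {
  nodeById.set(node.id, node);
  for (const childId of node.children ?? []) parentOf.set(childId, node.id);
}

// Count identical stacks across all samples: occurrences, not wall-clock time
const stackCounts = new Map<string, number>();
for (const sampleId of profile.samples) {
  const frames: string[] = [];
  for (let id: number | undefined = sampleId; id !== undefined; id = parentOf.get(id)) {
    frames.push(nodeById.get(id)?.callFrame.functionName || "(anonymous)");
  }
  const stack = frames.reverse().join(";");
  stackCounts.set(stack, (stackCounts.get(stack) ?? 0) + 1);
}

// Emit folded-stack lines ("root;child;leaf <count>") ready for flamegraph.pl
for (const [stack, count] of stackCounts) console.log(`${stack} ${count}`);
```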