getsentry / sentry-go

The official Go SDK for Sentry (sentry.io)
https://docs.sentry.io/platforms/go/
MIT License
882 stars 208 forks source link

Profiling large number of goroutines #748

Open vaind opened 7 months ago

vaind commented 7 months ago

I've created a simple sample application with 5000 concurrent goroutines, tuned to have around 30 % CPU usage (on my desktop PC) without profiling. After profiling was enabled with 5 % sample rate, the CPU usage jumps to around 40 %.

I've run a pprof CPU profile with and without profiling enabled. In the following SVG, you can see that the vast majority of the overhead is coming from calling Runtime.stack(). Current profiling implementation calls it periodically to collect stacks for all goroutines. Therefore, the time complexity increases with the number of active goroutines. See the profile output as a tree (.svg so you can zoom in)

I'm investigating how we could improve this.

Note: runtime.GoroutineProfile() has lower overhead, possibly because it doesn't produce string output, but it currently doesn't have a goroutine ID so we can't map stacks across multiple calls to a single routine, see https://github.com/golang/go/issues/59663

vaind commented 5 months ago

With the current tooling we have available from Go runtime and in combination with what Sentry.io expects to receive, I don't see a way to improve profiling for apps with a very large number of goroutines. If we could make Sentry accept pprof (which only has aggregate information about samples), we could use pprof.StartCPUProfile(). This, however, would not yield the same experience as with other SDKs because we wouldn't have time-based samples anymore. A big downside to this would be that adding another format would introduce future maintenance issues when trying to keep it working alongside the "mainstream" sampling profiler.

As such, I think we may want to make some adjustments to automatically disable the profiler when it encounters a given number of goroutines. This would act as a reasonable default behaviour working around this limitation before it's resolved in one way or another. Users could be able to change this configuration or disable it completely if they want to run the profiler unhindered, regardless of the number of routines they launch.

vaind commented 1 month ago

I've made a PR in Go to land the change to GoroutineProfile() and verified the functionality in the SDK on branch https://github.com/getsentry/sentry-go/tree/feat/goroutine-profile

cleptric commented 1 month ago

Sweet!