Closed jaffee closed 11 months ago
Thanks. This looks great.
Of course, what's not great is the fact that fgprof doesn't manage to keep up the desired sampling rate 🙈. I'm curious: would you consider execution tracing, e.g. `/debug/pprof/trace?seconds=5`, to be a viable alternative?
I recently implemented a tool to convert execution traces into wall clock profiles: https://github.com/felixge/traceutils#pprof
Very cool, will definitely be checking this out. Discovering fgprof was a game-changer for me, and I've recently started looking at fgtrace as well.
> fgprof doesn't manage to keep up the desired sampling rate 🙈
I'm sure there are opportunities to make fgprof more efficient, but I think situations will always arise where it can't keep up. Do you expect tools like the one above to mostly replace fgprof?
fgprof's main bottleneck is the overhead of unwinding stack traces. My colleague @nsrip-dd is currently working on expanding our frame pointer contributions in go1.21 to other parts of the runtime, potentially including goroutine profiles. If that works out, fgprof overhead should be at least an order of magnitude lower in go1.22 than before.
> Do you expect tools like the one above to mostly replace fgprof?

I think I'll probably extend fgprof to use `runtime/trace` as a data source in the future. As a user you mostly won't notice a difference, but the worst-case overhead and accuracy will be much better. `traceutils` is mostly an experimental tool I'm developing to debug execution tracing data and play around with new ideas for using it. I don't foresee it becoming a popular project like fgprof.
Hopefully this is useful, it seems to work for me.
When we cannot sample at the requested rate, because of either an excessive number of goroutines or excessive load on the system, we compute the rate we actually achieved and use that when exporting the profile, so that the timing information in the profile is more accurate and sensible.
With this code in place, a function that is on the stack for the entirety of a 27-second profile will always display 27s when viewed, whereas previously it could display less time.