go-stack / stack

Package stack implements utilities to capture, manipulate, and format call stacks.
MIT License
395 stars 33 forks source link

Refactor use of FuncForPC to leverage new CallersFrames API. #13

Closed dkushner closed 6 years ago

dkushner commented 7 years ago

Sorry to tack this on to another pending PR, I can create a new branch with this modification cherry-picked but I feel that the other changes were necessary regardless and this kind of eliminates the need for them entirely.

I believe this is what you were looking for in #10. The performance for this took a huge hit because runtime.CallersFrames allocates way more than simply grabbing the function pointer for the PC for obvious reasons. It does simplify much of the internal PC trickery, so perhaps the trade off is worth it.

There's also the possibility that my specific implementation may be less-than optimal as I'm not claiming to be a paragon of performant code here.

name                   old time/op  new time/op   delta
CallVFmt-8              498ns ± 1%    507ns ± 2%      ~     (p=0.057 n=4+4)
CallPlusVFmt-8          650ns ± 2%    658ns ± 4%      ~     (p=0.743 n=4+4)
CallSharpVFmt-8         504ns ± 5%    518ns ± 5%      ~     (p=0.486 n=4+4)
CallSFmt-8              451ns ± 1%    475ns ± 5%    +5.32%  (p=0.029 n=4+4)
CallPlusSFmt-8          587ns ± 1%    613ns ± 2%    +4.39%  (p=0.029 n=4+4)
CallSharpSFmt-8         438ns ± 1%    455ns ± 2%    +3.94%  (p=0.029 n=4+4)
CallDFmt-8              404ns ± 1%    403ns ± 1%      ~     (p=0.314 n=4+4)
CallNFmt-8              379ns ± 1%    389ns ± 4%      ~     (p=0.200 n=4+4)
CallPlusNFmt-8          347ns ± 1%    353ns ± 1%      ~     (p=0.086 n=4+4)
Caller-8                372ns ± 0%    616ns ± 3%   +65.68%  (p=0.029 n=4+4)
Trace-8                 850ns ± 4%   2306ns ± 1%  +171.40%  (p=0.029 n=4+4)
Trace10-8              2.82µs ± 1%   7.44µs ± 7%  +164.17%  (p=0.029 n=4+4)
Trace50-8              4.52µs ± 1%  16.28µs ± 1%  +260.27%  (p=0.029 n=4+4)
Trace100-8             7.83µs ± 1%  30.05µs ± 6%  +283.89%  (p=0.029 n=4+4)
CallerAndVFmt-8         918ns ± 2%   1206ns ± 7%   +31.25%  (p=0.029 n=4+4)
TraceAndVFmt-8         2.65µs ± 4%   4.19µs ± 4%   +58.23%  (p=0.029 n=4+4)
Trace10AndVFmt-8       9.38µs ± 1%  13.50µs ± 3%   +43.89%  (p=0.029 n=4+4)
RuntimeCaller-8         391ns ± 1%    404ns ± 6%      ~     (p=0.314 n=4+4)
RuntimeCallerAndFmt-8   714ns ± 2%    751ns ± 7%      ~     (p=0.114 n=4+4)
FuncForPC-8            9.27ns ± 1%  10.47ns ± 6%   +12.97%  (p=0.029 n=4+4)
FuncFileLine-8          123ns ± 1%    129ns ± 7%      ~     (p=0.086 n=4+4)
coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.2%) to 89.418% when pulling 8c987d41f6f04d91c8947ea67fd4862ee3da2cc7 on dkushner:feature/callers-frames into 7a2f19628aabfe68f0766b59e74d6315f8347d22 on go-stack:master.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.2%) to 89.418% when pulling c3e3c2362991d3271528ecc1e716c278a80454bb on dkushner:feature/callers-frames into 7a2f19628aabfe68f0766b59e74d6315f8347d22 on go-stack:master.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.3%) to 88.889% when pulling ece85f8961c0d77848618bd6920c6a161e2f5d62 on dkushner:feature/callers-frames into 7a2f19628aabfe68f0766b59e74d6315f8347d22 on go-stack:master.

dkushner commented 7 years ago

Here is the new benchstat output after making the performance improvements we spoke about:

name                   old time/op  new time/op  delta
CallVFmt-8              560ns ± 2%   645ns ± 4%  +15.35%  (p=0.016 n=4+5)
CallPlusVFmt-8          712ns ± 2%   747ns ± 5%     ~     (p=0.190 n=4+5)
CallSharpVFmt-8         553ns ± 5%   644ns ± 6%  +16.54%  (p=0.016 n=4+5)
CallSFmt-8              495ns ± 3%   583ns ± 4%  +17.86%  (p=0.016 n=4+5)
CallPlusSFmt-8          650ns ± 2%   691ns ± 5%     ~     (p=0.063 n=4+5)
CallSharpSFmt-8         478ns ± 1%   601ns ± 4%  +25.56%  (p=0.016 n=4+5)
CallDFmt-8              428ns ± 2%   544ns ± 5%  +27.06%  (p=0.016 n=4+5)
CallNFmt-8              404ns ± 2%   627ns ± 5%  +55.10%  (p=0.016 n=4+5)
CallPlusNFmt-8          375ns ± 2%   616ns ± 9%  +64.21%  (p=0.016 n=4+5)
Caller-8                660ns ± 3%   418ns ± 7%  -36.69%  (p=0.016 n=4+5)
Trace-8                2.65µs ±24%  0.81µs ± 7%  -69.25%  (p=0.016 n=4+5)
Trace10-8              7.71µs ± 7%  2.84µs ± 7%  -63.24%  (p=0.016 n=4+5)
Trace50-8              17.5µs ± 2%   4.1µs ± 5%  -76.75%  (p=0.016 n=4+5)
Trace100-8             32.2µs ± 2%   7.0µs ± 4%  -78.19%  (p=0.016 n=4+5)
CallerAndVFmt-8        1.29µs ± 2%  1.11µs ± 3%  -13.88%  (p=0.016 n=4+5)
TraceAndVFmt-8         4.60µs ±10%  3.21µs ± 3%  -30.27%  (p=0.016 n=4+5)
Trace10AndVFmt-8       14.5µs ± 5%  11.3µs ±10%  -21.86%  (p=0.016 n=4+5)
RuntimeCaller-8         432ns ± 2%   434ns ± 4%     ~     (p=0.921 n=4+5)
RuntimeCallerAndFmt-8   782ns ± 3%   794ns ± 1%     ~     (p=0.238 n=4+5)
FuncForPC-8            11.2ns ± 4%  10.3ns ± 2%   -7.26%  (p=0.016 n=4+5)
FuncFileLine-8          137ns ± 2%   136ns ± 1%     ~     (p=0.444 n=4+5)

Note that this is against the previous, non-optimized version and not the original version that used FuncForPC.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+2.01%) to 91.228% when pulling 5884afaf2218fca698778433dd34d7a3edbb1325 on dkushner:feature/callers-frames into c03b81b422d7169e7af59c08b4913d9d8a757590 on go-stack:develop.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-1.7%) to 87.467% when pulling 74cdfb9bbff7a42c75225a2b07d98ee990b0f3e3 on dkushner:feature/callers-frames into c03b81b422d7169e7af59c08b4913d9d8a757590 on go-stack:develop.

ChrisHines commented 7 years ago

This looks good in general, but I'm not ready to merge it yet. It may still need some tweaks, but it will require some time to think through the best next step.

I ran a benchmark comparison of this PR against v1.5.3 and got the numbers shown below. It looks like the overall work is roughly the same; it has just been shifted from the initial capture towards the formatting methods, which is actually a good thing. So performance is fine now.

Digging through the new code, though, I'm a little nervous about how it is using CallersFrames. I don't think the new code handles the sigpanic special case properly because it doesn't always use runtime.Frames.Next to traverse the entry prior to the one of interest. Take a look at the code for runtime.Frames and runtime.Frames.Next for the details about how that works.

I also just reviewed https://github.com/golang/go/issues/19348, which is the discussion about how Go will implement mid-stack inlining while preserving stack traces. The changes to runtime.Callers and runtime.CallersFrames will change somewhat at that point and this package needs to work with those changes, but I don't understand all the implications yet.

Benchmarks vs. v1.5.3

name                   old time/op    new time/op    delta
CallVFmt-8                429ns ± 2%     518ns ± 5%  +20.54%          (p=0.008 n=5+5)
CallPlusVFmt-8            551ns ± 1%     632ns ± 3%  +14.75%          (p=0.008 n=5+5)
CallSharpVFmt-8           410ns ± 0%     539ns ± 6%  +31.64%          (p=0.016 n=4+5)
CallSFmt-8                382ns ± 2%     493ns ± 6%  +28.87%          (p=0.008 n=5+5)
CallPlusSFmt-8            499ns ± 2%     595ns ± 2%  +19.33%          (p=0.008 n=5+5)
CallSharpSFmt-8           367ns ± 1%     493ns ± 2%  +34.24%          (p=0.008 n=5+5)
CallDFmt-8                336ns ± 1%     444ns ± 5%  +32.24%          (p=0.008 n=5+5)
CallNFmt-8                321ns ± 1%     519ns ± 3%  +61.88%          (p=0.008 n=5+5)
CallPlusNFmt-8            295ns ± 1%     499ns ± 1%  +69.29%          (p=0.008 n=5+5)
Caller-8                  307ns ± 4%     332ns ± 8%   +8.07%          (p=0.016 n=5+5)
Trace-8                   701ns ± 3%     700ns ± 4%     ~             (p=0.690 n=5+5)
Trace10-8                2.44µs ± 9%    2.08µs ± 2%  -14.66%          (p=0.008 n=5+5)
Trace50-8                4.18µs ± 4%    3.17µs ± 2%  -23.98%          (p=0.008 n=5+5)
Trace100-8               6.97µs ± 3%    5.58µs ± 1%  -19.94%          (p=0.008 n=5+5)
CallerAndVFmt-8           886ns ± 3%     878ns ± 1%     ~             (p=0.841 n=5+5)
TraceAndVFmt-8           2.30µs ± 6%    2.53µs ± 2%   +9.73%          (p=0.008 n=5+5)
Trace10AndVFmt-8         8.12µs ± 2%    9.34µs ± 3%  +15.10%          (p=0.008 n=5+5)
RuntimeCaller-8           353ns ± 1%     354ns ± 7%     ~             (p=0.413 n=5+5)
RuntimeCallerAndFmt-8     656ns ± 1%     682ns ± 7%     ~             (p=0.151 n=5+5)
FuncForPC-8              9.07ns ± 0%    8.45ns ± 6%   -6.77%          (p=0.016 n=4+5)
FuncFileLine-8            115ns ± 1%     111ns ± 2%   -3.31%          (p=0.008 n=5+5)

name                   old alloc/op   new alloc/op   delta
CallVFmt-8                40.0B ± 0%     40.0B ± 0%     ~     (all samples are equal)
CallPlusVFmt-8            72.0B ± 0%     72.0B ± 0%     ~     (all samples are equal)
CallSharpVFmt-8           88.0B ± 0%     88.0B ± 0%     ~     (all samples are equal)
CallSFmt-8                32.0B ± 0%     32.0B ± 0%     ~     (all samples are equal)
CallPlusSFmt-8            64.0B ± 0%     64.0B ± 0%     ~     (all samples are equal)
CallSharpSFmt-8           80.0B ± 0%     80.0B ± 0%     ~     (all samples are equal)
CallDFmt-8                24.0B ± 0%     24.0B ± 0%     ~     (all samples are equal)
CallNFmt-8                48.0B ± 0%     48.0B ± 0%     ~     (all samples are equal)
CallPlusNFmt-8            80.0B ± 0%     80.0B ± 0%     ~     (all samples are equal)
Caller-8                 0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
Trace-8                   64.0B ± 0%     32.0B ± 0%  -50.00%          (p=0.008 n=5+5)
Trace10-8                  240B ± 0%      128B ± 0%  -46.67%          (p=0.008 n=5+5)
Trace50-8                  896B ± 0%      448B ± 0%  -50.00%          (p=0.008 n=5+5)
Trace100-8               1.79kB ± 0%    0.90kB ± 0%  -50.00%          (p=0.008 n=5+5)
CallerAndVFmt-8           40.0B ± 0%     40.0B ± 0%     ~     (all samples are equal)
TraceAndVFmt-8             192B ± 0%      192B ± 0%     ~     (all samples are equal)
Trace10AndVFmt-8           640B ± 0%      640B ± 0%     ~     (all samples are equal)
RuntimeCaller-8          0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
RuntimeCallerAndFmt-8     40.0B ± 0%     40.0B ± 0%     ~     (all samples are equal)
FuncForPC-8              0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
FuncFileLine-8           0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)

name                   old allocs/op  new allocs/op  delta
CallVFmt-8                 3.00 ± 0%      4.00 ± 0%  +33.33%          (p=0.008 n=5+5)
CallPlusVFmt-8             3.00 ± 0%      4.00 ± 0%  +33.33%          (p=0.008 n=5+5)
CallSharpVFmt-8            3.00 ± 0%      4.00 ± 0%  +33.33%          (p=0.008 n=5+5)
CallSFmt-8                 2.00 ± 0%      3.00 ± 0%  +50.00%          (p=0.008 n=5+5)
CallPlusSFmt-8             2.00 ± 0%      3.00 ± 0%  +50.00%          (p=0.008 n=5+5)
CallSharpSFmt-8            2.00 ± 0%      3.00 ± 0%  +50.00%          (p=0.008 n=5+5)
CallDFmt-8                 2.00 ± 0%      3.00 ± 0%  +50.00%          (p=0.008 n=5+5)
CallNFmt-8                 2.00 ± 0%      3.00 ± 0%  +50.00%          (p=0.008 n=5+5)
CallPlusNFmt-8             2.00 ± 0%      3.00 ± 0%  +50.00%          (p=0.008 n=5+5)
Caller-8                  0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
Trace-8                    1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
Trace10-8                  1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
Trace50-8                  1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
Trace100-8                 1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
CallerAndVFmt-8            3.00 ± 0%      4.00 ± 0%  +33.33%          (p=0.008 n=5+5)
TraceAndVFmt-8             10.0 ± 0%      14.0 ± 0%  +40.00%          (p=0.008 n=5+5)
Trace10AndVFmt-8           32.0 ± 0%      47.0 ± 0%  +46.88%          (p=0.008 n=5+5)
RuntimeCaller-8           0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
RuntimeCallerAndFmt-8      3.00 ± 0%      3.00 ± 0%     ~     (all samples are equal)
FuncForPC-8               0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
FuncFileLine-8            0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.7%) to 88.52% when pulling 745136c2f00a05e080dd40725131a31599f56167 on dkushner:feature/callers-frames into c03b81b422d7169e7af59c08b4913d9d8a757590 on go-stack:develop.

dkushner commented 7 years ago

Bench stats against v1.5.3 as of latest fixes to the PC offset required during panic stacks.

name                   old time/op  new time/op   delta
CallVFmt-8              549ns ± 6%    881ns ± 3%   +60.46%  (p=0.000 n=9+10)
CallPlusVFmt-8          705ns ± 2%    994ns ± 4%   +40.91%  (p=0.000 n=9+9)
CallSharpVFmt-8         539ns ± 3%    891ns ± 8%   +65.33%  (p=0.000 n=10+10)
CallSFmt-8              481ns ± 4%    828ns ± 4%   +72.19%  (p=0.000 n=10+10)
CallPlusSFmt-8          641ns ± 1%    937ns ± 6%   +46.26%  (p=0.000 n=8+10)
CallSharpSFmt-8         482ns ± 1%    847ns ±12%   +75.66%  (p=0.000 n=8+10)
CallDFmt-8              432ns ± 5%    788ns ± 8%   +82.30%  (p=0.000 n=10+10)
CallNFmt-8              408ns ± 5%    884ns ±11%  +116.56%  (p=0.000 n=10+9)
CallPlusNFmt-8          374ns ± 3%    850ns ± 9%  +127.35%  (p=0.000 n=10+10)
Caller-8                412ns ± 2%    391ns ± 6%    -5.05%  (p=0.000 n=9+10)
Trace-8                 931ns ± 1%    833ns ± 8%   -10.52%  (p=0.000 n=8+10)
Trace10-8              2.92µs ± 4%   2.73µs ± 7%    -6.49%  (p=0.000 n=10+10)
Trace50-8              4.88µs ± 5%   4.34µs ± 7%   -11.11%  (p=0.000 n=10+10)
Trace100-8             8.45µs ± 4%   7.77µs ± 9%    -8.09%  (p=0.000 n=10+10)
CallerAndVFmt-8        1.00µs ± 4%   1.38µs ± 2%   +37.16%  (p=0.000 n=10+9)
TraceAndVFmt-8         2.86µs ± 3%   4.28µs ± 3%   +49.31%  (p=0.000 n=10+8)
Trace10AndVFmt-8       9.68µs ± 3%  14.79µs ± 3%   +52.75%  (p=0.000 n=10+10)
RuntimeCaller-8         432ns ± 3%    422ns ± 4%    -2.27%  (p=0.037 n=9+9)
RuntimeCallerAndFmt-8   794ns ± 2%    774ns ± 5%      ~     (p=0.123 n=10+10)
FuncForPC-8            10.3ns ± 3%   11.1ns ± 8%    +7.77%  (p=0.001 n=9+10)
FuncFileLine-8          136ns ± 5%    133ns ±10%      ~     (p=0.097 n=9+10)
ChrisHines commented 7 years ago

@dkushner, checking in to see if you plan to continue working on this PR? I can take it over if you'd prefer, so don't feel obligated, but I also want to let you finish it if you still wanted to.

dkushner commented 7 years ago

@ChrisHines, I am indeed! I've been swept up in the transition to a new job and haven't had time to sit down and address this but I'll be able to get to it shortly.

coveralls commented 7 years ago

Coverage Status

Coverage remained the same at 89.216% when pulling 817915b46b97fd7bb80e8ab6b69f01a53ac3eebf on dkushner:feature/callers-frames into 44dafbc221f6841651265ef20270bae8c7f94b47 on go-stack:develop.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+1.6%) to 90.789% when pulling 39f6a49178f5365f09d016c62df452118915eb37 on dkushner:feature/callers-frames into 44dafbc221f6841651265ef20270bae8c7f94b47 on go-stack:develop.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.7%) to 89.912% when pulling 9aa07130e721ae1f2d2460d5d1e956c34bfd4b7d on dkushner:feature/callers-frames into 44dafbc221f6841651265ef20270bae8c7f94b47 on go-stack:develop.

dkushner commented 7 years ago

Runtime and allocation benchmarks for the newest commits. I kind of decimated the previous changes by doing something very very stupid trying to resync my local.

benchmark                          old ns/op     new ns/op     delta
BenchmarkCallVFmt-8                379           689           +81.79%
BenchmarkCallPlusVFmt-8            530           776           +46.42%
BenchmarkCallSharpVFmt-8           407           689           +69.29%
BenchmarkCallSFmt-8                350           630           +80.00%
BenchmarkCallPlusSFmt-8            472           712           +50.85%
BenchmarkCallSharpSFmt-8           344           622           +80.81%
BenchmarkCallDFmt-8                312           599           +91.99%
BenchmarkCallNFmt-8                287           647           +125.44%
BenchmarkCallPlusNFmt-8            264           622           +135.61%
BenchmarkCaller-8                  339           288           -15.04%
BenchmarkTrace-8                   741           626           -15.52%
BenchmarkTrace10-8                 2601          1982          -23.80%
BenchmarkTrace50-8                 5147          3343          -35.05%
BenchmarkTrace100-8                9201          5459          -40.67%
BenchmarkCallerAndVFmt-8           806           1048          +30.02%
BenchmarkTraceAndVFmt-8            2321          4005          +72.55%
BenchmarkTrace10AndVFmt-8          7824          12383         +58.27%
BenchmarkRuntimeCaller-8           325           327           +0.62%
BenchmarkRuntimeCallerAndFmt-8     625           621           -0.64%
BenchmarkFuncForPC-8               7.50          7.04          -6.13%
BenchmarkFuncFileLine-8            101           101           +0.00%

benchmark                          old allocs     new allocs     delta
BenchmarkCallVFmt-8                3              4              +33.33%
BenchmarkCallPlusVFmt-8            3              4              +33.33%
BenchmarkCallSharpVFmt-8           3              4              +33.33%
BenchmarkCallSFmt-8                2              3              +50.00%
BenchmarkCallPlusSFmt-8            2              3              +50.00%
BenchmarkCallSharpSFmt-8           2              3              +50.00%
BenchmarkCallDFmt-8                2              3              +50.00%
BenchmarkCallNFmt-8                2              3              +50.00%
BenchmarkCallPlusNFmt-8            2              3              +50.00%
BenchmarkCaller-8                  0              0              +0.00%
BenchmarkTrace-8                   1              1              +0.00%
BenchmarkTrace10-8                 1              1              +0.00%
BenchmarkTrace50-8                 1              1              +0.00%
BenchmarkTrace100-8                1              1              +0.00%
BenchmarkCallerAndVFmt-8           3              4              +33.33%
BenchmarkTraceAndVFmt-8            10             15             +50.00%
BenchmarkTrace10AndVFmt-8          32             38             +18.75%
BenchmarkRuntimeCaller-8           0              0              +0.00%
BenchmarkRuntimeCallerAndFmt-8     3              3              +0.00%
BenchmarkFuncForPC-8               0              0              +0.00%
BenchmarkFuncFileLine-8            0              0              +0.00%

benchmark                          old bytes     new bytes     delta
BenchmarkCallVFmt-8                40            56            +40.00%
BenchmarkCallPlusVFmt-8            72            88            +22.22%
BenchmarkCallSharpVFmt-8           104           120           +15.38%
BenchmarkCallSFmt-8                32            48            +50.00%
BenchmarkCallPlusSFmt-8            64            80            +25.00%
BenchmarkCallSharpSFmt-8           96            112           +16.67%
BenchmarkCallDFmt-8                24            40            +66.67%
BenchmarkCallNFmt-8                48            64            +33.33%
BenchmarkCallPlusNFmt-8            80            96            +20.00%
BenchmarkCaller-8                  0             0             +0.00%
BenchmarkTrace-8                   64            80            +25.00%
BenchmarkTrace10-8                 240           256           +6.67%
BenchmarkTrace50-8                 896           896           +0.00%
BenchmarkTrace100-8                1792          1792          +0.00%
BenchmarkCallerAndVFmt-8           40            56            +40.00%
BenchmarkTraceAndVFmt-8            192           456           +137.50%
BenchmarkTrace10AndVFmt-8          640           1152          +80.00%
BenchmarkRuntimeCaller-8           0             0             +0.00%
BenchmarkRuntimeCallerAndFmt-8     40            40            +0.00%
BenchmarkFuncForPC-8               0             0             +0.00%
BenchmarkFuncFileLine-8            0             0             +0.00%
ChrisHines commented 7 years ago

@dkushner Also, please update the .travis.yml file to run tests for Go 1.7 and 1.8.

ChrisHines commented 6 years ago

Superceded by #19.