capnproto / go-capnp

Cap'n Proto library and code generator for Go
https://capnproto.org
Other
1.22k stars 110 forks source link

capnp: Refactor Arenas #586

Closed matheusd closed 2 months ago

matheusd commented 2 months ago

This PR performs the first refactor of the Arena interface and its existing implementations.

The ultimate goal for this and following arena refactor PRs is to make the API saner, more performant and allow for third party creation of Arena implementations (which is not possible today due to need to access private fields within Segment).

This PR attempts to maintain as much as possible the interface and implied semantics of the Arena interface, while still improving the overall API and performance.

I have attempted to (as much as feasible) not break any of the current assumptions that external code may make about the behavior of the existing Arena implementations. This has proven hard, due to the high coupling between the implementations and the Message and Segment types (which is partially unwound here). This can be asserted by reviewers by noting how few of the existing tests had to be fixed.

In the future, I hope to be more aggressive in the changes proposed, thus this initial PR is meant to serve as base, upon which more significant changes will be made.

Benchstat comparison before/after refactor Note: the benchmarks "old" benchmarks were run on the commit titled "Fix reuse of SingleSegmentArena" in order for the fixes in that commit to apply both before and after the refactor. ``` name old time/op new time/op delta MessageGetFirstSegment-7 363ns ± 2% 205ns ± 1% -43.35% (p=0.000 n=10+10) TextMovementBetweenSegments-7 434µs ± 1% 443µs ± 2% +2.01% (p=0.000 n=10+10) Marshal-7 1.44µs ± 1% 1.23µs ± 1% -14.71% (p=0.000 n=10+9) Marshal_ReuseMsg-7 894ns ± 1% 702ns ± 2% -21.51% (p=0.000 n=10+10) Unmarshal-7 947ns ± 1% 855ns ± 1% -9.77% (p=0.000 n=8+10) Unmarshal_Reuse-7 589ns ± 2% 557ns ± 1% -5.34% (p=0.000 n=10+9) Decode-7 8.83ms ± 1% 7.58ms ± 0% -14.11% (p=0.000 n=10+10) Growth/SingleSegment/release=false-7 10.4ms ± 3% 10.8ms ± 2% +3.44% (p=0.000 n=9+10) Growth/MultiSegment/release=false-7 13.2ms ± 1% 10.3ms ± 2% -21.96% (p=0.000 n=10+10) Growth/SingleSegment/release=true-7 10.7ms ± 4% 10.8ms ± 1% ~ (p=0.631 n=10+10) Growth/MultiSegment/release=true-7 12.8ms ± 1% 10.2ms ± 2% -20.39% (p=0.000 n=10+10) SmallMessage/SingleSegment/release=false-7 1.34µs ± 1% 1.13µs ± 1% -15.58% (p=0.000 n=10+10) SmallMessage/MultiSegment/release=false-7 1.53µs ± 3% 1.20µs ± 1% -21.60% (p=0.000 n=9+9) SmallMessage/SingleSegment/release=true-7 1.08µs ± 1% 0.67µs ± 1% -38.45% (p=0.000 n=9+10) SmallMessage/MultiSegment/release=true-7 1.01µs ± 1% 0.63µs ± 2% -38.00% (p=0.000 n=8+10) Pool-7 64.5ns ± 1% 64.7ns ± 2% ~ (p=0.491 n=10+10) Pack-7 13.8µs ± 3% 13.8µs ± 1% ~ (p=0.239 n=10+10) Unpack-7 10.9µs ± 1% 10.9µs ± 1% ~ (p=0.853 n=10+10) Unpack_Large-7 2.06µs ± 2% 2.06µs ± 1% ~ (p=0.542 n=10+10) Reader-7 22.9µs ± 2% 23.0µs ± 2% ~ (p=0.089 n=10+10) Reader_Large-7 30.7µs ± 1% 30.2µs ± 2% -1.45% (p=0.002 n=10+10) Extract-7 21.1µs ± 2% 20.9µs ± 1% -1.09% (p=0.001 n=10+10) Insert-7 21.0µs ± 1% 21.1µs ± 1% ~ (p=0.143 n=10+10) PingPong-7 46.8µs ± 2% 46.1µs ± 1% -1.37% (p=0.004 n=10+8) name old alloc/op new alloc/op delta MessageGetFirstSegment-7 24.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10) Marshal-7 1.50kB ± 0% 1.32kB ± 0% -12.21% (p=0.000 n=10+10) Marshal_ReuseMsg-7 120B ± 0% 96B ± 0% -20.00% (p=0.000 n=10+10) Unmarshal-7 592B ± 0% 336B ± 0% -43.24% (p=0.000 n=10+10) Unmarshal_Reuse-7 0.00B 0.00B ~ (all equal) Decode-7 7.70MB ± 0% 5.17MB ± 0% -32.95% (p=0.000 n=10+10) Growth/SingleSegment/release=false-7 4.35MB ± 0% 4.35MB ± 0% +0.02% (p=0.000 n=10+9) Growth/MultiSegment/release=false-7 1.59MB ± 0% 1.59MB ± 0% -0.04% (p=0.000 n=10+9) Growth/SingleSegment/release=true-7 4.35MB ± 0% 4.35MB ± 0% +0.02% (p=0.000 n=10+10) Growth/MultiSegment/release=true-7 537kB ± 1% 535kB ± 1% -0.32% (p=0.019 n=10+10) SmallMessage/SingleSegment/release=false-7 1.41kB ± 0% 1.22kB ± 0% -13.30% (p=0.000 n=10+10) SmallMessage/MultiSegment/release=false-7 1.82kB ± 0% 1.42kB ± 0% -22.36% (p=0.000 n=8+10) SmallMessage/SingleSegment/release=true-7 328B ± 0% 80B ± 0% -75.61% (p=0.000 n=10+10) SmallMessage/MultiSegment/release=true-7 304B ± 0% 80B ± 0% -73.68% (p=0.000 n=10+10) Pool-7 0.00B 0.00B ~ (all equal) Extract-7 16.2kB ± 0% 15.7kB ± 0% -3.16% (p=0.000 n=10+10) Insert-7 15.8kB ± 0% 15.5kB ± 0% -1.76% (p=0.000 n=10+10) name old allocs/op new allocs/op delta MessageGetFirstSegment-7 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) Marshal-7 7.00 ± 0% 5.00 ± 0% -28.57% (p=0.000 n=10+10) Marshal_ReuseMsg-7 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) Unmarshal-7 3.00 ± 0% 3.00 ± 0% ~ (all equal) Unmarshal_Reuse-7 0.00 0.00 ~ (all equal) Decode-7 50.1k ± 0% 50.1k ± 0% -0.06% (p=0.000 n=10+10) Growth/SingleSegment/release=false-7 19.4 ± 7% 19.7 ± 4% ~ (p=0.666 n=10+10) Growth/MultiSegment/release=false-7 26.0 ± 0% 20.0 ± 0% -23.08% (p=0.000 n=10+10) Growth/SingleSegment/release=true-7 19.4 ± 7% 19.0 ± 0% ~ (p=0.092 n=10+9) Growth/MultiSegment/release=true-7 10.0 ± 0% 4.0 ± 0% -60.00% (p=0.000 n=9+10) SmallMessage/SingleSegment/release=false-7 6.00 ± 0% 4.00 ± 0% -33.33% (p=0.000 n=10+10) SmallMessage/MultiSegment/release=false-7 7.00 ± 0% 5.00 ± 0% -28.57% (p=0.000 n=10+10) SmallMessage/SingleSegment/release=true-7 4.00 ± 0% 1.00 ± 0% -75.00% (p=0.000 n=10+10) SmallMessage/MultiSegment/release=true-7 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10) Pool-7 0.00 0.00 ~ (all equal) Extract-7 32.0 ± 0% 32.0 ± 0% ~ (all equal) Insert-7 29.0 ± 0% 29.0 ± 0% ~ (all equal) name old speed new speed delta Decode-7 109MB/s ± 1% 127MB/s ± 0% +16.43% (p=0.000 n=10+10) Growth/SingleSegment/release=false-7 101MB/s ± 4% 97MB/s ± 2% -3.36% (p=0.000 n=9+10) Growth/MultiSegment/release=false-7 79.2MB/s ± 1% 101.4MB/s ± 2% +28.14% (p=0.000 n=10+10) Growth/SingleSegment/release=true-7 98.0MB/s ± 4% 97.0MB/s ± 1% ~ (p=0.644 n=10+10) Growth/MultiSegment/release=true-7 81.8MB/s ± 1% 102.7MB/s ± 2% +25.62% (p=0.000 n=10+10) SmallMessage/SingleSegment/release=false-7 53.7MB/s ± 1% 63.6MB/s ± 1% +18.47% (p=0.000 n=10+10) SmallMessage/MultiSegment/release=false-7 46.8MB/s ± 4% 59.9MB/s ± 1% +28.06% (p=0.000 n=10+9) SmallMessage/SingleSegment/release=true-7 66.4MB/s ± 1% 107.9MB/s ± 1% +62.48% (p=0.000 n=9+10) SmallMessage/MultiSegment/release=true-7 71.1MB/s ± 1% 114.6MB/s ± 1% +61.09% (p=0.000 n=9+10) Pack-7 742MB/s ± 3% 740MB/s ± 1% ~ (p=0.239 n=10+10) Unpack-7 938MB/s ± 1% 939MB/s ± 1% ~ (p=0.853 n=10+10) Unpack_Large-7 25.4GB/s ± 2% 25.4GB/s ± 1% ~ (p=0.529 n=10+10) Reader-7 448MB/s ± 2% 444MB/s ± 2% ~ (p=0.089 n=10+10) Reader_Large-7 1.71GB/s ± 1% 1.73GB/s ± 2% +1.47% (p=0.002 n=10+10) ```

As can be seen, the absolute majority of the benchmarks is improved by this PR.

The second commit (titled "Refactor Internals") has further details on which specific changes were done.

matheusd commented 2 months ago

This is now triggering a new way for TestPromiseOrder to fail... 🙄

matheusd commented 2 months ago

Unfortunately the final races are too deep to easily fix. I'm trying a different approach.

matheusd commented 2 months ago

Rebased and updated to fix the final data races. Also updated the stats against latest main branch.

lthibault commented 2 months ago

Nice! I am aiming to review this afternoon.