godbus / dbus

Native Go bindings for D-Bus
BSD 2-Clause "Simplified" License

Reduce allocs in ReadMessage (unix transport) #347

Closed marselester closed 1 year ago

marselester commented 1 year ago

It appeared to me that certain decoding performed in ReadMessage was repeated in DecodeMessageWithFDs, so I got rid of it. That yielded 58 fewer allocs, though it consumed 5.63KB more per operation according to the benchmark.

```
# new
BenchmarkUnixFDs-2          3668        395654 ns/op       30512 B/op        849 allocs/op
# old
BenchmarkUnixFDs-2          6574        330534 ns/op       24737 B/op        907 allocs/op
```
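
The actual diff lives in this PR; as a rough, hypothetical sketch of the pattern being removed (simplified types, not the real godbus code), compare a frame reader that decodes a length prefix twice with one that decodes it once:

```
// Hypothetical illustration only: a 4-byte little-endian length prefix
// followed by a body, standing in for the D-Bus fixed header.
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

type frame struct {
	n    uint32
	body []byte
}

// decodeFrame parses the length prefix from raw even though the caller
// already parsed it to size the buffer — the duplicated work.
func decodeFrame(raw []byte) frame {
	n := binary.LittleEndian.Uint32(raw)
	return frame{n: n, body: raw[4 : 4+n]}
}

// readFrameTwice mirrors the old shape: decode the prefix to learn the
// length, reassemble one big buffer, then decode the prefix again.
func readFrameTwice(r io.Reader) (frame, error) {
	var prefix [4]byte
	if _, err := io.ReadFull(r, prefix[:]); err != nil {
		return frame{}, err
	}
	n := binary.LittleEndian.Uint32(prefix[:]) // decode #1
	raw := make([]byte, 4+n)                   // extra buffer
	copy(raw, prefix[:])
	if _, err := io.ReadFull(r, raw[4:]); err != nil {
		return frame{}, err
	}
	return decodeFrame(raw), nil // decode #2
}

// readFrameOnce mirrors the new shape: decode the prefix a single time
// and read the body straight into its final buffer.
func readFrameOnce(r io.Reader) (frame, error) {
	var prefix [4]byte
	if _, err := io.ReadFull(r, prefix[:]); err != nil {
		return frame{}, err
	}
	n := binary.LittleEndian.Uint32(prefix[:]) // decoded once
	body := make([]byte, n)
	if _, err := io.ReadFull(r, body); err != nil {
		return frame{}, err
	}
	return frame{n: n, body: body}, nil
}

func main() {
	msg := append([]byte{5, 0, 0, 0}, "hello"...)
	f, err := readFrameOnce(bytes.NewReader(msg))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes: %s\n", f.n, f.body)
}
```

The skipped decode and buffer copy per message is the spirit of the alloc reduction, though the real change operates on the D-Bus header and header fields rather than a plain length prefix.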

It seems more memory is spent in syscall.anyToSockaddr after the refactoring; see the profiles below.

new

```
Showing nodes accounting for 208.52MB, 70.08% of 297.53MB total
Dropped 16 nodes (cum <= 1.49MB)
Showing top 20 nodes out of 74
      flat  flat%   sum%        cum   cum%
    0.50MB  0.17%  0.17%   234.02MB 78.66%  github.com/godbus/dbus/v5.(*Conn).inWorker
       7MB  2.35%  2.52%   233.52MB 78.49%  github.com/godbus/dbus/v5.(*unixTransport).ReadMessage
    4.50MB  1.51%  4.03%   164.01MB 55.13%  github.com/godbus/dbus/v5.(*decoder).Decode
      20MB  6.72% 10.76%   159.01MB 53.44%  github.com/godbus/dbus/v5.(*decoder).decode
         0     0% 10.76%   135.01MB 45.38%  github.com/godbus/dbus/v5.(*oobReader).Read
         0     0% 10.76%   135.01MB 45.38%  net.(*UnixConn).ReadMsgUnix
      27MB  9.08% 19.83%   135.01MB 45.38%  net.(*UnixConn).readMsg
         0     0% 19.83%   111.51MB 37.48%  io.(*multiReader).Read
         0     0% 19.83%   108.01MB 36.30%  internal/poll.(*FD).ReadMsg
         0     0% 19.83%   108.01MB 36.30%  net.(*netFD).readMsg
         0     0% 19.83%   108.01MB 36.30%  syscall.Recvmsg
  108.01MB 36.30% 56.13%   108.01MB 36.30%  syscall.anyToSockaddr
      20MB  6.72% 62.86%    78.01MB 26.22%  github.com/godbus/dbus/v5.(*Message).EncodeToWithFDs
         0     0% 62.86%    68.51MB 23.03%  io.ReadAtLeast
         0     0% 62.86%    68.51MB 23.03%  io.ReadFull (inline)
         0     0% 62.86%       56MB 18.82%  github.com/godbus/dbus/v5.(*Message).EncodeTo (inline)
    2.50MB  0.84% 63.70%       49MB 16.47%  github.com/godbus/dbus/v5.(*Message).IsValid (inline)
         0     0% 63.70%       48MB 16.13%  github.com/godbus/dbus/v5.(*encoder).Encode
   17.50MB  5.88% 69.58%       48MB 16.13%  github.com/godbus/dbus/v5.(*encoder).encode
    1.50MB   0.5% 70.08%       46MB 15.46%  github.com/godbus/dbus/v5.(*decoder).read2buf
```

old

```
Showing nodes accounting for 72.01MB, 42.73% of 168.51MB total
Dropped 11 nodes (cum <= 0.84MB)
Showing top 20 nodes out of 75
      flat  flat%   sum%        cum   cum%
         0     0%     0%   120.01MB 71.22%  github.com/godbus/dbus/v5.(*Conn).inWorker
       3MB  1.78%  1.78%   120.01MB 71.22%  github.com/godbus/dbus/v5.(*unixTransport).ReadMessage
    4.50MB  2.67%  4.45%    75.01MB 44.51%  github.com/godbus/dbus/v5.DecodeMessageWithFDs
    5.50MB  3.26%  7.72%       55MB 32.64%  github.com/godbus/dbus/v5.(*decoder).Decode
      11MB  6.53% 14.24%       52MB 30.86%  github.com/godbus/dbus/v5.(*Message).EncodeToWithFDs
   23.50MB 13.95% 28.19%    48.50MB 28.78%  github.com/godbus/dbus/v5.(*decoder).decode
         0     0% 28.19%    43.50MB 25.82%  github.com/godbus/dbus/v5.(*Message).EncodeTo (inline)
       1MB  0.59% 28.78%    34.50MB 20.47%  github.com/godbus/dbus/v5.(*Message).IsValid (inline)
         0     0% 28.78%    34.50MB 20.47%  github.com/godbus/dbus/v5.(*encoder).Encode
      10MB  5.93% 34.72%    34.50MB 20.47%  github.com/godbus/dbus/v5.(*encoder).encode
         0     0% 34.72%       31MB 18.40%  github.com/godbus/dbus/v5.DecodeMessage (inline)
    5.50MB  3.26% 37.98%       30MB 17.80%  github.com/godbus/dbus/v5.(*Conn).handleCall
         0     0% 37.98%       20MB 11.87%  github.com/godbus/dbus/v5.(*Conn).sendMessageAndIfClosed
         0     0% 37.98%       20MB 11.87%  github.com/godbus/dbus/v5.(*outputHandler).sendAndIfClosed
    0.50MB   0.3% 38.28%       20MB 11.87%  github.com/godbus/dbus/v5.(*unixTransport).SendMessage
         0     0% 38.28%       19MB 11.28%  bytes.(*Buffer).Write
       6MB  3.56% 41.84%       19MB 11.28%  bytes.(*Buffer).grow
    1.50MB  0.89% 42.73%    17.50MB 10.39%  github.com/godbus/dbus/v5.(*Object).Call
         0     0% 42.73%    17.50MB 10.39%  github.com/godbus/dbus/v5.BenchmarkUnixFDs
         0     0% 42.73%    17.50MB 10.39%  testing.(*B).launch
```
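
For background on the syscall.anyToSockaddr rows: each net.(*UnixConn).ReadMsgUnix call asks the kernel for a source address along with the data, and when the sending socket is bound (as the bus socket is), syscall.anyToSockaddr materializes that address as a fresh value. So issuing more reads per message means more of these allocations. Below is a hypothetical, self-contained sketch (Linux assumed; names like allocs-demo.sock are illustrative, nothing here is from the PR) that observes the per-call cost:

```
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
	"testing"
)

func main() {
	// A bound Unix socket, so the kernel reports a source address on each
	// recvmsg and syscall.anyToSockaddr has something to materialize.
	sock := filepath.Join(os.TempDir(), "allocs-demo.sock")
	os.Remove(sock)
	ln, err := net.Listen("unix", sock)
	if err != nil {
		panic(err)
	}
	defer os.Remove(sock)

	go func() {
		w, err := ln.Accept()
		if err != nil {
			panic(err)
		}
		// AllocsPerRun invokes the function runs+1 times (one warm-up call),
		// so queue 101 payloads up front; the measured loop only reads.
		payload := make([]byte, 16)
		for i := 0; i < 101; i++ {
			if _, err := w.Write(payload); err != nil {
				panic(err)
			}
		}
	}()

	c, err := net.Dial("unix", sock)
	if err != nil {
		panic(err)
	}
	r := c.(*net.UnixConn)

	buf := make([]byte, 16)
	oob := make([]byte, 64)
	allocs := testing.AllocsPerRun(100, func() {
		if _, _, _, _, err := r.ReadMsgUnix(buf, oob); err != nil {
			panic(err)
		}
	})
	fmt.Printf("allocs per ReadMsgUnix call: %.0f\n", allocs)
}
```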
marselester commented 1 year ago

I figured out why there was extra memory consumption per operation after the refactoring and resolved the problem.

```
BenchmarkUnixFDs-2          7988        298127 ns/op       16989 B/op        614 allocs/op
```
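
How the extra allocations were tracked down isn't shown in the thread; one common way to localize this kind of regression (file names here are illustrative) is to capture a heap profile from each benchmark run and diff them with pprof:

```
go test -run='^$' -bench=BenchmarkUnixFDs -memprofile=new.out .
go tool pprof -top -sample_index=alloc_space -diff_base=old.out new.out
```

The diff view subtracts the baseline profile, so only the functions whose allocations changed between the two runs stand out.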
marselester commented 1 year ago

Down to 606 allocs from 907; the benchmark also shows 8.17KB less memory usage per operation.

```
# new
BenchmarkUnixFDs-2          7674        237934 ns/op       16368 B/op        606 allocs/op
# old
BenchmarkUnixFDs-2          6574        330534 ns/op       24737 B/op        907 allocs/op
```
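
Single benchmark runs can be noisy; a more robust comparison (assuming the outputs above were saved to old.txt and new.txt) uses benchstat from golang.org/x/perf, which averages repeated runs and reports statistical significance:

```
go test -run='^$' -bench=BenchmarkUnixFDs -benchmem -count=10 . > new.txt
benchstat old.txt new.txt
```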
marselester commented 1 year ago

I will try these changes on a realistic workload to make sure there is no regression.

marselester commented 1 year ago

I have tested the changes on Parca Agent and posted the memory profiles here: https://github.com/parca-dev/parca-agent/issues/1037#issuecomment-1345331993.