etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

Investigate serializable range performance regression #6010

Closed gyuho closed 7 years ago

gyuho commented 8 years ago

Also reported at https://github.com/grpc/grpc-go/issues/771. Comparing before and after the grpc dependency update https://github.com/coreos/etcd/pull/5948, there is a huge performance drop in serializable range requests:
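
For reference, a serializable range is served by whichever member the client hits, without a quorum round trip, so the client/gRPC path dominates its cost. Below is a minimal sketch of such a request with the Go client (import path as vendored at the time; the endpoint and key are placeholders, error handling trimmed):

package main

import (
	"fmt"
	"time"

	"github.com/coreos/etcd/clientv3"
	"golang.org/x/net/context"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"10.240.0.22:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// WithSerializable turns the range into a local, non-quorum read --
	// the request type exercised by "benchmark ... --consistency=s".
	resp, err := cli.Get(context.Background(), "aaaaaaaa", clientv3.WithSerializable())
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Kvs)
}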


https://github.com/coreos/etcd/commit/07ed4da2ffe871444c83be63e508d2adf4c87175:

$ benchmark --endpoints=${IP_1},${IP_2},${IP_3} --conns=100 --clients=1000 range aaaaaaaa --consistency=s --total=100000
bench with serializable range
 100000 / 100000 Boooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%1s

Summary:
  Total:    1.0602 secs.
  Slowest:  0.0742 secs.
  Fastest:  0.0003 secs.
  Average:  0.0069 secs.
  Stddev:   0.0061 secs.
  Requests/sec: 94323.9382

Response time histogram:
  0.000 [1] |
  0.008 [65910] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.015 [26302] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.022 [5257]  |∎∎∎
  0.030 [1173]  |
  0.037 [983]   |
  0.045 [220]   |
  0.052 [59]    |
  0.059 [80]    |
  0.067 [12]    |
  0.074 [3] |

Latency distribution:
  10% in 0.0015 secs.
  25% in 0.0028 secs.
  50% in 0.0052 secs.
  75% in 0.0094 secs.
  90% in 0.0137 secs.
  95% in 0.0177 secs.
  99% in 0.0340 secs.


https://github.com/coreos/etcd/commit/d9a8a326dfe4691cf90d2c4d2674e3602de812d8

$ benchmark --endpoints=${IP_1},${IP_2},${IP_3} --conns=100 --clients=1000 range aaaaaaaa --consistency=s --total=100000
bench with serializable range
 100000 / 100000 Boooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%2s

Summary:
  Total:    2.0064 secs.
  Slowest:  0.2937 secs.
  Fastest:  0.0003 secs.
  Average:  0.0196 secs.
  Stddev:   0.0248 secs.
  Requests/sec: 49841.3578

Response time histogram:
  0.000 [1] |
  0.030 [78888] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.059 [13889] |∎∎∎∎∎∎∎
  0.088 [5035]  |∎∎
  0.118 [1551]  |
  0.147 [448]   |
  0.176 [126]   |
  0.206 [40]    |
  0.235 [16]    |
  0.264 [4] |
  0.294 [2] |

Latency distribution:
  10% in 0.0004 secs.
  25% in 0.0005 secs.
  50% in 0.0174 secs.
  75% in 0.0267 secs.
  90% in 0.0507 secs.
  95% in 0.0687 secs.
  99% in 0.1079 secs.


https://github.com/coreos/etcd/commit/d8f0ef0e800d86e6343b073c406b7fe0e2cb3940

$ benchmark --endpoints=${IP_1},${IP_2},${IP_3} --conns=100 --clients=1000 range aaaaaaaa --consistency=s --total=100000
bench with serializable range
 100000 / 100000 Boooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%2s

Summary:
  Total:    2.0462 secs.
  Slowest:  0.3614 secs.
  Fastest:  0.0003 secs.
  Average:  0.0200 secs.
  Stddev:   0.0255 secs.
  Requests/sec: 48870.5512

Response time histogram:
  0.000 [1] |
  0.036 [80835] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.073 [14376] |∎∎∎∎∎∎∎
  0.109 [3760]  |∎
  0.145 [759]   |
  0.181 [194]   |
  0.217 [58]    |
  0.253 [13]    |
  0.289 [2] |
  0.325 [1] |
  0.361 [1] |

Latency distribution:
  10% in 0.0004 secs.
  25% in 0.0005 secs.
  50% in 0.0175 secs.
  75% in 0.0275 secs.
  90% in 0.0527 secs.
  95% in 0.0709 secs.
  99% in 0.1094 secs.
gyuho commented 8 years ago

Here's some progress in the investigation. It's very easy to reproduce.




BAD performance

$ benchmark --endpoints=${IP_1},${IP_2},${IP_3} --conns=100 --clients=1000 range aaaaaaaa --consistency=s --total=100000
bench with serializable range

Summary:
  Total:  2.0949 secs.
  Slowest:  0.2870 secs.
  Fastest:  0.0003 secs.
  Average:  0.0205 secs.
  Stddev: 0.0264 secs.
  Requests/sec: 47734.2400

Response time histogram:
  0.000 [1] |
  0.029 [76792] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.058 [14889] |∎∎∎∎∎∎∎
  0.086 [5286]  |∎∎
  0.115 [1970]  |∎
  0.144 [736] |
  0.172 [227] |
  0.201 [57]  |
  0.230 [25]  |
  0.258 [12]  |
  0.287 [5] |

Latency distribution:
  10% in 0.0004 secs.
  25% in 0.0005 secs.
  50% in 0.0146 secs.
  75% in 0.0279 secs.
  90% in 0.0530 secs.
  95% in 0.0724 secs.
  99% in 0.1167 secs.

Heap profiling data after this run:

$ go tool pprof http://10.240.0.22:2379/debug/pprof/heap

(pprof) top
21237.39kB of 22261.46kB total (95.40%)
Dropped 77 nodes (cum <= 111.31kB)
Showing top 10 nodes out of 46 (cum >= 512.17kB)
      flat  flat%   sum%        cum   cum%
 5046.16kB 22.67% 22.67% 10092.32kB 45.34%  cmd/vendor/github.com/coreos/etcd/rafthttp.startPeer
 5046.16kB 22.67% 45.34%  5046.16kB 22.67%  cmd/vendor/github.com/coreos/etcd/rafthttp.startStreamWriter
    3169kB 14.24% 59.57%     3169kB 14.24%  cmd/vendor/google.golang.org/grpc/transport.newFramer
 2368.55kB 10.64% 70.21%  2368.55kB 10.64%  cmd/vendor/github.com/coreos/etcd/rafthttp.(*streamReader).decodeLoop
 2368.55kB 10.64% 80.85%  2368.55kB 10.64%  cmd/vendor/github.com/coreos/etcd/rafthttp.(*streamWriter).run
 1184.27kB  5.32% 86.17%  1184.27kB  5.32%  cmd/vendor/github.com/coreos/etcd/wal.(*WAL).ReadAll
  518.02kB  2.33% 88.50%   518.02kB  2.33%  cmd/vendor/github.com/beorn7/perks/quantile.NewTargeted
  512.31kB  2.30% 90.80%   512.31kB  2.30%  cmd/vendor/golang.org/x/net/trace.New
  512.19kB  2.30% 93.10%   512.19kB  2.30%  runtime.malg
  512.17kB  2.30% 95.40%   512.17kB  2.30%  cmd/vendor/google.golang.org/grpc/transport.(*decodeState).processHeaderField

(pprof) top -cum
4.93MB of 21.74MB total (22.67%)
Dropped 77 nodes (cum <= 0.11MB)
Showing top 10 nodes out of 46 (cum >= 9.86MB)
      flat  flat%   sum%        cum   cum%
         0     0%     0%    21.24MB 97.70%  runtime.goexit
         0     0%     0%    12.02MB 55.28%  cmd/vendor/github.com/coreos/etcd/embed.StartEtcd
         0     0%     0%    12.02MB 55.28%  cmd/vendor/github.com/coreos/etcd/etcdmain.Main
         0     0%     0%    12.02MB 55.28%  cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcd
         0     0%     0%    12.02MB 55.28%  cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcdOrProxyV2
         0     0%     0%    12.02MB 55.28%  main.main
         0     0%     0%    12.02MB 55.28%  runtime.main
         0     0%     0%    11.51MB 52.96%  cmd/vendor/github.com/coreos/etcd/etcdserver.NewServer
         0     0%     0%     9.86MB 45.34%  cmd/vendor/github.com/coreos/etcd/rafthttp.(*Transport).AddPeer
    4.93MB 22.67% 22.67%     9.86MB 45.34%  cmd/vendor/github.com/coreos/etcd/rafthttp.startPeer

Profiling data for 30 seconds:

$ go tool pprof http://10.240.0.22:2379/debug/pprof/profile

(pprof) top
15410ms of 39330ms total (39.18%)
Dropped 382 nodes (cum <= 196.65ms)
Showing top 10 nodes out of 193 (cum >= 690ms)
      flat  flat%   sum%        cum   cum%
    6620ms 16.83% 16.83%     6970ms 17.72%  syscall.Syscall
    1400ms  3.56% 20.39%     3080ms  7.83%  runtime.mallocgc
    1230ms  3.13% 23.52%     1230ms  3.13%  runtime.usleep
    1140ms  2.90% 26.42%     3040ms  7.73%  runtime.pcvalue
     920ms  2.34% 28.76%      920ms  2.34%  runtime.heapBitsSetType
     920ms  2.34% 31.10%      920ms  2.34%  runtime.readvarint
     860ms  2.19% 33.28%      860ms  2.19%  runtime.adjustpointers
     860ms  2.19% 35.47%      860ms  2.19%  runtime.epollwait
     770ms  1.96% 37.43%     1690ms  4.30%  runtime.step
     690ms  1.75% 39.18%      690ms  1.75%  runtime.futex

(pprof) top -cum
6.95s of 39.33s total (17.67%)
Dropped 382 nodes (cum <= 0.20s)
Showing top 10 nodes out of 193 (cum >= 6.54s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%     27.34s 69.51%  runtime.goexit
     0.03s 0.076% 0.076%     18.70s 47.55%  cmd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1
     0.07s  0.18%  0.25%     17.50s 44.50%  cmd/vendor/google.golang.org/grpc.(*Server).handleStream
     0.14s  0.36%  0.61%     17.37s 44.16%  cmd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC
         0     0%  0.61%      7.35s 18.69%  cmd/vendor/google.golang.org/grpc.(*Server).handleRawConn
         0     0%  0.61%      7.35s 18.69%  cmd/vendor/google.golang.org/grpc.(*Server).serveNewHTTP2Transport
         0     0%  0.61%      7.31s 18.59%  cmd/vendor/google.golang.org/grpc.(*Server).serveStreams
     0.08s   0.2%  0.81%      7.31s 18.59%  cmd/vendor/google.golang.org/grpc/transport.(*http2Server).HandleStreams
     6.62s 16.83% 17.65%      6.97s 17.72%  syscall.Syscall
     0.01s 0.025% 17.67%      6.54s 16.63%  cmd/vendor/google.golang.org/grpc.(*Server).sendResponse




GOOD(expected) performance

$ benchmark --endpoints=${IP_1},${IP_2},${IP_3} --conns=100 --clients=1000 range aaaaaaaa --consistency=s --total=100000
bench with serializable range

Summary:
  Total:    0.9810 secs.
  Slowest:  0.0434 secs.
  Fastest:  0.0003 secs.
  Average:  0.0059 secs.
  Stddev:   0.0042 secs.
  Requests/sec: 101937.7307

Response time histogram:
  0.000 [1] |
  0.005 [49107] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.009 [29103] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.013 [16127] |∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.018 [4085]  |∎∎∎
  0.022 [1105]  |
  0.026 [329]   |
  0.030 [83]    |
  0.035 [43]    |
  0.039 [11]    |
  0.043 [6] |

Heap profiling data after this run:

$ go tool pprof http://10.240.0.22:2379/debug/pprof/heap

(pprof) top
23904.01kB of 23904.01kB total (  100%)
Dropped 71 nodes (cum <= 119.52kB)
Showing top 10 nodes out of 40 (cum >= 11276.60kB)
      flat  flat%   sum%        cum   cum%
 6866.17kB 28.72% 28.72%  6866.17kB 28.72%  cmd/vendor/google.golang.org/grpc/transport.newFramer
 5046.16kB 21.11% 49.83% 10092.32kB 42.22%  cmd/vendor/github.com/coreos/etcd/rafthttp.startPeer
 5046.16kB 21.11% 70.94%  5046.16kB 21.11%  cmd/vendor/github.com/coreos/etcd/rafthttp.startStreamWriter
 2368.55kB  9.91% 80.85%  2368.55kB  9.91%  cmd/vendor/github.com/coreos/etcd/rafthttp.(*streamReader).decodeLoop
 2368.55kB  9.91% 90.76%  2368.55kB  9.91%  cmd/vendor/github.com/coreos/etcd/rafthttp.(*streamWriter).run
 1184.27kB  4.95% 95.72%  1184.27kB  4.95%  cmd/vendor/github.com/coreos/etcd/wal.(*WAL).ReadAll
  512.11kB  2.14% 97.86%   512.11kB  2.14%  cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).Range
  512.05kB  2.14%   100%   512.05kB  2.14%  cmd/vendor/golang.org/x/net/context.WithCancel
         0     0%   100% 11276.60kB 47.17%  cmd/vendor/github.com/coreos/etcd/embed.StartEtcd
         0     0%   100% 11276.60kB 47.17%  cmd/vendor/github.com/coreos/etcd/etcdmain.Main

(pprof) top -cum
4.93MB of 23.34MB total (21.11%)
Dropped 71 nodes (cum <= 0.12MB)
Showing top 10 nodes out of 40 (cum >= 9.86MB)
      flat  flat%   sum%        cum   cum%
         0     0%     0%    23.34MB   100%  runtime.goexit
         0     0%     0%    11.01MB 47.17%  cmd/vendor/github.com/coreos/etcd/embed.StartEtcd
         0     0%     0%    11.01MB 47.17%  cmd/vendor/github.com/coreos/etcd/etcdmain.Main
         0     0%     0%    11.01MB 47.17%  cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcd
         0     0%     0%    11.01MB 47.17%  cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcdOrProxyV2
         0     0%     0%    11.01MB 47.17%  cmd/vendor/github.com/coreos/etcd/etcdserver.NewServer
         0     0%     0%    11.01MB 47.17%  main.main
         0     0%     0%    11.01MB 47.17%  runtime.main
         0     0%     0%     9.86MB 42.22%  cmd/vendor/github.com/coreos/etcd/rafthttp.(*Transport).AddPeer
    4.93MB 21.11% 21.11%     9.86MB 42.22%  cmd/vendor/github.com/coreos/etcd/rafthttp.startPeer

Profiling data for 30 seconds:

$ go tool pprof http://10.240.0.22:2379/debug/pprof/profile

(pprof) top
12000ms of 31200ms total (38.46%)
Dropped 412 nodes (cum <= 156ms)
Showing top 10 nodes out of 179 (cum >= 830ms)
      flat  flat%   sum%        cum   cum%
    5230ms 16.76% 16.76%     5490ms 17.60%  syscall.Syscall
    1180ms  3.78% 20.54%     2380ms  7.63%  runtime.mallocgc
    1090ms  3.49% 24.04%     1090ms  3.49%  runtime.readvarint
     840ms  2.69% 26.73%     2730ms  8.75%  runtime.pcvalue
     680ms  2.18% 28.91%      680ms  2.18%  runtime.heapBitsSetType
     670ms  2.15% 31.06%      670ms  2.15%  runtime.adjustpointers
     610ms  1.96% 33.01%     1700ms  5.45%  runtime.step
     590ms  1.89% 34.90%      590ms  1.89%  runtime._ExternalCode
     580ms  1.86% 36.76%      580ms  1.86%  runtime.(*mspan).sweep.func1
     530ms  1.70% 38.46%      830ms  2.66%  cmd/vendor/golang.org/x/net/http2/hpack.(*Encoder).searchTable

(pprof) top -cum
5.57s of 31.20s total (17.85%)
Dropped 412 nodes (cum <= 0.16s)
Showing top 10 nodes out of 179 (cum >= 5.07s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%     23.58s 75.58%  runtime.goexit
     0.03s 0.096% 0.096%     17.06s 54.68%  cmd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1
     0.01s 0.032%  0.13%     15.99s 51.25%  cmd/vendor/google.golang.org/grpc.(*Server).handleStream
     0.16s  0.51%  0.64%     15.98s 51.22%  cmd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC
     5.23s 16.76% 17.40%      5.49s 17.60%  syscall.Syscall
     0.04s  0.13% 17.53%      5.23s 16.76%  cmd/vendor/google.golang.org/grpc.(*Server).sendResponse
     0.02s 0.064% 17.60%      5.15s 16.51%  bufio.(*Writer).Flush
     0.02s 0.064% 17.66%      5.13s 16.44%  bufio.(*Writer).flush
     0.04s  0.13% 17.79%      5.11s 16.38%  cmd/vendor/github.com/cockroachdb/cmux.(*MuxConn).Write
     0.02s 0.064% 17.85%      5.07s 16.25%  cmd/vendor/github.com/coreos/etcd/pkg/transport.(*limitListenerConn).Write




Patch

Applied against the master branch, the patch below restores the ~100K requests/sec range performance.

It simply rolls back https://github.com/grpc/grpc-go/commit/ffdfb592e8ac60f3f9ca4889661eabfe09c11628; nothing else is changed.

commit 2cd7e439d2c66bfcdbcf376163f603a2bd23589e
Author: Gyu-Ho Lee <gyuhox@gmail.com>
Date:   Fri Jul 22 09:20:40 2016 -0700

    grpc: manual type-change '*rpcError' to 'rpcError'

diff --git a/cmd/vendor/google.golang.org/grpc/call.go b/cmd/vendor/google.golang.org/grpc/call.go
index 84ac178..9326236 100644
--- a/cmd/vendor/google.golang.org/grpc/call.go
+++ b/cmd/vendor/google.golang.org/grpc/call.go
@@ -155,7 +155,7 @@ func Invoke(ctx context.Context, method string, args, reply interface{}, cc *Cli
        t, put, err = cc.getTransport(ctx, gopts)
        if err != nil {
            // TODO(zhaoq): Probably revisit the error handling.
-           if _, ok := err.(*rpcError); ok {
+           if _, ok := err.(rpcError); ok {
                return err
            }
            if err == errConnClosing {
diff --git a/cmd/vendor/google.golang.org/grpc/rpc_util.go b/cmd/vendor/google.golang.org/grpc/rpc_util.go
index d628717..91342bd 100644
--- a/cmd/vendor/google.golang.org/grpc/rpc_util.go
+++ b/cmd/vendor/google.golang.org/grpc/rpc_util.go
@@ -334,7 +334,7 @@ type rpcError struct {
    desc string
 }

-func (e *rpcError) Error() string {
+func (e rpcError) Error() string {
    return fmt.Sprintf("rpc error: code = %d desc = %s", e.code, e.desc)
 }

@@ -344,7 +344,7 @@ func Code(err error) codes.Code {
    if err == nil {
        return codes.OK
    }
-   if e, ok := err.(*rpcError); ok {
+   if e, ok := err.(rpcError); ok {
        return e.code
    }
    return codes.Unknown
@@ -356,7 +356,7 @@ func ErrorDesc(err error) string {
    if err == nil {
        return ""
    }
-   if e, ok := err.(*rpcError); ok {
+   if e, ok := err.(rpcError); ok {
        return e.desc
    }
    return err.Error()
@@ -368,7 +368,7 @@ func Errorf(c codes.Code, format string, a ...interface{}) error {
    if c == codes.OK {
        return nil
    }
-   return &rpcError{
+   return rpcError{
        code: c,
        desc: fmt.Sprintf(format, a...),
    }
@@ -377,32 +377,32 @@ func Errorf(c codes.Code, format string, a ...interface{}) error {
 // toRPCErr converts an error into a rpcError.
 func toRPCErr(err error) error {
    switch e := err.(type) {
-   case *rpcError:
+   case rpcError:
        return err
    case transport.StreamError:
-       return &rpcError{
+       return rpcError{
            code: e.Code,
            desc: e.Desc,
        }
    case transport.ConnectionError:
-       return &rpcError{
+       return rpcError{
            code: codes.Internal,
            desc: e.Desc,
        }
    default:
        switch err {
        case context.DeadlineExceeded:
-           return &rpcError{
+           return rpcError{
                code: codes.DeadlineExceeded,
                desc: err.Error(),
            }
        case context.Canceled:
-           return &rpcError{
+           return rpcError{
                code: codes.Canceled,
                desc: err.Error(),
            }
        case ErrClientConnClosing:
-           return &rpcError{
+           return rpcError{
                code: codes.FailedPrecondition,
                desc: err.Error(),
            }
diff --git a/cmd/vendor/google.golang.org/grpc/server.go b/cmd/vendor/google.golang.org/grpc/server.go
index a2b2b94..a0300fb 100644
--- a/cmd/vendor/google.golang.org/grpc/server.go
+++ b/cmd/vendor/google.golang.org/grpc/server.go
@@ -579,7 +579,7 @@ func (s *Server) processUnaryRPC(t transport.ServerTransport, stream *transport.
        }
        reply, appErr := md.Handler(srv.server, stream.Context(), df, s.opts.unaryInt)
        if appErr != nil {
-           if err, ok := appErr.(*rpcError); ok {
+           if err, ok := appErr.(rpcError); ok {
                statusCode = err.code
                statusDesc = err.desc
            } else {
@@ -664,7 +664,7 @@ func (s *Server) processStreamingRPC(t transport.ServerTransport, stream *transp
        appErr = s.opts.streamInt(srv.server, ss, info, sd.Handler)
    }
    if appErr != nil {
-       if err, ok := appErr.(*rpcError); ok {
+       if err, ok := appErr.(rpcError); ok {
            ss.statusCode = err.code
            ss.statusDesc = err.desc
        } else if err, ok := appErr.(transport.StreamError); ok {
diff --git a/cmd/vendor/google.golang.org/grpc/stream.go b/cmd/vendor/google.golang.org/grpc/stream.go
index 7a3bef5..73d1da2 100644
--- a/cmd/vendor/google.golang.org/grpc/stream.go
+++ b/cmd/vendor/google.golang.org/grpc/stream.go
@@ -149,7 +149,7 @@ func NewClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, meth
        t, put, err = cc.getTransport(ctx, gopts)
        if err != nil {
            // TODO(zhaoq): Probably revisit the error handling.
-           if _, ok := err.(*rpcError); ok {
+           if _, ok := err.(rpcError); ok {
                return nil, err
            }
            if err == errConnClosing {
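
The only semantic change in this patch is whether gRPC errors are created and type-asserted as the value type rpcError or the pointer type *rpcError. A minimal standalone micro-benchmark sketch for A/B-ing just that shape (hypothetical names, not etcd or gRPC code; run with go test -bench . -benchmem) could look like the file below — whether this difference alone accounts for the etcd-level regression is what the rest of this issue digs into.

package rpcerr // e.g. rpcerr_test.go in an otherwise empty directory

import (
	"fmt"
	"testing"
)

// rpcErrVal mirrors the pre-change shape: errors created and asserted by value.
type rpcErrVal struct {
	code int
	desc string
}

func (e rpcErrVal) Error() string {
	return fmt.Sprintf("rpc error: code = %d desc = %s", e.code, e.desc)
}

// rpcErrPtr mirrors the post-change shape: errors created and asserted as a pointer.
type rpcErrPtr struct {
	code int
	desc string
}

func (e *rpcErrPtr) Error() string {
	return fmt.Sprintf("rpc error: code = %d desc = %s", e.code, e.desc)
}

func newValErr() error { return rpcErrVal{code: 1, desc: "x"} }
func newPtrErr() error { return &rpcErrPtr{code: 1, desc: "x"} }

func BenchmarkValueError(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if _, ok := newValErr().(rpcErrVal); !ok {
			b.Fatal("unexpected error type")
		}
	}
}

func BenchmarkPointerError(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if _, ok := newPtrErr().(*rpcErrPtr); !ok {
			b.Fatal("unexpected error type")
		}
	}
}
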
xiang90 commented 8 years ago

@gyuho We probably need to figure out the caller of newFramer function.

gyuho commented 8 years ago

Actually there's not much difference in handleStreams calls when the benchmark runs with --total=1000000; both make about the same number of calls to handleStreams.

bad: [pprof screenshot]

good: [pprof screenshot]

I also ran grpc-benchmarks, but both have about the same performance...


Updates 2016.07.25 : 01

More test data...

master-cpu.pb.gz master-cpu.pdf

master-heap.pb.gz master-heap.pdf

test-cpu.pb.gz test-cpu.pdf

test-heap.pb.gz test-heap.pdf

Server side profiling data doesn't explain much...


Updates 2016.07.25 : 02

Reproducible with a single-node cluster (numbers are requests/sec):

master branch: 14247.8641
test branch:   28999.8329


Updates 2016.07.25 : 03

Even a single node doesn't seem to show much difference in profiling...

master-cpu.pb.gz master-cpu.pdf

master-heap.pb.gz master-heap.pdf

test-cpu.pb.gz test-cpu.pdf

test-heap.pb.gz test-heap.pdf


Updates 2016.07.25 : 04

It's likely a regression from Go, plus the gRPC change. (All numbers below are requests/sec.)

master branch, 1 node cluster
go 1.6    : 15339.9802
go 1.6.3  : 14402.0303
go 1.7    : 15605.9624
go master : 16836.9583

test branch, 1 node cluster
go 1.6    : 31372.9589
go 1.6.3  : 27012.3428
go 1.7    : 17090.2322
go master : 16823.8769

master branch, 3 node cluster
go 1.6    : 48845.3885
go 1.6.3  : 49220.5884
go 1.7    : 53688.9684
go master : 53887.7715

test branch, 3 node cluster
go 1.6    : 92374.1203
go 1.6.3  : 86653.9443
go 1.7    : 47207.7163
go master : 52174.3944


With Anthony's patch, this seems resolved, but we still see a slight regression with *rpcError:

go1.6.3 anthony's patch
92371.0151

go1.6.3 anthony's patch + grpc error patch to rpcError (non pointer)
97333.2936

go1.7rc3 anthony's patch
92872.4881

go1.7rc3 anthony's patch + grpc error patch to rpcError (non pointer)
96982.5091
gyuho commented 7 years ago

Patch

commit bb9ab45b754d491d4ccbbe282e29d5755a69502e
Author: Gyu-Ho Lee <gyuhox@gmail.com>
Date:   Thu Nov 17 00:52:27 2016 -0800

    vendor: revert error type change in gRPC

diff --git a/cmd/vendor/google.golang.org/grpc/call.go b/cmd/vendor/google.golang.org/grpc/call.go
index 772c817..789aafc 100644
--- a/cmd/vendor/google.golang.org/grpc/call.go
+++ b/cmd/vendor/google.golang.org/grpc/call.go
@@ -173,7 +173,7 @@ func invoke(ctx context.Context, method string, args, reply interface{}, cc *Cli
        t, put, err = cc.getTransport(ctx, gopts)
        if err != nil {
            // TODO(zhaoq): Probably revisit the error handling.
-           if _, ok := err.(*rpcError); ok {
+           if _, ok := err.(rpcError); ok {
                return err
            }
            if err == errConnClosing || err == errConnUnavailable {
diff --git a/cmd/vendor/google.golang.org/grpc/rpc_util.go b/cmd/vendor/google.golang.org/grpc/rpc_util.go
index a25eaa8..a35e546 100644
--- a/cmd/vendor/google.golang.org/grpc/rpc_util.go
+++ b/cmd/vendor/google.golang.org/grpc/rpc_util.go
@@ -342,7 +342,7 @@ type rpcError struct {
    desc string
 }

-func (e *rpcError) Error() string {
+func (e rpcError) Error() string {
    return fmt.Sprintf("rpc error: code = %d desc = %s", e.code, e.desc)
 }

@@ -352,7 +352,7 @@ func Code(err error) codes.Code {
    if err == nil {
        return codes.OK
    }
-   if e, ok := err.(*rpcError); ok {
+   if e, ok := err.(rpcError); ok {
        return e.code
    }
    return codes.Unknown
@@ -364,7 +364,7 @@ func ErrorDesc(err error) string {
    if err == nil {
        return ""
    }
-   if e, ok := err.(*rpcError); ok {
+   if e, ok := err.(rpcError); ok {
        return e.desc
    }
    return err.Error()
@@ -376,7 +376,7 @@ func Errorf(c codes.Code, format string, a ...interface{}) error {
    if c == codes.OK {
        return nil
    }
-   return &rpcError{
+   return rpcError{
        code: c,
        desc: fmt.Sprintf(format, a...),
    }
@@ -385,32 +385,32 @@ func Errorf(c codes.Code, format string, a ...interface{}) error {
 // toRPCErr converts an error into a rpcError.
 func toRPCErr(err error) error {
    switch e := err.(type) {
-   case *rpcError:
+   case rpcError:
        return err
    case transport.StreamError:
-       return &rpcError{
+       return rpcError{
            code: e.Code,
            desc: e.Desc,
        }
    case transport.ConnectionError:
-       return &rpcError{
+       return rpcError{
            code: codes.Internal,
            desc: e.Desc,
        }
    default:
        switch err {
        case context.DeadlineExceeded:
-           return &rpcError{
+           return rpcError{
                code: codes.DeadlineExceeded,
                desc: err.Error(),
            }
        case context.Canceled:
-           return &rpcError{
+           return rpcError{
                code: codes.Canceled,
                desc: err.Error(),
            }
        case ErrClientConnClosing:
-           return &rpcError{
+           return rpcError{
                code: codes.FailedPrecondition,
                desc: err.Error(),
            }
diff --git a/cmd/vendor/google.golang.org/grpc/server.go b/cmd/vendor/google.golang.org/grpc/server.go
index e0bb187..9eabe90 100644
--- a/cmd/vendor/google.golang.org/grpc/server.go
+++ b/cmd/vendor/google.golang.org/grpc/server.go
@@ -578,7 +578,7 @@ func (s *Server) processUnaryRPC(t transport.ServerTransport, stream *transport.
        }
        if err != nil {
            switch err := err.(type) {
-           case *rpcError:
+           case rpcError:
                if err := t.WriteStatus(stream, err.code, err.desc); err != nil {
                    grpclog.Printf("grpc: Server.processUnaryRPC failed to write status %v", err)
                }
@@ -596,7 +596,7 @@ func (s *Server) processUnaryRPC(t transport.ServerTransport, stream *transport.

        if err := checkRecvPayload(pf, stream.RecvCompress(), s.opts.dc); err != nil {
            switch err := err.(type) {
-           case *rpcError:
+           case rpcError:
                if err := t.WriteStatus(stream, err.code, err.desc); err != nil {
                    grpclog.Printf("grpc: Server.processUnaryRPC failed to write status %v", err)
                }
@@ -637,7 +637,7 @@ func (s *Server) processUnaryRPC(t transport.ServerTransport, stream *transport.
        }
        reply, appErr := md.Handler(srv.server, stream.Context(), df, s.opts.unaryInt)
        if appErr != nil {
-           if err, ok := appErr.(*rpcError); ok {
+           if err, ok := appErr.(rpcError); ok {
                statusCode = err.code
                statusDesc = err.desc
            } else {
@@ -723,7 +723,7 @@ func (s *Server) processStreamingRPC(t transport.ServerTransport, stream *transp
        appErr = s.opts.streamInt(srv.server, ss, info, sd.Handler)
    }
    if appErr != nil {
-       if err, ok := appErr.(*rpcError); ok {
+       if err, ok := appErr.(rpcError); ok {
            ss.statusCode = err.code
            ss.statusDesc = err.desc
        } else if err, ok := appErr.(transport.StreamError); ok {
diff --git a/cmd/vendor/google.golang.org/grpc/stream.go b/cmd/vendor/google.golang.org/grpc/stream.go
index 4681054..22c7895 100644
--- a/cmd/vendor/google.golang.org/grpc/stream.go
+++ b/cmd/vendor/google.golang.org/grpc/stream.go
@@ -150,7 +150,7 @@ func newClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, meth
        t, put, err = cc.getTransport(ctx, gopts)
        if err != nil {
            // TODO(zhaoq): Probably revisit the error handling.
-           if _, ok := err.(*rpcError); ok {
+           if _, ok := err.(rpcError); ok {
                return nil, err
            }
            if err == errConnClosing || err == errConnUnavailable {

From v3.0.0, v3.0.1, and v3.0.2 to v3.0.3, we get a 2x regression in serializable reads (S-reads).

# compiled with Go 1.6.3
ETCDCTL_API=3 ${GOPATH}/src/github.com/coreos/etcd/bin/etcdctl --endpoints=10.240.0.44:2379,10.240.0.45:2379,10.240.0.46:2379 endpoint health
ETCDCTL_API=3 ${GOPATH}/src/github.com/coreos/etcd/bin/etcdctl --endpoints=10.240.0.44:2379,10.240.0.45:2379,10.240.0.46:2379 put foo bar
benchmark --endpoints=10.240.0.44:2379,10.240.0.45:2379,10.240.0.46:2379 --conns=100 --clients=1000 range foo --consistency=s --total=100000

v3.0.2 with Go 1.6.3

Summary:
  Total:    0.6376 secs.
  Slowest:  0.0701 secs.
  Fastest:  0.0002 secs.
  Average:  0.0051 secs.
  Stddev:   0.0055 secs.
  Requests/sec: 156829.9965

v3.0.3 with Go 1.6.3

Summary:
  Total:    1.7543 secs.
  Slowest:  0.3146 secs.
  Fastest:  0.0002 secs.
  Average:  0.0172 secs.
  Stddev:   0.0222 secs.
  Requests/sec: 57002.8634

Master branch with Go tip

Summary:
  Total:  1.1500 secs.
  Slowest:  0.1776 secs.
  Fastest:  0.0002 secs.
  Average:  0.0111 secs.
  Stddev: 0.0140 secs.
  Requests/sec: 86957.1846

https://github.com/grpc/grpc-go/issues/771 https://github.com/grpc/grpc-go/commit/ffdfb592e8ac60f3f9ca4889661eabfe09c11628 https://github.com/coreos/etcd/commits/master?after=N38ZsAMfnAqv4q7Ci2%2BQGTEfUvkrMTExOQ%3D%3D

To summarize serializable read performance:

Go 1.6.3 + current master branch: BAD (76104.0368 qps)
Go 1.7.3 + current master branch: BAD (60 ~ 70K qps)
Go tip   + current master branch: BAD (86957.1846 qps)

Go 1.6.3 + current master branch + patch:  OK (119141.2140 qps -- still slower than 150K of v3.0.2)
Go 1.7.3 + current master branch + patch: BAD (60 ~ 70K qps)
Go tip   + current master branch + patch: BAD (87482.1327 qps)

So it's Go + gRPC that introduced the regression.

/cc @heyitsanthony @xiang90

gyuho commented 7 years ago

Not much progress so far...

Just to add a note

diff --git a/mvcc/kvstore.go b/mvcc/kvstore.go
index ff038bc..fb80808 100644
--- a/mvcc/kvstore.go
+++ b/mvcc/kvstore.go
@@ -17,6 +17,7 @@ package mvcc
 import (
        "encoding/binary"
        "errors"
+       "fmt"
        "math"
        "math/rand"
        "sync"
@@ -154,7 +155,6 @@ func (s *store) Range(key, end []byte, ro RangeOptions) (r *RangeResult, err err
        id := s.TxnBegin()
        kvs, count, rev, err := s.rangeKeys(key, end, ro.Limit, ro.Rev, ro.Count)
        s.txnEnd(id)
-
        rangeCounter.Inc()

        r = &RangeResult{
@@ -177,10 +177,12 @@ func (s *store) DeleteRange(key, end []byte) (n, rev int64) {
 }

 func (s *store) TxnBegin() int64 {
+       now := time.Now()
        s.mu.Lock()
        s.currentRev.sub = 0
        s.tx = s.b.BatchTx()
        s.tx.Lock()
+       fmt.Println("[DEBUG] BatchTx took", time.Since(now))

The same code on the etcd master branch shows latency spikes around (*store).mu.Lock. Go 1.6.3 takes only 200 ~ 300ns, while Go 1.7.3 takes 50 ~ 150ms. Initial requests in both cases take 200 ~ 300ns, but later requests in Go 1.7.3 slow down. It seems some request congestion ahead of this transaction code path triggers lock contention for later requests. This only happens with more than 10,000 read requests from 1,000 concurrent clients (100 TCP connections).


Update

Lock contention is expected...

go test -v -run TestConcurrentRange

// Note: this test is meant to sit alongside mvcc's existing tests;
// fakeConsistentIndex and cleanup are helpers already defined there.
func TestConcurrentRange(t *testing.T) {
    var i fakeConsistentIndex
    be, tmpPath := backend.NewDefaultTmpBackend()
    s := NewStore(be, &lease.FakeLessor{}, &i)
    defer cleanup(s, be, tmpPath)

    s.Put([]byte("foo"), []byte("bar"), lease.NoLease)

    var wg sync.WaitGroup
    wg.Add(1000)
    for i := 0; i < 1000; i++ {
        go func() {
            defer wg.Done()
            s.Range([]byte("foo"), nil, RangeOptions{Rev: 1})
        }()
    }
    wg.Wait()
}

Go 1.6.3 and 1.7.3 show the same latency spikes in this test.
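
As a cross-check on where that wait time goes, Go's block profile (runtime.SetBlockProfileRate, or the /debug/pprof/block endpoint when pprof is enabled on the server) records how long goroutines block on mutexes and channels. A minimal standalone sketch, not etcd code, that just reproduces the 1,000-goroutine contention pattern from the test above and dumps a block profile:

package main

import (
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
)

func main() {
	// Record every blocking event; rate 1 is expensive but fine for a short run.
	runtime.SetBlockProfileRate(1)

	var mu sync.Mutex
	var wg sync.WaitGroup
	wg.Add(1000)
	for i := 0; i < 1000; i++ {
		go func() {
			defer wg.Done()
			// Stand-in for the (*store).mu critical section entered by Range.
			mu.Lock()
			mu.Unlock()
		}()
	}
	wg.Wait()

	f, err := os.Create("block.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	// Inspect with: go tool pprof <binary> block.prof
	pprof.Lookup("block").WriteTo(f, 0)
}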

xiang90 commented 7 years ago

@gyuho After taking a closer look at the 1.7.3 pprof data, I think we did something wrong.

For example, it shows

github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/etcdserverpb.(*kVClient).Txn
0.03s(0.045%)
of 9.55s(14.32%)

Why do we have Txn requests in the data???

Also it shows

github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/api/v3rpc.FiltersFromRequest
0.04s(0.06%)
of 9.25s(13.87%)

This func can only be called in watch.

So the pprof result for 1.7.3 is clearly problematic.

gyuho commented 7 years ago

@xiang90 All I did was

ETCDCTL_API=3 ${GOPATH}/src/github.com/coreos/etcd/bin/etcdctl --endpoints=10.240.0.44:2379,10.240.0.45:2379,10.240.0.46:2379 put foo bar
ETCDCTL_API=3 ${GOPATH}/src/github.com/coreos/etcd/bin/etcdctl --endpoints=10.240.0.44:2379,10.240.0.45:2379,10.240.0.46:2379 get foo --consistency=s

benchmark --endpoints=10.240.0.44:2379,10.240.0.45:2379,10.240.0.46:2379 --conns=100 --clients=1000 range foo --consistency=s --total=10000

I could have done something wrong. I will rerun these tests and make sure those calls do not show up.

xiang90 commented 7 years ago

@gyuho Well, probably we did something wrong in our client code (FiltersFromRequest is called from watch). If you have time, you could switch to raw gRPC for the benchmark; it might make a difference?
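
For what a raw-gRPC benchmark might look like, here is a rough sketch that bypasses clientv3 and calls the KV Range RPC directly (import paths are the ones vendored at the time; the endpoint is a placeholder, and it drives a single connection sequentially rather than the benchmark tool's 100 conns / 1,000 clients):

package main

import (
	"fmt"
	"time"

	"golang.org/x/net/context"
	"google.golang.org/grpc"

	pb "github.com/coreos/etcd/etcdserver/etcdserverpb"
)

func main() {
	conn, err := grpc.Dial("10.240.0.44:2379", grpc.WithInsecure())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	kv := pb.NewKVClient(conn)
	// Serializable: true corresponds to --consistency=s in the benchmark tool.
	req := &pb.RangeRequest{Key: []byte("foo"), Serializable: true}

	total := 100000
	start := time.Now()
	for i := 0; i < total; i++ {
		if _, err := kv.Range(context.Background(), req); err != nil {
			panic(err)
		}
	}
	elapsed := time.Since(start)
	fmt.Printf("%d serializable ranges in %v (%.1f req/sec)\n",
		total, elapsed, float64(total)/elapsed.Seconds())
}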

xiang90 commented 7 years ago

Or you might have pointed pprof at the wrong binary: 1.7.3 pprof data but against the 1.6.3 binary. The pprof output for 1.7.3 seems really broken.

gyuho commented 7 years ago

Yeah, I should try raw gRPC calls. All clients were built with Go 1.7.3; only the server-side Go version differed. I should also try clients built with different Go versions.

xiang90 commented 7 years ago
v3.0.2 with Go 1.6.3

Summary:
  Total:    0.6376 secs.
  Slowest:  0.0701 secs.
  Fastest:  0.0002 secs.
  Average:  0.0051 secs.
  Stddev:   0.0055 secs.
  Requests/sec: 156829.9965

Can you try to build 3.0.2 with go 1.7 and do a benchmark?

gyuho commented 7 years ago

@xiang90

v3.0.2 with Go 1.7.3 (client is also Go 1.7.3 with the same git sha)

Summary:
  Total:    0.1427 secs.
  Slowest:  0.1071 secs.
  Fastest:  0.0002 secs.
  Average:  0.0120 secs.
  Stddev:   0.0133 secs.
  Requests/sec: 70080.2853
xiang90 commented 7 years ago

Probably a good start is to focus only on 3.0.2 and see what the difference is between Go 1.6 and Go 1.7, since that is the only change.

xiang90 commented 7 years ago

@gyuho can you get pprof data for 3.0.2 with Go 1.6 and Go 1.7? (try --total=500000 for a longer period of data)

gyuho commented 7 years ago

etcdv3.0.2-go1.6.2-5M-serializable-reads.pdf pprof.10.240.0.44:2379.samples.cpu.001.pb.gz

# etcd v3.0.2 with Go 1.6.3
# benchmark compiled with Go 1.7.3 in master branch

benchmark --endpoints=10.240.0.44:2379 --conns=100 --clients=1000 range foo --consistency=s --total=5000000

bench with serializable range
 5000000 / 5000000 Booooooooooooooooooooooooooooooo! 100.00%1m57s

Summary:
  Total:  117.6307 secs.
  Slowest:  0.5534 secs.
  Fastest:  0.0002 secs.
  Average:  0.0235 secs.
  Stddev: 0.0336 secs.
  Requests/sec: 42505.9236

Response time histogram:
  0.0002 [1]  |
  0.0555 [4153659]  |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.1108 [713134] |∎∎∎∎∎∎
  0.1661 [110394] |∎
  0.2215 [17593]  |
  0.2768 [4068] |
  0.3321 [975]  |
  0.3874 [140]  |
  0.4427 [27] |
  0.4981 [6]  |
  0.5534 [3]  |

Latency distribution:
  10% in 0.0004 secs.
  25% in 0.0005 secs.
  50% in 0.0010 secs.
  75% in 0.0355 secs.
  90% in 0.0652 secs.
  95% in 0.0936 secs.
  99% in 0.1504 secs.

etcdv3.0.2-go1.7.3-2M-serializable-reads.pdf pprof.etcd.10.240.0.44:2379.samples.cpu.001.pb.gz

# etcd v3.0.2 with Go 1.7.3
# benchmark compiled with Go 1.7.3 in master branch

benchmark --endpoints=10.240.0.44:2379 --conns=100 --clients=1000 range foo --consistency=s --total=2000000
bench with serializable range
 2000000 / 2000000 Booooooooooooooooooooooooooooooo! 100.00%1m40s

Summary:
  Total:  100.3941 secs.
  Slowest:  1.0669 secs.
  Fastest:  0.0002 secs.
  Average:  0.0502 secs.
  Stddev: 0.0668 secs.
  Requests/sec: 19921.4873

Response time histogram:
  0.0002 [1]  |
  0.1069 [1604052]  |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.2135 [332074] |∎∎∎∎∎∎∎∎
  0.3202 [53733]  |∎
  0.4269 [8303] |
  0.5335 [1254] |
  0.6402 [480]  |
  0.7469 [90] |
  0.8535 [11] |
  0.9602 [1]  |
  1.0669 [1]  |

Latency distribution:
  10% in 0.0003 secs.
  25% in 0.0003 secs.
  50% in 0.0432 secs.
  75% in 0.0635 secs.
  90% in 0.1227 secs.
  95% in 0.1805 secs.
  99% in 0.2977 secs.
xiang90 commented 7 years ago

@gyuho

Hmm... Is there a reason we do 5M reads for Go 1.6 but 2M for Go 1.7? We really should make sure all the variables are the same except the Go version.

If you take a look at the Go 1.7 pprof data, there are no range-related funcs there. The most significant one is a lease operation. This is strange.

xiang90 commented 7 years ago

@gyuho I did a pprof run on Go 1.7.2 + 3.0.2 myself. The benchmark result is similar, but the pprof result is clean and different from yours. I think you did something wrong with pprof.

Here is the correct way to do it:

  1. start etcd with pprof enabled
  2. benchmark range for at least 30s (pprof needs 30s to collect data)
  3. go tool pprof $ENDPOINT
  4. go tool pprof -svg binary pprof.result > cpu.svg

You need to make sure you provide the correct binary. If you provide the wrong binary, the result is meaningless.
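
For reference, the /debug/pprof endpoints used in steps 1-3 are the standard net/http/pprof handlers; etcd serves them on the client listener when pprof is enabled (its --enable-pprof flag). A minimal sketch of the same mechanism in any Go server (the address is a placeholder, not how etcd wires it internally):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// DefaultServeMux now serves the profiling endpoints, e.g.
	// /debug/pprof/profile (30s CPU profile) and /debug/pprof/heap.
	log.Fatal(http.ListenAndServe("localhost:2379", nil))
}

With that in place, step 3 becomes, for example: go tool pprof http://localhost:2379/debug/pprof/profile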

xiang90 commented 7 years ago

@gyuho Can you create a new GitHub issue for this problem? This issue is too messy and contains wrong pprof information. Let's start fresh?