Here's some progress in the investigation. It's very easy to reproduce.
$ benchmark --endpoints=${IP_1},${IP_2},${IP_3} --conns=100 --clients=1000 range aaaaaaaa --consistency=s --total=100000
bench with serializable range
Summary:
Total: 2.0949 secs.
Slowest: 0.2870 secs.
Fastest: 0.0003 secs.
Average: 0.0205 secs.
Stddev: 0.0264 secs.
Requests/sec: 47734.2400
Response time histogram:
0.000 [1] |
0.029 [76792] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.058 [14889] |∎∎∎∎∎∎∎
0.086 [5286] |∎∎
0.115 [1970] |∎
0.144 [736] |
0.172 [227] |
0.201 [57] |
0.230 [25] |
0.258 [12] |
0.287 [5] |
Latency distribution:
10% in 0.0004 secs.
25% in 0.0005 secs.
50% in 0.0146 secs.
75% in 0.0279 secs.
90% in 0.0530 secs.
95% in 0.0724 secs.
99% in 0.1167 secs.
Heap profiling data after this run:
$ go tool pprof http://10.240.0.22:2379/debug/pprof/heap
(pprof) top
21237.39kB of 22261.46kB total (95.40%)
Dropped 77 nodes (cum <= 111.31kB)
Showing top 10 nodes out of 46 (cum >= 512.17kB)
flat flat% sum% cum cum%
5046.16kB 22.67% 22.67% 10092.32kB 45.34% cmd/vendor/github.com/coreos/etcd/rafthttp.startPeer
5046.16kB 22.67% 45.34% 5046.16kB 22.67% cmd/vendor/github.com/coreos/etcd/rafthttp.startStreamWriter
3169kB 14.24% 59.57% 3169kB 14.24% cmd/vendor/google.golang.org/grpc/transport.newFramer
2368.55kB 10.64% 70.21% 2368.55kB 10.64% cmd/vendor/github.com/coreos/etcd/rafthttp.(*streamReader).decodeLoop
2368.55kB 10.64% 80.85% 2368.55kB 10.64% cmd/vendor/github.com/coreos/etcd/rafthttp.(*streamWriter).run
1184.27kB 5.32% 86.17% 1184.27kB 5.32% cmd/vendor/github.com/coreos/etcd/wal.(*WAL).ReadAll
518.02kB 2.33% 88.50% 518.02kB 2.33% cmd/vendor/github.com/beorn7/perks/quantile.NewTargeted
512.31kB 2.30% 90.80% 512.31kB 2.30% cmd/vendor/golang.org/x/net/trace.New
512.19kB 2.30% 93.10% 512.19kB 2.30% runtime.malg
512.17kB 2.30% 95.40% 512.17kB 2.30% cmd/vendor/google.golang.org/grpc/transport.(*decodeState).processHeaderField
(pprof) top -cum
4.93MB of 21.74MB total (22.67%)
Dropped 77 nodes (cum <= 0.11MB)
Showing top 10 nodes out of 46 (cum >= 9.86MB)
flat flat% sum% cum cum%
0 0% 0% 21.24MB 97.70% runtime.goexit
0 0% 0% 12.02MB 55.28% cmd/vendor/github.com/coreos/etcd/embed.StartEtcd
0 0% 0% 12.02MB 55.28% cmd/vendor/github.com/coreos/etcd/etcdmain.Main
0 0% 0% 12.02MB 55.28% cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcd
0 0% 0% 12.02MB 55.28% cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcdOrProxyV2
0 0% 0% 12.02MB 55.28% main.main
0 0% 0% 12.02MB 55.28% runtime.main
0 0% 0% 11.51MB 52.96% cmd/vendor/github.com/coreos/etcd/etcdserver.NewServer
0 0% 0% 9.86MB 45.34% cmd/vendor/github.com/coreos/etcd/rafthttp.(*Transport).AddPeer
4.93MB 22.67% 22.67% 9.86MB 45.34% cmd/vendor/github.com/coreos/etcd/rafthttp.startPeer
CPU profiling data for 30 seconds:
$ go tool pprof http://10.240.0.22:2379/debug/pprof/profile
(pprof) top
15410ms of 39330ms total (39.18%)
Dropped 382 nodes (cum <= 196.65ms)
Showing top 10 nodes out of 193 (cum >= 690ms)
flat flat% sum% cum cum%
6620ms 16.83% 16.83% 6970ms 17.72% syscall.Syscall
1400ms 3.56% 20.39% 3080ms 7.83% runtime.mallocgc
1230ms 3.13% 23.52% 1230ms 3.13% runtime.usleep
1140ms 2.90% 26.42% 3040ms 7.73% runtime.pcvalue
920ms 2.34% 28.76% 920ms 2.34% runtime.heapBitsSetType
920ms 2.34% 31.10% 920ms 2.34% runtime.readvarint
860ms 2.19% 33.28% 860ms 2.19% runtime.adjustpointers
860ms 2.19% 35.47% 860ms 2.19% runtime.epollwait
770ms 1.96% 37.43% 1690ms 4.30% runtime.step
690ms 1.75% 39.18% 690ms 1.75% runtime.futex
(pprof) top -cum
6.95s of 39.33s total (17.67%)
Dropped 382 nodes (cum <= 0.20s)
Showing top 10 nodes out of 193 (cum >= 6.54s)
flat flat% sum% cum cum%
0 0% 0% 27.34s 69.51% runtime.goexit
0.03s 0.076% 0.076% 18.70s 47.55% cmd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1
0.07s 0.18% 0.25% 17.50s 44.50% cmd/vendor/google.golang.org/grpc.(*Server).handleStream
0.14s 0.36% 0.61% 17.37s 44.16% cmd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC
0 0% 0.61% 7.35s 18.69% cmd/vendor/google.golang.org/grpc.(*Server).handleRawConn
0 0% 0.61% 7.35s 18.69% cmd/vendor/google.golang.org/grpc.(*Server).serveNewHTTP2Transport
0 0% 0.61% 7.31s 18.59% cmd/vendor/google.golang.org/grpc.(*Server).serveStreams
0.08s 0.2% 0.81% 7.31s 18.59% cmd/vendor/google.golang.org/grpc/transport.(*http2Server).HandleStreams
6.62s 16.83% 17.65% 6.97s 17.72% syscall.Syscall
0.01s 0.025% 17.67% 6.54s 16.63% cmd/vendor/google.golang.org/grpc.(*Server).sendResponse
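The heap and CPU profiles above come from etcd's /debug/pprof endpoints on the client port. For reference only, this is roughly how any Go server exposes those handlers; a minimal sketch using the standard net/http/pprof package, not etcd's actual embed wiring:
package main

import (
	"log"
	"net/http"
	// Blank import registers the /debug/pprof/* handlers on http.DefaultServeMux.
	_ "net/http/pprof"
)

func main() {
	// With the handlers registered, profiles can be fetched with:
	//   go tool pprof http://localhost:2379/debug/pprof/heap
	//   go tool pprof http://localhost:2379/debug/pprof/profile   (30-second CPU profile by default)
	// Passing the binary explicitly (go tool pprof ./bin/etcd <url>) keeps symbol
	// resolution tied to the binary that is actually serving the profile.
	log.Fatal(http.ListenAndServe("localhost:2379", nil))
}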
$ benchmark --endpoints=${IP_1},${IP_2},${IP_3} --conns=100 --clients=1000 range aaaaaaaa --consistency=s --total=100000
bench with serializable range
Summary:
Total: 0.9810 secs.
Slowest: 0.0434 secs.
Fastest: 0.0003 secs.
Average: 0.0059 secs.
Stddev: 0.0042 secs.
Requests/sec: 101937.7307
Response time histogram:
0.000 [1] |
0.005 [49107] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.009 [29103] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.013 [16127] |∎∎∎∎∎∎∎∎∎∎∎∎∎
0.018 [4085] |∎∎∎
0.022 [1105] |
0.026 [329] |
0.030 [83] |
0.035 [43] |
0.039 [11] |
0.043 [6] |
Heap profiling data after this run:
$ go tool pprof http://10.240.0.22:2379/debug/pprof/heap
(pprof) top
23904.01kB of 23904.01kB total ( 100%)
Dropped 71 nodes (cum <= 119.52kB)
Showing top 10 nodes out of 40 (cum >= 11276.60kB)
flat flat% sum% cum cum%
6866.17kB 28.72% 28.72% 6866.17kB 28.72% cmd/vendor/google.golang.org/grpc/transport.newFramer
5046.16kB 21.11% 49.83% 10092.32kB 42.22% cmd/vendor/github.com/coreos/etcd/rafthttp.startPeer
5046.16kB 21.11% 70.94% 5046.16kB 21.11% cmd/vendor/github.com/coreos/etcd/rafthttp.startStreamWriter
2368.55kB 9.91% 80.85% 2368.55kB 9.91% cmd/vendor/github.com/coreos/etcd/rafthttp.(*streamReader).decodeLoop
2368.55kB 9.91% 90.76% 2368.55kB 9.91% cmd/vendor/github.com/coreos/etcd/rafthttp.(*streamWriter).run
1184.27kB 4.95% 95.72% 1184.27kB 4.95% cmd/vendor/github.com/coreos/etcd/wal.(*WAL).ReadAll
512.11kB 2.14% 97.86% 512.11kB 2.14% cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).Range
512.05kB 2.14% 100% 512.05kB 2.14% cmd/vendor/golang.org/x/net/context.WithCancel
0 0% 100% 11276.60kB 47.17% cmd/vendor/github.com/coreos/etcd/embed.StartEtcd
0 0% 100% 11276.60kB 47.17% cmd/vendor/github.com/coreos/etcd/etcdmain.Main
(pprof) top -cum
4.93MB of 23.34MB total (21.11%)
Dropped 71 nodes (cum <= 0.12MB)
Showing top 10 nodes out of 40 (cum >= 9.86MB)
flat flat% sum% cum cum%
0 0% 0% 23.34MB 100% runtime.goexit
0 0% 0% 11.01MB 47.17% cmd/vendor/github.com/coreos/etcd/embed.StartEtcd
0 0% 0% 11.01MB 47.17% cmd/vendor/github.com/coreos/etcd/etcdmain.Main
0 0% 0% 11.01MB 47.17% cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcd
0 0% 0% 11.01MB 47.17% cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcdOrProxyV2
0 0% 0% 11.01MB 47.17% cmd/vendor/github.com/coreos/etcd/etcdserver.NewServer
0 0% 0% 11.01MB 47.17% main.main
0 0% 0% 11.01MB 47.17% runtime.main
0 0% 0% 9.86MB 42.22% cmd/vendor/github.com/coreos/etcd/rafthttp.(*Transport).AddPeer
4.93MB 21.11% 21.11% 9.86MB 42.22% cmd/vendor/github.com/coreos/etcd/rafthttp.startPeer
CPU profiling data for 30 seconds:
$ go tool pprof http://10.240.0.22:2379/debug/pprof/profile
(pprof) top
12000ms of 31200ms total (38.46%)
Dropped 412 nodes (cum <= 156ms)
Showing top 10 nodes out of 179 (cum >= 830ms)
flat flat% sum% cum cum%
5230ms 16.76% 16.76% 5490ms 17.60% syscall.Syscall
1180ms 3.78% 20.54% 2380ms 7.63% runtime.mallocgc
1090ms 3.49% 24.04% 1090ms 3.49% runtime.readvarint
840ms 2.69% 26.73% 2730ms 8.75% runtime.pcvalue
680ms 2.18% 28.91% 680ms 2.18% runtime.heapBitsSetType
670ms 2.15% 31.06% 670ms 2.15% runtime.adjustpointers
610ms 1.96% 33.01% 1700ms 5.45% runtime.step
590ms 1.89% 34.90% 590ms 1.89% runtime._ExternalCode
580ms 1.86% 36.76% 580ms 1.86% runtime.(*mspan).sweep.func1
530ms 1.70% 38.46% 830ms 2.66% cmd/vendor/golang.org/x/net/http2/hpack.(*Encoder).searchTable
(pprof) top -cum
5.57s of 31.20s total (17.85%)
Dropped 412 nodes (cum <= 0.16s)
Showing top 10 nodes out of 179 (cum >= 5.07s)
flat flat% sum% cum cum%
0 0% 0% 23.58s 75.58% runtime.goexit
0.03s 0.096% 0.096% 17.06s 54.68% cmd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1
0.01s 0.032% 0.13% 15.99s 51.25% cmd/vendor/google.golang.org/grpc.(*Server).handleStream
0.16s 0.51% 0.64% 15.98s 51.22% cmd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC
5.23s 16.76% 17.40% 5.49s 17.60% syscall.Syscall
0.04s 0.13% 17.53% 5.23s 16.76% cmd/vendor/google.golang.org/grpc.(*Server).sendResponse
0.02s 0.064% 17.60% 5.15s 16.51% bufio.(*Writer).Flush
0.02s 0.064% 17.66% 5.13s 16.44% bufio.(*Writer).flush
0.04s 0.13% 17.79% 5.11s 16.38% cmd/vendor/github.com/cockroachdb/cmux.(*MuxConn).Write
0.02s 0.064% 17.85% 5.07s 16.25% cmd/vendor/github.com/coreos/etcd/pkg/transport.(*limitListenerConn).Write
Applied against the master branch, the patch below gives the ~100K range performance. It just rolls back https://github.com/grpc/grpc-go/commit/ffdfb592e8ac60f3f9ca4889661eabfe09c11628; nothing else is changed.
commit 2cd7e439d2c66bfcdbcf376163f603a2bd23589e
Author: Gyu-Ho Lee <gyuhox@gmail.com>
Date: Fri Jul 22 09:20:40 2016 -0700
grpc: manual type-change '*rpcError' to 'rpcError'
diff --git a/cmd/vendor/google.golang.org/grpc/call.go b/cmd/vendor/google.golang.org/grpc/call.go
index 84ac178..9326236 100644
--- a/cmd/vendor/google.golang.org/grpc/call.go
+++ b/cmd/vendor/google.golang.org/grpc/call.go
@@ -155,7 +155,7 @@ func Invoke(ctx context.Context, method string, args, reply interface{}, cc *Cli
t, put, err = cc.getTransport(ctx, gopts)
if err != nil {
// TODO(zhaoq): Probably revisit the error handling.
- if _, ok := err.(*rpcError); ok {
+ if _, ok := err.(rpcError); ok {
return err
}
if err == errConnClosing {
diff --git a/cmd/vendor/google.golang.org/grpc/rpc_util.go b/cmd/vendor/google.golang.org/grpc/rpc_util.go
index d628717..91342bd 100644
--- a/cmd/vendor/google.golang.org/grpc/rpc_util.go
+++ b/cmd/vendor/google.golang.org/grpc/rpc_util.go
@@ -334,7 +334,7 @@ type rpcError struct {
desc string
}
-func (e *rpcError) Error() string {
+func (e rpcError) Error() string {
return fmt.Sprintf("rpc error: code = %d desc = %s", e.code, e.desc)
}
@@ -344,7 +344,7 @@ func Code(err error) codes.Code {
if err == nil {
return codes.OK
}
- if e, ok := err.(*rpcError); ok {
+ if e, ok := err.(rpcError); ok {
return e.code
}
return codes.Unknown
@@ -356,7 +356,7 @@ func ErrorDesc(err error) string {
if err == nil {
return ""
}
- if e, ok := err.(*rpcError); ok {
+ if e, ok := err.(rpcError); ok {
return e.desc
}
return err.Error()
@@ -368,7 +368,7 @@ func Errorf(c codes.Code, format string, a ...interface{}) error {
if c == codes.OK {
return nil
}
- return &rpcError{
+ return rpcError{
code: c,
desc: fmt.Sprintf(format, a...),
}
@@ -377,32 +377,32 @@ func Errorf(c codes.Code, format string, a ...interface{}) error {
// toRPCErr converts an error into a rpcError.
func toRPCErr(err error) error {
switch e := err.(type) {
- case *rpcError:
+ case rpcError:
return err
case transport.StreamError:
- return &rpcError{
+ return rpcError{
code: e.Code,
desc: e.Desc,
}
case transport.ConnectionError:
- return &rpcError{
+ return rpcError{
code: codes.Internal,
desc: e.Desc,
}
default:
switch err {
case context.DeadlineExceeded:
- return &rpcError{
+ return rpcError{
code: codes.DeadlineExceeded,
desc: err.Error(),
}
case context.Canceled:
- return &rpcError{
+ return rpcError{
code: codes.Canceled,
desc: err.Error(),
}
case ErrClientConnClosing:
- return &rpcError{
+ return rpcError{
code: codes.FailedPrecondition,
desc: err.Error(),
}
diff --git a/cmd/vendor/google.golang.org/grpc/server.go b/cmd/vendor/google.golang.org/grpc/server.go
index a2b2b94..a0300fb 100644
--- a/cmd/vendor/google.golang.org/grpc/server.go
+++ b/cmd/vendor/google.golang.org/grpc/server.go
@@ -579,7 +579,7 @@ func (s *Server) processUnaryRPC(t transport.ServerTransport, stream *transport.
}
reply, appErr := md.Handler(srv.server, stream.Context(), df, s.opts.unaryInt)
if appErr != nil {
- if err, ok := appErr.(*rpcError); ok {
+ if err, ok := appErr.(rpcError); ok {
statusCode = err.code
statusDesc = err.desc
} else {
@@ -664,7 +664,7 @@ func (s *Server) processStreamingRPC(t transport.ServerTransport, stream *transp
appErr = s.opts.streamInt(srv.server, ss, info, sd.Handler)
}
if appErr != nil {
- if err, ok := appErr.(*rpcError); ok {
+ if err, ok := appErr.(rpcError); ok {
ss.statusCode = err.code
ss.statusDesc = err.desc
} else if err, ok := appErr.(transport.StreamError); ok {
diff --git a/cmd/vendor/google.golang.org/grpc/stream.go b/cmd/vendor/google.golang.org/grpc/stream.go
index 7a3bef5..73d1da2 100644
--- a/cmd/vendor/google.golang.org/grpc/stream.go
+++ b/cmd/vendor/google.golang.org/grpc/stream.go
@@ -149,7 +149,7 @@ func NewClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, meth
t, put, err = cc.getTransport(ctx, gopts)
if err != nil {
// TODO(zhaoq): Probably revisit the error handling.
- if _, ok := err.(*rpcError); ok {
+ if _, ok := err.(rpcError); ok {
return nil, err
}
if err == errConnClosing {
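To make the shape of this change concrete, here is a minimal, self-contained sketch of the value-style error the patch restores next to the pointer style that grpc-go ffdfb592 introduced. This is my own illustration, not the vendored grpc code; the point is that every type assertion in the package has to match whichever constructor is used, which is why the whole vendored package is patched at once.
package rpcerr

import (
	"fmt"

	"google.golang.org/grpc/codes"
)

// rpcError mirrors the unexported error struct in the vendored grpc package.
type rpcError struct {
	code codes.Code
	desc string
}

// Value receiver, so both rpcError and *rpcError satisfy the error interface.
func (e rpcError) Error() string {
	return fmt.Sprintf("rpc error: code = %d desc = %s", e.code, e.desc)
}

// errorfValue is the flavor the patch above restores: the error interface
// holds a plain rpcError value.
func errorfValue(c codes.Code, format string, a ...interface{}) error {
	return rpcError{code: c, desc: fmt.Sprintf(format, a...)}
}

// errorfPointer is the flavor grpc-go ffdfb592 switched to: the error
// interface holds a heap-allocated *rpcError.
func errorfPointer(c codes.Code, format string, a ...interface{}) error {
	return &rpcError{code: c, desc: fmt.Sprintf(format, a...)}
}

// Code shows why constructors and assertions must change together:
// err.(rpcError) never matches an error built as &rpcError{...}, and vice versa.
func Code(err error) codes.Code {
	if err == nil {
		return codes.OK
	}
	if e, ok := err.(rpcError); ok { // would be err.(*rpcError) in the pointer flavor
		return e.code
	}
	return codes.Unknown
}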
@gyuho We probably need to figure out the caller of the newFramer function.
Actually there's not much difference in handleStreams calls when the benchmark runs with --total=1000000. Both do about the same number of calls to handleStreams.
I did run grpc-benchmarks, but both have about the same performance...
More test data...
master-cpu.pb.gz master-cpu.pdf
master-heap.pb.gz master-heap.pdf
Server-side profiling data doesn't explain much...
Reproducible with a single-node cluster (requests/sec):
master branch: 14247.8641
test branch: 28999.8329
Even a single node doesn't seem to show much difference in profiling...
master-cpu.pb.gz master-cpu.pdf
master-heap.pb.gz master-heap.pdf
It's likely a regression from Go, plus the gRPC change.
master branch, 1 node cluster
go 1.6 : 15339.9802
go 1.6.3 : 14402.0303
go 1.7 : 15605.9624
go master : 16836.9583
test branch, 1 node cluster
go 1.6 : 31372.9589
go 1.6.3 : 27012.3428
go 1.7 : 17090.2322
go master : 16823.8769
master branch, 3 node cluster
go 1.6 : 48845.3885
go 1.6.3 : 49220.5884
go 1.7 : 53688.9684
go master : 53887.7715
test branch, 3 node cluster
go 1.6 : 92374.1203
go 1.6.3 : 86653.9443
go 1.7 : 47207.7163
go master : 52174.3944
With Anthony's patch, this seems resolved, but we still see a slight regression with *rpcError:
go1.6.3 anthony's patch
92371.0151
go1.6.3 anthony's patch + grpc error patch to rpcError (non pointer)
97333.2936
go1.7rc3 anthony's patch
92872.4881
go1.7rc3 anthony's patch + grpc error patch to rpcError (non pointer)
96982.5091
Patch
commit bb9ab45b754d491d4ccbbe282e29d5755a69502e
Author: Gyu-Ho Lee <gyuhox@gmail.com>
Date: Thu Nov 17 00:52:27 2016 -0800
vendor: revert error type change in gRPC
diff --git a/cmd/vendor/google.golang.org/grpc/call.go b/cmd/vendor/google.golang.org/grpc/call.go
index 772c817..789aafc 100644
--- a/cmd/vendor/google.golang.org/grpc/call.go
+++ b/cmd/vendor/google.golang.org/grpc/call.go
@@ -173,7 +173,7 @@ func invoke(ctx context.Context, method string, args, reply interface{}, cc *Cli
t, put, err = cc.getTransport(ctx, gopts)
if err != nil {
// TODO(zhaoq): Probably revisit the error handling.
- if _, ok := err.(*rpcError); ok {
+ if _, ok := err.(rpcError); ok {
return err
}
if err == errConnClosing || err == errConnUnavailable {
diff --git a/cmd/vendor/google.golang.org/grpc/rpc_util.go b/cmd/vendor/google.golang.org/grpc/rpc_util.go
index a25eaa8..a35e546 100644
--- a/cmd/vendor/google.golang.org/grpc/rpc_util.go
+++ b/cmd/vendor/google.golang.org/grpc/rpc_util.go
@@ -342,7 +342,7 @@ type rpcError struct {
desc string
}
-func (e *rpcError) Error() string {
+func (e rpcError) Error() string {
return fmt.Sprintf("rpc error: code = %d desc = %s", e.code, e.desc)
}
@@ -352,7 +352,7 @@ func Code(err error) codes.Code {
if err == nil {
return codes.OK
}
- if e, ok := err.(*rpcError); ok {
+ if e, ok := err.(rpcError); ok {
return e.code
}
return codes.Unknown
@@ -364,7 +364,7 @@ func ErrorDesc(err error) string {
if err == nil {
return ""
}
- if e, ok := err.(*rpcError); ok {
+ if e, ok := err.(rpcError); ok {
return e.desc
}
return err.Error()
@@ -376,7 +376,7 @@ func Errorf(c codes.Code, format string, a ...interface{}) error {
if c == codes.OK {
return nil
}
- return &rpcError{
+ return rpcError{
code: c,
desc: fmt.Sprintf(format, a...),
}
@@ -385,32 +385,32 @@ func Errorf(c codes.Code, format string, a ...interface{}) error {
// toRPCErr converts an error into a rpcError.
func toRPCErr(err error) error {
switch e := err.(type) {
- case *rpcError:
+ case rpcError:
return err
case transport.StreamError:
- return &rpcError{
+ return rpcError{
code: e.Code,
desc: e.Desc,
}
case transport.ConnectionError:
- return &rpcError{
+ return rpcError{
code: codes.Internal,
desc: e.Desc,
}
default:
switch err {
case context.DeadlineExceeded:
- return &rpcError{
+ return rpcError{
code: codes.DeadlineExceeded,
desc: err.Error(),
}
case context.Canceled:
- return &rpcError{
+ return rpcError{
code: codes.Canceled,
desc: err.Error(),
}
case ErrClientConnClosing:
- return &rpcError{
+ return rpcError{
code: codes.FailedPrecondition,
desc: err.Error(),
}
diff --git a/cmd/vendor/google.golang.org/grpc/server.go b/cmd/vendor/google.golang.org/grpc/server.go
index e0bb187..9eabe90 100644
--- a/cmd/vendor/google.golang.org/grpc/server.go
+++ b/cmd/vendor/google.golang.org/grpc/server.go
@@ -578,7 +578,7 @@ func (s *Server) processUnaryRPC(t transport.ServerTransport, stream *transport.
}
if err != nil {
switch err := err.(type) {
- case *rpcError:
+ case rpcError:
if err := t.WriteStatus(stream, err.code, err.desc); err != nil {
grpclog.Printf("grpc: Server.processUnaryRPC failed to write status %v", err)
}
@@ -596,7 +596,7 @@ func (s *Server) processUnaryRPC(t transport.ServerTransport, stream *transport.
if err := checkRecvPayload(pf, stream.RecvCompress(), s.opts.dc); err != nil {
switch err := err.(type) {
- case *rpcError:
+ case rpcError:
if err := t.WriteStatus(stream, err.code, err.desc); err != nil {
grpclog.Printf("grpc: Server.processUnaryRPC failed to write status %v", err)
}
@@ -637,7 +637,7 @@ func (s *Server) processUnaryRPC(t transport.ServerTransport, stream *transport.
}
reply, appErr := md.Handler(srv.server, stream.Context(), df, s.opts.unaryInt)
if appErr != nil {
- if err, ok := appErr.(*rpcError); ok {
+ if err, ok := appErr.(rpcError); ok {
statusCode = err.code
statusDesc = err.desc
} else {
@@ -723,7 +723,7 @@ func (s *Server) processStreamingRPC(t transport.ServerTransport, stream *transp
appErr = s.opts.streamInt(srv.server, ss, info, sd.Handler)
}
if appErr != nil {
- if err, ok := appErr.(*rpcError); ok {
+ if err, ok := appErr.(rpcError); ok {
ss.statusCode = err.code
ss.statusDesc = err.desc
} else if err, ok := appErr.(transport.StreamError); ok {
diff --git a/cmd/vendor/google.golang.org/grpc/stream.go b/cmd/vendor/google.golang.org/grpc/stream.go
index 4681054..22c7895 100644
--- a/cmd/vendor/google.golang.org/grpc/stream.go
+++ b/cmd/vendor/google.golang.org/grpc/stream.go
@@ -150,7 +150,7 @@ func newClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, meth
t, put, err = cc.getTransport(ctx, gopts)
if err != nil {
// TODO(zhaoq): Probably revisit the error handling.
- if _, ok := err.(*rpcError); ok {
+ if _, ok := err.(rpcError); ok {
return nil, err
}
if err == errConnClosing || err == errConnUnavailable {
From v3.0.0, v3.0.1, v3.0.2 to v3.0.3, we get a 2x regression in serializable reads.
# compiled with Go 1.6.3
ETCDCTL_API=3 ${GOPATH}/src/github.com/coreos/etcd/bin/etcdctl --endpoints=10.240.0.44:2379,10.240.0.45:2379,10.240.0.46:2379 endpoint health
ETCDCTL_API=3 ${GOPATH}/src/github.com/coreos/etcd/bin/etcdctl --endpoints=10.240.0.44:2379,10.240.0.45:2379,10.240.0.46:2379 put foo bar
benchmark --endpoints=10.240.0.44:2379,10.240.0.45:2379,10.240.0.46:2379 --conns=100 --clients=1000 range foo --consistency=s --total=100000
v3.0.2 with Go 1.6.3
Summary:
Total: 0.6376 secs.
Slowest: 0.0701 secs.
Fastest: 0.0002 secs.
Average: 0.0051 secs.
Stddev: 0.0055 secs.
Requests/sec: 156829.9965
v3.0.3 with Go 1.6.3
Summary:
Total: 1.7543 secs.
Slowest: 0.3146 secs.
Fastest: 0.0002 secs.
Average: 0.0172 secs.
Stddev: 0.0222 secs.
Requests/sec: 57002.8634
Master branch with Go tip
Summary:
Total: 1.1500 secs.
Slowest: 0.1776 secs.
Fastest: 0.0002 secs.
Average: 0.0111 secs.
Stddev: 0.0140 secs.
Requests/sec: 86957.1846
https://github.com/grpc/grpc-go/issues/771
https://github.com/grpc/grpc-go/commit/ffdfb592e8ac60f3f9ca4889661eabfe09c11628
https://github.com/coreos/etcd/commits/master?after=N38ZsAMfnAqv4q7Ci2%2BQGTEfUvkrMTExOQ%3D%3D
To summarize serializable read performance:
Go 1.6.3 + current master branch: BAD (76104.0368 qps)
Go 1.7.3 + current master branch: BAD (60 ~ 70K qps)
Go tip + current master branch: BAD (86957.1846 qps)
Go 1.6.3 + current master branch + patch: OK (119141.2140 qps -- still slower than 150K of v3.0.2)
Go 1.7.3 + current master branch + patch: BAD (60 ~ 70K qps)
Go tip + current master branch + patch: BAD (87482.1327 qps)
So it's Go + gRPC that introduced the regression.
/cc @heyitsanthony @xiang90
Not much progress so far...
Just to add a note
diff --git a/mvcc/kvstore.go b/mvcc/kvstore.go
index ff038bc..fb80808 100644
--- a/mvcc/kvstore.go
+++ b/mvcc/kvstore.go
@@ -17,6 +17,7 @@ package mvcc
import (
"encoding/binary"
"errors"
+ "fmt"
"math"
"math/rand"
"sync"
@@ -154,7 +155,6 @@ func (s *store) Range(key, end []byte, ro RangeOptions) (r *RangeResult, err err
id := s.TxnBegin()
kvs, count, rev, err := s.rangeKeys(key, end, ro.Limit, ro.Rev, ro.Count)
s.txnEnd(id)
-
rangeCounter.Inc()
r = &RangeResult{
@@ -177,10 +177,12 @@ func (s *store) DeleteRange(key, end []byte) (n, rev int64) {
}
func (s *store) TxnBegin() int64 {
+ now := time.Now()
s.mu.Lock()
s.currentRev.sub = 0
s.tx = s.b.BatchTx()
s.tx.Lock()
+ fmt.Println("[DEBUG] BatchTx took", time.Since(now))
Same code with the etcd master branch shows latency spikes around *store.Mutex.Lock.
Go 1.6.3 takes only 200 ~ 300ns, while Go 1.7.3 takes 50 ~ 150ms. Initial requests in both cases take 200 ~ 300ns, but later requests in Go 1.7.3 slow down. It seems like some request congestion ahead of this transaction code path triggers lock contention in later requests. This only happens with more than 10,000 read requests from 1,000 concurrent clients (100 TCP connections).
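As an aside, a less ad-hoc way to confirm this kind of contention is Go's mutex profiler (runtime.SetMutexProfileFraction plus the /debug/pprof/mutex endpoint). That profiler only landed in Go 1.8, so it is not usable on the 1.6.3/1.7.3 builds above; the sketch below is only to show the idea on a toy contended lock, not etcd code.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // serves /debug/pprof/mutex once contention sampling is on
	"runtime"
	"sync"
	"time"
)

func main() {
	// Record every mutex contention event (0 disables sampling, 1 samples everything).
	runtime.SetMutexProfileFraction(1)
	go func() { log.Fatal(http.ListenAndServe("localhost:6060", nil)) }()

	// Toy contended lock standing in for store.mu in mvcc/kvstore.go.
	var mu sync.Mutex
	for i := 0; i < 64; i++ {
		go func() {
			for {
				mu.Lock()
				time.Sleep(time.Millisecond) // hold the lock to force contention
				mu.Unlock()
			}
		}()
	}

	// Inspect with: go tool pprof http://localhost:6060/debug/pprof/mutex
	select {}
}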
Lock contention is expected...
go test -v -run TestConcurrentRange
// Placed in the mvcc package so it can use its test helpers (fakeConsistentIndex, cleanup).
package mvcc

import (
	"sync"
	"testing"

	"github.com/coreos/etcd/lease"
	"github.com/coreos/etcd/mvcc/backend"
)

// TestConcurrentRange fires 1,000 concurrent Range calls at a single key.
func TestConcurrentRange(t *testing.T) {
	var i fakeConsistentIndex
	be, tmpPath := backend.NewDefaultTmpBackend()
	s := NewStore(be, &lease.FakeLessor{}, &i)
	defer cleanup(s, be, tmpPath)
	s.Put([]byte("foo"), []byte("bar"), lease.NoLease)
	var wg sync.WaitGroup
	wg.Add(1000)
	for i := 0; i < 1000; i++ {
		go func() {
			defer wg.Done()
			s.Range([]byte("foo"), nil, RangeOptions{Rev: 1})
		}()
	}
	wg.Wait()
}
Go 1.6.3 and 1.7.3 have the same latency spikes in this test.
@gyuho After taking a closer look at the 1.7.3 pprof data, I think we did something wrong.
For example, it shows
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/etcdserverpb.(*kVClient).Txn
at 0.03s (0.045%) of 9.55s (14.32%).
Why do we have a Txn request in the data???
Also it shows
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/api/v3rpc.FiltersFromRequest
at 0.04s (0.06%) of 9.25s (13.87%).
This func can only be called in a watch.
So the pprof result for 1.7.3 is clearly problematic.
@xiang90 All I did was
ETCDCTL_API=3 ${GOPATH}/src/github.com/coreos/etcd/bin/etcdctl --endpoints=10.240.0.44:2379,10.240.0.45:2379,10.240.0.46:2379 put foo bar
ETCDCTL_API=3 ${GOPATH}/src/github.com/coreos/etcd/bin/etcdctl --endpoints=10.240.0.44:2379,10.240.0.45:2379,10.240.0.46:2379 get foo --consistency=s
benchmark --endpoints=10.240.0.44:2379,10.240.0.45:2379,10.240.0.46:2379 --conns=100 --clients=1000 range foo --consistency=s --total=10000
I could have done something wrong. I will rerun these tests and make sure those calls do not show up.
@gyuho Well, probably we did something wrong in our client code (FiltersFromRequest is called from watch). If you have time, maybe you can switch to raw gRPC to do the benchmark; it might make a difference?
Or you might have pointed pprof at the wrong binary: 1.7.3 pprof data, but pointing at the 1.6.3 binary. The pprof for 1.7.3 seems really broken.
Yeah, I should try raw gRPC calls. Also, all clients were built with Go 1.7.3; only the server-side Go version differed. I should try clients built with different Go versions as well.
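For concreteness, a raw-gRPC check could look something like the sketch below: hand-rolled for illustration rather than the benchmark tool, with the endpoint, key, and totals as placeholders. It issues serializable Range RPCs directly against etcdserverpb.KV, which is what --consistency=s maps to.
package main

import (
	"fmt"
	"log"
	"sync"
	"time"

	pb "github.com/coreos/etcd/etcdserver/etcdserverpb"
	"golang.org/x/net/context"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("10.240.0.44:2379", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	kv := pb.NewKVClient(conn)

	const total, clients = 100000, 100
	var wg sync.WaitGroup
	wg.Add(clients)
	start := time.Now()
	for c := 0; c < clients; c++ {
		go func() {
			defer wg.Done()
			for i := 0; i < total/clients; i++ {
				// Serializable: true is a local (non-quorum) read, same as --consistency=s.
				if _, err := kv.Range(context.Background(), &pb.RangeRequest{
					Key:          []byte("foo"),
					Serializable: true,
				}); err != nil {
					log.Fatal(err)
				}
			}
		}()
	}
	wg.Wait()
	fmt.Printf("%.1f requests/sec\n", float64(total)/time.Since(start).Seconds())
}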
v3.0.2 with Go 1.6.3
Summary:
Total: 0.6376 secs.
Slowest: 0.0701 secs.
Fastest: 0.0002 secs.
Average: 0.0051 secs.
Stddev: 0.0055 secs.
Requests/sec: 156829.9965
Can you try to build 3.0.2 with go 1.7 and do a benchmark?
@xiang90
v3.0.2 with Go 1.7.3 (client is also Go 1.7.3 with the same git sha)
Summary:
Total: 0.1427 secs.
Slowest: 0.1071 secs.
Fastest: 0.0002 secs.
Average: 0.0120 secs.
Stddev: 0.0133 secs.
Requests/sec: 70080.2853
Probably a good start is to focus only on 3.0.2 and see what the difference is between Go 1.6 and Go 1.7, since that is the only change.
@gyuho can you get pprof data for 3.0.2 with Go 1.6 and Go 1.7? (Try --total=500000 for a longer period of data.)
etcdv3.0.2-go1.6.2-5M-serializable-reads.pdf pprof.10.240.0.44:2379.samples.cpu.001.pb.gz
# etcd v3.0.2 with Go 1.6.3
# benchmark compiled with Go 1.7.3 in master branch
benchmark --endpoints=10.240.0.44:2379 --conns=100 --clients=1000 range foo --consistency=s --total=5000000
bench with serializable range
5000000 / 5000000 Booooooooooooooooooooooooooooooo! 100.00%1m57s
Summary:
Total: 117.6307 secs.
Slowest: 0.5534 secs.
Fastest: 0.0002 secs.
Average: 0.0235 secs.
Stddev: 0.0336 secs.
Requests/sec: 42505.9236
Response time histogram:
0.0002 [1] |
0.0555 [4153659] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.1108 [713134] |∎∎∎∎∎∎
0.1661 [110394] |∎
0.2215 [17593] |
0.2768 [4068] |
0.3321 [975] |
0.3874 [140] |
0.4427 [27] |
0.4981 [6] |
0.5534 [3] |
Latency distribution:
10% in 0.0004 secs.
25% in 0.0005 secs.
50% in 0.0010 secs.
75% in 0.0355 secs.
90% in 0.0652 secs.
95% in 0.0936 secs.
99% in 0.1504 secs.
etcdv3.0.2-go1.7.3-2M-serializable-reads.pdf pprof.etcd.10.240.0.44:2379.samples.cpu.001.pb.gz
# etcd v3.0.2 with Go 1.7.3
# benchmark compiled with Go 1.7.3 in master branch
benchmark --endpoints=10.240.0.44:2379 --conns=100 --clients=1000 range foo --consistency=s --total=2000000
bench with serializable range
2000000 / 2000000 Booooooooooooooooooooooooooooooo! 100.00%1m40s
Summary:
Total: 100.3941 secs.
Slowest: 1.0669 secs.
Fastest: 0.0002 secs.
Average: 0.0502 secs.
Stddev: 0.0668 secs.
Requests/sec: 19921.4873
Response time histogram:
0.0002 [1] |
0.1069 [1604052] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.2135 [332074] |∎∎∎∎∎∎∎∎
0.3202 [53733] |∎
0.4269 [8303] |
0.5335 [1254] |
0.6402 [480] |
0.7469 [90] |
0.8535 [11] |
0.9602 [1] |
1.0669 [1] |
Latency distribution:
10% in 0.0003 secs.
25% in 0.0003 secs.
50% in 0.0432 secs.
75% in 0.0635 secs.
90% in 0.1227 secs.
95% in 0.1805 secs.
99% in 0.2977 secs.
@gyuho
Hmm... Is there a reason that we do 5M reads for Go 1.6 but 2M for Go 1.7? We really should make sure all the variables are the same except the Go version.
If you take a look at the Go 1.7 pprof data, there are no range-related funcs there. The most significant one is a lease operation. This is strange.
@gyuho I did a pprof run on Go 1.7.2 + 3.0.2 myself. The benchmark result is similar, but the pprof result is clean and different from yours. I think you did something wrong with pprof.
Here is the correct way to do it:
You need to make sure you provide the right binary. If you provide the wrong binary, the result is meaningless.
@gyuho Can you create a new GitHub issue for this problem? This issue is too messy and contains wrong pprof information. Let's start fresh?
Also reported at https://github.com/grpc/grpc-go/issues/771. Before and after the gRPC dependency update https://github.com/coreos/etcd/pull/5948, there's a huge performance drop in serializable range requests:
https://github.com/coreos/etcd/commit/07ed4da2ffe871444c83be63e508d2adf4c87175:
https://github.com/coreos/etcd/commit/d9a8a326dfe4691cf90d2c4d2674e3602de812d8
https://github.com/coreos/etcd/commit/d8f0ef0e800d86e6343b073c406b7fe0e2cb3940