gyuho opened 7 years ago
Did you try tip?
I did; tip has slightly better performance than Go 1.7.3, but still much slower than Go 1.6.3.
Looked at the pb.gz profiles in coreos/etcd#6876. Slightly confusing because the Go 1.7 profile ran 2X as long as the Go 1.6 profile so the total times are not directly comparable. However, in Go 1.6 the scheduler ran for about 2s, while in Go 1.7 the scheduler ran for about 20s, so about 5X more. That probably has something to do with it.
http2 probably does not have anything to do with it. From the profile it seems clear that you are running a vendored copy of http2, not the one from the Go distribution itself.
Go 1.6:
Go 1.7:
We might be able to fix this in Go 1.8 if the fix is simple.
/cc @aclements @RLH
I used different durations to test the same workloads, but that made the profiles hard to compare. So I collected pprof data with the same 30-second duration:
./benchmark-master-go1.7.3 --endpoints=localhost:2379 --conns=100 --clients=1000 range foo --consistency=s --total=2000000
# Go 1.6.3
Summary:
Total: 55.5685 secs.
Requests/sec: 35991.5985

# Go 1.7.3
Summary:
Total: 105.6422 secs.
Requests/sec: 18931.8332
And here are the binaries and pprof data, if that helps.
Go 1.6.3
wget https://storage.googleapis.com/etcd/tip/etcd-v3.0.2-go1.6.3
etcd-v3.0.2-go1.6.3-serializable-reads-00.pdf
pprof.localhost:2379.samples.cpu.001.pb.gz
Go 1.7.3
wget https://storage.googleapis.com/etcd/tip/etcd-v3.0.2-go1.7.3
etcd-v3.0.2-go1.7.3-serializable-reads-00.pdf
pprof.etcd-v3.0.2-go1.7.3.localhost:2379.samples.cpu.001.pb.gz
Thanks a lot!
/cc @smarterclayton @wojtek-t
I haven't seen any update on this bug about anything Go 1.8-related, so I'm going to kick this down the road further.
Please try Go 1.8 and file bugs about any performance problems, thanks!
In the future, please start your trials of new Go versions earlier. Testing the move from Go 1.6 to Go 1.7 doesn't help us much when we're just about done with Go 1.8.
@bradfitz Thanks!
I ran the same tests with go1.6.4 vs. go1.8beta1, and got the same results:
# all single-node cluster
go1.6.4 took 49.8130 secs.
go1.8beta1 took 100.3238 secs.
Here are benchmark binaries and profiled data:
go1.6.4 etcd-v3.0.2-go1.6.4-2M-serializable-reads-2016121501.pdf pprof.localhost-2379.samples.cpu.001.pb.gz
go1.8beta1 etcd-v3.0.2-go1.8beta1-2M-serializable-reads-2016121501.pdf pprof.etcd-v3.0.2-go1.8beta1.localhost-2379.samples.cpu.001.pb.gz
wget https://storage.googleapis.com/etcd/tip/etcd-v3.0.2-go1.6.4
wget https://storage.googleapis.com/etcd/tip/etcd-v3.0.2-go1.8beta1
wget https://storage.googleapis.com/etcd/tip/etcdctl-master-go1.8beta1
wget https://storage.googleapis.com/etcd/tip/benchmark-master-go1.8beta1
sudo chmod +x ./etcd-v3.0.2-go1.6.4 && ./etcd-v3.0.2-go1.6.4 --version
sudo chmod +x ./etcd-v3.0.2-go1.8beta1 && ./etcd-v3.0.2-go1.8beta1 --version
sudo chmod +x ./etcdctl-master-go1.8beta1 && ./etcdctl-master-go1.8beta1 --version
sudo chmod +x ./benchmark-master-go1.8beta1 && ./benchmark-master-go1.8beta1 help
Leaving for @aclements to decide what to do here, and when.
I've figured out what's going on. I was able to reproduce this issue only with GOMAXPROCS >= 3. It started happening at https://github.com/golang/go/commit/a6fb2aede7c8d47b4d913eb83fa45bbeca76c433, which enables the new SSA compiler. Because of that, stack frames are now smaller, so stack growth happens while locks are held; previously, due to the larger stack frames, stack growth happened outside the critical section. I verified this by enabling the blocking profiler, and indeed in the bad case much more time is spent waiting on locks: 214.42 mins of 215.35 mins total (99.57%) in the bad case vs. 2376s of 2376.23s total (100%) in the good case. From looking at the profiles, it seems the guilty lock is the one in coreos/etcd/etcdserver.(*authApplierV3).Apply, but I didn't verify this.
I've verified that the following approaches make this regression disappear: 1) bumping the default stack size to 8kb, or changing the growth factor to 4; 2) putting a large array on the stack to force stack growth earlier, e.g.:
+var q [160]int
+
func metricsUnaryInterceptor(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (resp interface{}, err error) {
service, method := splitMethodName(info.FullMethod)
receivedCounter.WithLabelValues(service, method).Inc()
-
+ var ccc [160]int
start := time.Now()
resp, err = handler(ctx, req)
if err != nil {
failedCounter.WithLabelValues(service, method, grpc.Code(err).String()).Inc()
}
handlingDuration.WithLabelValues(service, method).Observe(time.Since(start).Seconds())
-
+ q = ccc
return resp, err
}
However, both of these look like hacks to me. As for a real solution, I think something like @aclements' suggestion from https://github.com/golang/go/issues/18138, allocating a larger stack for functions that are known to require it, is the right way.
@TocarIP
From looking at profiles, it seems like the guilty lock is the one in coreos/etcd/etcdserver.(*authApplierV3).Apply, but I didn't verify this.
I can confirm this. We also suspect that there is more serious lock contention triggered by some runtime changes. After we changed the locking mechanism, the problem disappeared.
But as you mentioned, this issue is probably still worth fixing.
CL https://golang.org/cl/45142 mentions this issue.
@TocarIP thanks for the excellent analysis.
If you can still reproduce this, I'd be curious what the effect of CL 45142 is. I'm really not sure what the right answer here is, but that hack might mitigate things from both this issue and #18138.
I've been thinking for a while that the latency of stack growth can be problematic, but this is an interesting case where it's not just a problem with large stacks. The old segmented stacks approach had less of this problem, but it had other anomalies. I wonder if it's possible to efficiently but incrementally move a stack.
I've tested CL 45142, and it doesn't help.
At my request, @TocarIP also checked CL 43150 (thanks!). It's not a deep fix, but it does at least help a little: from 13.4s to 12.4s.
I’ve done a bit of work on stack copying this cycle. Much of it is in. https://golang.org/cl/109001 is under review. I have one more I hope to mail soon, but no promises. If anyone can still reproduce readily, I’d be curious to hear whether / how much tip has improved, and whether the linked and future CLs help.
From briefly reading this issue, it sounds like this might be resolved with https://github.com/golang/go/commit/016d7552138077741a9c3fdadc73c0179f5d3ff7 which should be included in Go 1.19.
Hello,
etcd observed a significant performance regression with Go 1.7.3. Details follow; see also https://github.com/coreos/etcd/issues/6876.
What version of Go are you using (go version)?
Go 1.6.3 and Go 1.7.3
What operating system and processor architecture are you using (go env)?
Google Cloud VM with 8 vCPUs, 16 GB memory, ubuntu-1610
What did you do?
Build etcd binaries in different Go versions with etcd commit https://github.com/coreos/etcd/commit/faeeb2fc7514c5caf7a9a0cc03ac9ee2ff94438b; the Go version is the only difference. Then run a single-node cluster and benchmark:
Go 1.6.3 server + master branch Go 1.7.3 client
Go 1.7.3 server + master branch Go 1.7.3 client
What did you expect to see?
We expected Go 1.7.3 to be comparable to or better than Go 1.6.3.
What did you see instead?
etcd with Go 1.7.3 is 2x slower than with Go 1.6.3.
Here's our profiling data.
Note: etcd uses an old golang.org/x/net/http2, but the same behavior happens with the latest http2; see https://github.com/coreos/etcd/issues/6876#issuecomment-262292981. Could anybody help us debug this in the Go runtime? Please let me know if more information is needed.
Thanks a lot in advance!