etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

frequent elections causing request timeout #3466

Closed · durzo closed this issue 9 years ago

durzo commented 9 years ago

I have a 3 node etcd v2.1.2 cluster in AWS spread over 3 availability zones in the same region.

Roughly once a day, I am seeing elections occurring, causing all pending requests to be dropped (timeout).

Ping time between nodes is never more than 1.5ms (lowest 0.6ms), CPU is never above 3%, and disk I/O is almost idle (this is a high-read, very-low-write cluster).

I am at a loss to explain why these elections are occurring. I have installed smokeping on the cluster and verified that there are no latency spikes or packet loss between any nodes.

I am starting all my nodes with --heartbeat-interval=100 --election-timeout=2000
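
For reference, the same settings can also be supplied as environment variables; a hypothetical upstart fragment for a setup like this (the binary path and the GOMAXPROCS line are assumptions, not the reporter's actual config) might look like:

```shell
# Hypothetical upstart stanza: etcd also reads each flag as an ETCD_* env var.
env ETCD_HEARTBEAT_INTERVAL=100   # ms; matches --heartbeat-interval=100
env ETCD_ELECTION_TIMEOUT=2000    # ms; kept well above 10x the heartbeat
env GOMAXPROCS=2
exec /usr/bin/etcd
```

A 2000ms election timeout against a 100ms heartbeat is already a generous 20x ratio, so on paper these settings should tolerate substantial jitter.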

Here are the logs from the last failure:

$ etcdctl member list
bd3f4e5879e1deae: name=etcd1
14a9317600de48a0: name=etcd2
bf4365627a658392: name=etcd3

etcd1 log

2015/09/9 06:54:36 raft: bd3f4e5879e1deae is starting a new election at term 314
2015/09/9 06:54:36 raft: bd3f4e5879e1deae became candidate at term 315
2015/09/9 06:54:36 raft: bd3f4e5879e1deae received vote from bd3f4e5879e1deae at term 315
2015/09/9 06:54:36 raft: bd3f4e5879e1deae [logterm: 314, index: 273748357] sent vote request to 14a9317600de48a0 at term 315
2015/09/9 06:54:36 raft: bd3f4e5879e1deae [logterm: 314, index: 273748357] sent vote request to bf4365627a658392 at term 315
2015/09/9 06:54:36 raft: raft.node: bd3f4e5879e1deae lost leader bf4365627a658392 at term 315
2015/09/9 06:54:36 raft: bd3f4e5879e1deae received vote from 14a9317600de48a0 at term 315
2015/09/9 06:54:36 raft: bd3f4e5879e1deae [q:2] has received 2 votes and 0 vote rejections
2015/09/9 06:54:36 raft: bd3f4e5879e1deae became leader at term 315
2015/09/9 06:54:36 raft: raft.node: bd3f4e5879e1deae elected leader bd3f4e5879e1deae at term 315
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]

etcd2 log

2015/09/9 06:54:36 raft: 14a9317600de48a0 [term: 314] received a MsgVote message with higher term from bd3f4e5879e1deae [term: 315]
2015/09/9 06:54:36 raft: 14a9317600de48a0 became follower at term 315
2015/09/9 06:54:36 raft: 14a9317600de48a0 [logterm: 314, index: 273748357, vote: 0] voted for bd3f4e5879e1deae [logterm: 314, index: 273748357] at term 315
2015/09/9 06:54:36 raft: raft.node: 14a9317600de48a0 lost leader bf4365627a658392 at term 315
2015/09/9 06:54:36 raft: raft.node: 14a9317600de48a0 elected leader bd3f4e5879e1deae at term 315
2015/09/9 06:54:37 raft: 14a9317600de48a0 [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:40 etcdhttp: etcdserver: request timed out, possibly due to previous leader failure
2015/09/9 06:54:40 etcdhttp: etcdserver: request timed out, possibly due to previous leader failure

etcd3 log

2015/09/9 06:54:36 raft: bf4365627a658392 [term: 314] received a MsgVote message with higher term from bd3f4e5879e1deae [term: 315]
2015/09/9 06:54:36 raft: bf4365627a658392 became follower at term 315
2015/09/9 06:54:36 raft: bf4365627a658392 [logterm: 314, index: 273748367, vote: 0] rejected vote from bd3f4e5879e1deae [logterm: 314, index: 273748357] at term 315
2015/09/9 06:54:36 raft: raft.node: bf4365627a658392 lost leader bf4365627a658392 at term 315
2015/09/9 06:54:36 raft: found conflict at index 273748358 [existing term: 314, conflicting term: 315]
2015/09/9 06:54:36 raft: replace the unstable entries from index 273748358
2015/09/9 06:54:36 raft: raft.node: bf4365627a658392 elected leader bd3f4e5879e1deae at term 315

Any ideas?

xiang90 commented 9 years ago

Roughly once a day, I am seeing elections occurring, causing all pending requests to be dropped (timeout).

Yes, this is expected. We improved the logging around this in 2.1: it now logs that the timeout was due to an election. In the future we could buffer some requests and resend them.

I am at a loss to explain why these elections are occurring, I have installed smokeping on the cluster and verified that there is no latency spikes or packet loss between any nodes.

I am not sure why. We have an integration cluster running on EC2 m3.large machines. It accepts 2,000 small writes/second with the default timeout, and a leader election happens only once every few days, due to network spikes or CPU issues (yeah... this is the cloud). We think the etcd server itself should be fine.

If you can reproduce the frequent leader elections you are seeing, we can help you further. You could also try giving etcd more CPU to mitigate this.

xiang90 commented 9 years ago

This is also a duplicate of https://github.com/coreos/etcd/issues/3380.

I am closing this in favor of #3380.

xiang90 commented 9 years ago

@durzo BTW, do you know how large your snapshot is?

xiang90 commented 9 years ago

Also, if you can provide the logs from before 06:54:36 on all three members, I can get more context.

durzo commented 9 years ago

Snapshots are currently 8.2M in size; I am also using --snapshot-count=10000.
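
As a side note, the snapshot cadence visible in the logs below gives a rough write-rate estimate: with --snapshot-count=10000, one snapshot every ~17 minutes implies on the order of ten raft entries per second (back-of-envelope arithmetic, the interval being an approximation read off the timestamps):

```shell
# 10000 entries per snapshot / ~17 minutes between snapshots
awk 'BEGIN { printf "%.1f entries/sec\n", 10000 / (17 * 60) }'
# prints: 9.8 entries/sec
```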

Logs prior to 06:54:36 on all nodes:

etcd1:

2015/09/9 04:44:00 etcdserver: start to snapshot (applied: 273672104, lastsnap: 273662103)
2015/09/9 04:44:00 etcdserver: saved snapshot at index 273672104
2015/09/9 04:44:00 etcdserver: compacted raft log at 273667104
2015/09/9 04:44:20 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104f2453.snap successfully
2015/09/9 05:02:16 etcdserver: start to snapshot (applied: 273682105, lastsnap: 273672104)
2015/09/9 05:02:16 etcdserver: saved snapshot at index 273682105
2015/09/9 05:02:16 etcdserver: compacted raft log at 273677105
2015/09/9 05:02:20 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104f4b64.snap successfully
2015/09/9 05:17:42 etcdserver: start to snapshot (applied: 273692108, lastsnap: 273682105)
2015/09/9 05:17:42 etcdserver: saved snapshot at index 273692108
2015/09/9 05:17:42 etcdserver: compacted raft log at 273687108
2015/09/9 05:17:50 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104f7275.snap successfully
2015/09/9 05:35:41 etcdserver: start to snapshot (applied: 273702109, lastsnap: 273692108)
2015/09/9 05:35:41 etcdserver: saved snapshot at index 273702109
2015/09/9 05:35:41 etcdserver: compacted raft log at 273697109
2015/09/9 05:35:50 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104f9986.snap successfully
2015/09/9 05:53:41 etcdserver: start to snapshot (applied: 273712110, lastsnap: 273702109)
2015/09/9 05:53:41 etcdserver: saved snapshot at index 273712110
2015/09/9 05:53:41 etcdserver: compacted raft log at 273707110
2015/09/9 05:53:50 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104fc097.snap successfully
2015/09/9 06:09:13 etcdserver: start to snapshot (applied: 273722111, lastsnap: 273712110)
2015/09/9 06:09:14 etcdserver: saved snapshot at index 273722111
2015/09/9 06:09:14 etcdserver: compacted raft log at 273717111
2015/09/9 06:09:20 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104fe7a8.snap successfully
2015/09/9 06:26:58 etcdserver: start to snapshot (applied: 273732112, lastsnap: 273722111)
2015/09/9 06:26:59 etcdserver: saved snapshot at index 273732112
2015/09/9 06:26:59 etcdserver: compacted raft log at 273727112
2015/09/9 06:27:20 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-0000000010500eb9.snap successfully
2015/09/9 06:44:36 etcdserver: start to snapshot (applied: 273742113, lastsnap: 273732112)
2015/09/9 06:44:36 etcdserver: saved snapshot at index 273742113
2015/09/9 06:44:36 etcdserver: compacted raft log at 273737113
2015/09/9 06:44:50 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000105035cc.snap successfully

etcd2:

2015/09/9 04:58:13 etcdserver: start to snapshot (applied: 273679614, lastsnap: 273669613)
2015/09/9 04:58:13 etcdserver: saved snapshot at index 273679614
2015/09/9 04:58:13 etcdserver: compacted raft log at 273674614
2015/09/9 04:58:43 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104f41a9.snap successfully
2015/09/9 05:13:44 etcdserver: start to snapshot (applied: 273689615, lastsnap: 273679614)
2015/09/9 05:13:44 etcdserver: saved snapshot at index 273689615
2015/09/9 05:13:44 etcdserver: compacted raft log at 273684615
2015/09/9 05:14:13 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104f68ba.snap successfully
2015/09/9 05:31:09 etcdserver: start to snapshot (applied: 273699616, lastsnap: 273689615)
2015/09/9 05:31:10 etcdserver: saved snapshot at index 273699616
2015/09/9 05:31:10 etcdserver: compacted raft log at 273694616
2015/09/9 05:31:13 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104f8fcb.snap successfully
2015/09/9 05:49:33 etcdserver: start to snapshot (applied: 273709617, lastsnap: 273699616)
2015/09/9 05:49:33 etcdserver: saved snapshot at index 273709617
2015/09/9 05:49:33 etcdserver: compacted raft log at 273704617
2015/09/9 05:49:43 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104fb6dc.snap successfully
2015/09/9 06:05:17 etcdserver: start to snapshot (applied: 273719618, lastsnap: 273709617)
2015/09/9 06:05:18 etcdserver: saved snapshot at index 273719618
2015/09/9 06:05:18 etcdserver: compacted raft log at 273714618
2015/09/9 06:05:43 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104fdded.snap successfully
2015/09/9 06:22:26 etcdserver: start to snapshot (applied: 273729619, lastsnap: 273719618)
2015/09/9 06:22:26 etcdserver: saved snapshot at index 273729619
2015/09/9 06:22:26 etcdserver: compacted raft log at 273724619
2015/09/9 06:22:43 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000105004fe.snap successfully
2015/09/9 06:34:20 wal: segmented wal file /var/cache/etcd/state/member/wal/0000000000000320-000000001050e189.wal is created
2015/09/9 06:34:43 fileutil: purged file /var/cache/etcd/state/member/wal/000000000000031b-0000000010308a2c.wal successfully
2015/09/9 06:40:28 etcdserver: start to snapshot (applied: 273739620, lastsnap: 273729619)
2015/09/9 06:40:28 etcdserver: saved snapshot at index 273739620
2015/09/9 06:40:28 etcdserver: compacted raft log at 273734620
2015/09/9 06:40:43 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-0000000010502c0f.snap successfully

etcd3:

2015/09/9 04:52:35 etcdserver: start to snapshot (applied: 273676449, lastsnap: 273666448)
2015/09/9 04:52:35 etcdserver: saved snapshot at index 273676449
2015/09/9 04:52:35 etcdserver: compacted raft log at 273671449
2015/09/9 04:52:43 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104f354c.snap successfully
2015/09/9 05:08:58 etcdserver: start to snapshot (applied: 273686450, lastsnap: 273676449)
2015/09/9 05:08:58 etcdserver: saved snapshot at index 273686450
2015/09/9 05:08:58 etcdserver: compacted raft log at 273681450
2015/09/9 05:09:13 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104f5c5d.snap successfully
2015/09/9 05:25:37 etcdserver: start to snapshot (applied: 273696451, lastsnap: 273686450)
2015/09/9 05:25:37 etcdserver: saved snapshot at index 273696451
2015/09/9 05:25:37 etcdserver: compacted raft log at 273691451
2015/09/9 05:25:43 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104f836e.snap successfully
2015/09/9 05:44:01 etcdserver: start to snapshot (applied: 273706452, lastsnap: 273696451)
2015/09/9 05:44:01 etcdserver: saved snapshot at index 273706452
2015/09/9 05:44:01 etcdserver: compacted raft log at 273701452
2015/09/9 05:44:13 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104faa7f.snap successfully
2015/09/9 06:00:25 etcdserver: start to snapshot (applied: 273716453, lastsnap: 273706452)
2015/09/9 06:00:25 etcdserver: saved snapshot at index 273716453
2015/09/9 06:00:25 etcdserver: compacted raft log at 273711453
2015/09/9 06:00:43 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104fd190.snap successfully
2015/09/9 06:16:52 etcdserver: start to snapshot (applied: 273726454, lastsnap: 273716453)
2015/09/9 06:16:52 etcdserver: saved snapshot at index 273726454
2015/09/9 06:16:52 etcdserver: compacted raft log at 273721454
2015/09/9 06:17:13 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104ff8a1.snap successfully
2015/09/9 06:35:01 etcdserver: start to snapshot (applied: 273736455, lastsnap: 273726454)
2015/09/9 06:35:01 etcdserver: saved snapshot at index 273736455
2015/09/9 06:35:01 etcdserver: compacted raft log at 273731455
2015/09/9 06:35:13 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-0000000010501fb2.snap successfully
2015/09/9 06:51:36 etcdserver: start to snapshot (applied: 273746456, lastsnap: 273736455)
2015/09/9 06:51:37 etcdserver: saved snapshot at index 273746456
2015/09/9 06:51:37 etcdserver: compacted raft log at 273741456
2015/09/9 06:51:43 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000105046c3.snap successfully

durzo commented 9 years ago

I am running my nodes on the t2.medium instance class. While these have "burstable" CPU, usage never goes over 3% and never eats into the burstable CPU credits. They have 2 vCPUs, and I am exporting GOMAXPROCS="2" in my upstart init script.

I ran the cluster on m3.mediums for a period, but it did not solve the issue.

One thing I noticed is that these issues seemed to come into play when metrics support was added. Is it possible to disable metrics? We don't use them at all.

xiang90 commented 9 years ago

@durzo I checked our cluster. The election rate is about once a week, or less often. Our cluster is under very high load (90%+ CPU usage and O(100) IO/s).

One thing I thought of was these issues seemed to come into play when Metrics was added. is it possible to disable metrics? we don't use them at all.

No, I do not think they are related. Is it possible that you have a slow disk, and an fsync occasionally takes more than a second or so?
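
One quick way to check for slow fsyncs is to force synchronized writes on the data volume. A rough sketch (the path mirrors the data directory seen in the logs, and dd's per-write dsync is only an approximation of the WAL's fsync pattern):

```shell
# Write 1000 x 512-byte blocks, flushing each one to disk (oflag=dsync),
# loosely mimicking WAL commits; dd reports the aggregate rate at the end.
dd if=/dev/zero of=/var/cache/etcd/state/fsync-test bs=512 count=1000 oflag=dsync
rm -f /var/cache/etcd/state/fsync-test
```

If the reported time works out to more than a few milliseconds per write, fsync latency could plausibly explain missed heartbeats.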

xiang90 commented 9 years ago

@durzo From the log, I did not see any network issue either. It does not seem related to snapshots either...

/cc @yichengq Any thought?

xiang90 commented 9 years ago

@durzo Wait... I do not think you gave me all the logs for node1. Either that, or something strange happened on node1...

xiang90 commented 9 years ago

@durzo Please provide continuous logs for node1 from 2015/09/9 06:09:13 to 2015/09/9 07:10:00.

durzo commented 9 years ago

I am using magnetic EBS storage. General Purpose (SSD) is not really an option, as it is slower than magnetic below a certain volume size (100GB), and our disks are only 10GB. Cost would become an issue for 3 x 100GB SSD volumes.

I can test disk speed if you would like, but CloudWatch reports almost no I/O load and zero queue length for those disks.

xiang90 commented 9 years ago

@durzo See my previous comments.

durzo commented 9 years ago

etcd1 complete logs:

2015/09/9 06:09:13 etcdserver: start to snapshot (applied: 273722111, lastsnap: 273712110)
2015/09/9 06:09:14 etcdserver: saved snapshot at index 273722111
2015/09/9 06:09:14 etcdserver: compacted raft log at 273717111
2015/09/9 06:09:20 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000104fe7a8.snap successfully
2015/09/9 06:26:58 etcdserver: start to snapshot (applied: 273732112, lastsnap: 273722111)
2015/09/9 06:26:59 etcdserver: saved snapshot at index 273732112
2015/09/9 06:26:59 etcdserver: compacted raft log at 273727112
2015/09/9 06:27:20 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-0000000010500eb9.snap successfully
2015/09/9 06:44:36 etcdserver: start to snapshot (applied: 273742113, lastsnap: 273732112)
2015/09/9 06:44:36 etcdserver: saved snapshot at index 273742113
2015/09/9 06:44:36 etcdserver: compacted raft log at 273737113
2015/09/9 06:44:50 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000105035cc.snap successfully
2015/09/9 06:54:36 raft: bd3f4e5879e1deae is starting a new election at term 314
2015/09/9 06:54:36 raft: bd3f4e5879e1deae became candidate at term 315
2015/09/9 06:54:36 raft: bd3f4e5879e1deae received vote from bd3f4e5879e1deae at term 315
2015/09/9 06:54:36 raft: bd3f4e5879e1deae [logterm: 314, index: 273748357] sent vote request to 14a9317600de48a0 at term 315
2015/09/9 06:54:36 raft: bd3f4e5879e1deae [logterm: 314, index: 273748357] sent vote request to bf4365627a658392 at term 315
2015/09/9 06:54:36 raft: raft.node: bd3f4e5879e1deae lost leader bf4365627a658392 at term 315
2015/09/9 06:54:36 raft: bd3f4e5879e1deae received vote from 14a9317600de48a0 at term 315
2015/09/9 06:54:36 raft: bd3f4e5879e1deae [q:2] has received 2 votes and 0 vote rejections
2015/09/9 06:54:36 raft: bd3f4e5879e1deae became leader at term 315
2015/09/9 06:54:36 raft: raft.node: bd3f4e5879e1deae elected leader bd3f4e5879e1deae at term 315
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 07:00:31 etcdserver: start to snapshot (applied: 273752114, lastsnap: 273742113)
2015/09/9 07:00:32 etcdserver: saved snapshot at index 273752114
2015/09/9 07:00:32 etcdserver: compacted raft log at 273747114
2015/09/9 07:00:50 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-0000000010505cdd.snap successfully
2015/09/9 07:18:11 etcdserver: start to snapshot (applied: 273762115, lastsnap: 273752114)
2015/09/9 07:18:11 etcdserver: saved snapshot at index 273762115
2015/09/9 07:18:11 etcdserver: compacted raft log at 273757115
2015/09/9 07:18:20 fileutil: purged file /var/cache/etcd/state/member/snap/000000000000013a-00000000105083ee.snap successfully

xiang90 commented 9 years ago

@durzo

It looks like the node1 <-> node3 link did queue up some messages:

These messages should have been received before the leader was elected. /cc @yichengq

2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]
2015/09/9 06:54:37 raft: bd3f4e5879e1deae [term: 315] ignored a MsgApp message with lower term from bf4365627a658392 [term: 314]

durzo commented 9 years ago

Related: my python-etcd library reported this error 7 times at approximately 06:54:38:

Raft Internal Error : etcdserver: request timed out, possibly due to previous leader failure

durzo commented 9 years ago

The python-etcd library connects to a local etcd running in proxy mode, which then connects to the cluster; all of this is within the same AWS availability zone.
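
For context, a local v2 proxy of this shape is started roughly as follows (the URLs and member names here are placeholders, not the reporter's actual configuration):

```shell
# Hypothetical proxy invocation: the proxy forwards local client
# requests to the cluster members listed in --initial-cluster.
etcd --proxy on \
  --listen-client-urls http://127.0.0.1:4001 \
  --initial-cluster 'etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380'
```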

xiang90 commented 9 years ago

@durzo From the logging, I think there was a network delay (possibly caused by the Go runtime as well). I am not sure why...

yichengq commented 9 years ago

@durzo Could you share the metrics from all members with us via the /metrics endpoint? 2.1 includes some stats there that may be helpful.

durzo commented 9 years ago

etcd1:~$ curl localhost:4001/metrics
# HELP etcd_rafthttp_message_sent_latency_microseconds message sent latency distributions.
# TYPE etcd_rafthttp_message_sent_latency_microseconds summary
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgApp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.5"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgApp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.9"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgApp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.99"} NaN
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgApp",remoteID="14a9317600de48a0",sendingType="message"} 1210
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgApp",remoteID="14a9317600de48a0",sendingType="message"} 32
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgApp",remoteID="14a9317600de48a0",sendingType="msgappv2",quantile="0.5"} 21
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgApp",remoteID="14a9317600de48a0",sendingType="msgappv2",quantile="0.9"} 32
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgApp",remoteID="14a9317600de48a0",sendingType="msgappv2",quantile="0.99"} 48
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgApp",remoteID="14a9317600de48a0",sendingType="msgappv2"} 1.4486415e+07
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgApp",remoteID="14a9317600de48a0",sendingType="msgappv2"} 660947
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgApp",remoteID="bf4365627a658392",sendingType="message",quantile="0.5"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgApp",remoteID="bf4365627a658392",sendingType="message",quantile="0.9"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgApp",remoteID="bf4365627a658392",sendingType="message",quantile="0.99"} NaN
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgApp",remoteID="bf4365627a658392",sendingType="message"} 251
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgApp",remoteID="bf4365627a658392",sendingType="message"} 8
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgApp",remoteID="bf4365627a658392",sendingType="msgappv2",quantile="0.5"} 20
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgApp",remoteID="bf4365627a658392",sendingType="msgappv2",quantile="0.9"} 32
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgApp",remoteID="bf4365627a658392",sendingType="msgappv2",quantile="0.99"} 48
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgApp",remoteID="bf4365627a658392",sendingType="msgappv2"} 1.400149e+07
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgApp",remoteID="bf4365627a658392",sendingType="msgappv2"} 664344
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgAppResp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.5"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgAppResp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.9"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgAppResp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.99"} NaN
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgAppResp",remoteID="14a9317600de48a0",sendingType="message"} 8.0833299e+07
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgAppResp",remoteID="14a9317600de48a0",sendingType="message"} 3.352544e+06
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgAppResp",remoteID="bf4365627a658392",sendingType="message",quantile="0.5"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgAppResp",remoteID="bf4365627a658392",sendingType="message",quantile="0.9"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgAppResp",remoteID="bf4365627a658392",sendingType="message",quantile="0.99"} NaN
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgAppResp",remoteID="bf4365627a658392",sendingType="message"} 2.42030698e+08
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgAppResp",remoteID="bf4365627a658392",sendingType="message"} 9.863802e+06
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgHeartbeat",remoteID="14a9317600de48a0",sendingType="message",quantile="0.5"} 41
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgHeartbeat",remoteID="14a9317600de48a0",sendingType="message",quantile="0.9"} 62
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgHeartbeat",remoteID="14a9317600de48a0",sendingType="message",quantile="0.99"} 79
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgHeartbeat",remoteID="14a9317600de48a0",sendingType="message"} 1.1599212e+07
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgHeartbeat",remoteID="14a9317600de48a0",sendingType="message"} 302828
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgHeartbeat",remoteID="bf4365627a658392",sendingType="message",quantile="0.5"} 29
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgHeartbeat",remoteID="bf4365627a658392",sendingType="message",quantile="0.9"} 59
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgHeartbeat",remoteID="bf4365627a658392",sendingType="message",quantile="0.99"} 75
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgHeartbeat",remoteID="bf4365627a658392",sendingType="message"} 9.954222e+06
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgHeartbeat",remoteID="bf4365627a658392",sendingType="message"} 302828
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgHeartbeatResp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.5"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgHeartbeatResp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.9"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgHeartbeatResp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.99"} NaN
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgHeartbeatResp",remoteID="14a9317600de48a0",sendingType="message"} 4.5543254e+07
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgHeartbeatResp",remoteID="14a9317600de48a0",sendingType="message"} 1.647159e+06
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgHeartbeatResp",remoteID="bf4365627a658392",sendingType="message",quantile="0.5"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgHeartbeatResp",remoteID="bf4365627a658392",sendingType="message",quantile="0.9"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgHeartbeatResp",remoteID="bf4365627a658392",sendingType="message",quantile="0.99"} NaN
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgHeartbeatResp",remoteID="bf4365627a658392",sendingType="message"} 1.39125426e+08
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgHeartbeatResp",remoteID="bf4365627a658392",sendingType="message"} 4.875664e+06
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgLinkHeartbeat",remoteID="0",sendingType="message",quantile="0.5"} 27
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgLinkHeartbeat",remoteID="0",sendingType="message",quantile="0.9"} 38
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgLinkHeartbeat",remoteID="0",sendingType="message",quantile="0.99"} 54
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgLinkHeartbeat",remoteID="0",sendingType="message"} 2.122617e+07
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgLinkHeartbeat",remoteID="0",sendingType="message"} 823164
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgLinkHeartbeat",remoteID="0",sendingType="msgappv2",quantile="0.5"} 27
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgLinkHeartbeat",remoteID="0",sendingType="msgappv2",quantile="0.9"} 51
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgLinkHeartbeat",remoteID="0",sendingType="msgappv2",quantile="0.99"} 75
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgLinkHeartbeat",remoteID="0",sendingType="msgappv2"} 2.2961133e+07
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgLinkHeartbeat",remoteID="0",sendingType="msgappv2"} 823164
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgProp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.5"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgProp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.9"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgProp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.99"} NaN
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgProp",remoteID="14a9317600de48a0",sendingType="message"} 9.070046e+06
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgProp",remoteID="14a9317600de48a0",sendingType="message"} 384526
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgProp",remoteID="bf4365627a658392",sendingType="message",quantile="0.5"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgProp",remoteID="bf4365627a658392",sendingType="message",quantile="0.9"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgProp",remoteID="bf4365627a658392",sendingType="message",quantile="0.99"} NaN
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgProp",remoteID="bf4365627a658392",sendingType="message"} 2.7770875e+07
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgProp",remoteID="bf4365627a658392",sendingType="message"} 1.152338e+06
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgVote",remoteID="14a9317600de48a0",sendingType="message",quantile="0.5"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgVote",remoteID="14a9317600de48a0",sendingType="message",quantile="0.9"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgVote",remoteID="14a9317600de48a0",sendingType="message",quantile="0.99"} NaN
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgVote",remoteID="14a9317600de48a0",sendingType="message"} 124
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgVote",remoteID="14a9317600de48a0",sendingType="message"} 2
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgVote",remoteID="bf4365627a658392",sendingType="message",quantile="0.5"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgVote",remoteID="bf4365627a658392",sendingType="message",quantile="0.9"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgVote",remoteID="bf4365627a658392",sendingType="message",quantile="0.99"} NaN
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgVote",remoteID="bf4365627a658392",sendingType="message"} 90
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgVote",remoteID="bf4365627a658392",sendingType="message"} 2
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgVoteResp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.5"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgVoteResp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.9"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgVoteResp",remoteID="14a9317600de48a0",sendingType="message",quantile="0.99"} NaN
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgVoteResp",remoteID="14a9317600de48a0",sendingType="message"} 37
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgVoteResp",remoteID="14a9317600de48a0",sendingType="message"} 1
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgVoteResp",remoteID="bf4365627a658392",sendingType="message",quantile="0.5"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgVoteResp",remoteID="bf4365627a658392",sendingType="message",quantile="0.9"} NaN
etcd_rafthttp_message_sent_latency_microseconds{msgType="MsgVoteResp",remoteID="bf4365627a658392",sendingType="message",quantile="0.99"} NaN
etcd_rafthttp_message_sent_latency_microseconds_sum{msgType="MsgVoteResp",remoteID="bf4365627a658392",sendingType="message"} 26
etcd_rafthttp_message_sent_latency_microseconds_count{msgType="MsgVoteResp",remoteID="bf4365627a658392",sendingType="message"} 1
# HELP etcd_server_file_descriptors_used_total The total number of file descriptors used.
# TYPE etcd_server_file_descriptors_used_total gauge
etcd_server_file_descriptors_used_total 45
# HELP etcd_server_pending_proposal_total The total number of pending proposals.
# TYPE etcd_server_pending_proposal_total gauge
etcd_server_pending_proposal_total 0
# HELP etcd_server_proposal_durations_milliseconds The latency distributions of committing proposal.
# TYPE etcd_server_proposal_durations_milliseconds summary
etcd_server_proposal_durations_milliseconds{quantile="0.5"} 14
etcd_server_proposal_durations_milliseconds{quantile="0.9"} 36
etcd_server_proposal_durations_milliseconds{quantile="0.99"} 73
etcd_server_proposal_durations_milliseconds_sum 4.7608014e+07
etcd_server_proposal_durations_milliseconds_count 1.610812e+06
# HELP etcd_server_proposal_failed_total The total number of failed proposals.
# TYPE etcd_server_proposal_failed_total counter
etcd_server_proposal_failed_total 17
# HELP etcd_snapshot_save_total_durations_microseconds The total latency distributions of save called by snapshot.
# TYPE etcd_snapshot_save_total_durations_microseconds summary
etcd_snapshot_save_total_durations_microseconds{quantile="0.5"} NaN
etcd_snapshot_save_total_durations_microseconds{quantile="0.9"} NaN
etcd_snapshot_save_total_durations_microseconds{quantile="0.99"} NaN
etcd_snapshot_save_total_durations_microseconds_sum 1.1599921e+07
etcd_snapshot_save_total_durations_microseconds_count 616
# HELP etcd_store_expires_total Total number of expired keys.
# TYPE etcd_store_expires_total counter
etcd_store_expires_total 1.240333e+06
# HELP etcd_store_reads_total Total number of reads action by (get/getRecursive), local to this member.
# TYPE etcd_store_reads_total counter
etcd_store_reads_total{action="get"} 1.43378762e+08
etcd_store_reads_total{action="getRecursive"} 269
# HELP etcd_store_watch_requests_total Total number of incoming watch requests (new or reestablished).
# TYPE etcd_store_watch_requests_total counter
etcd_store_watch_requests_total 5908
# HELP etcd_store_watchers Count of currently active watchers.
# TYPE etcd_store_watchers gauge
etcd_store_watchers 3
# HELP etcd_store_writes_total Total number of writes (e.g. set/compareAndDelete) seen by this member.
# TYPE etcd_store_writes_total counter
etcd_store_writes_total{action="create"} 1.246572e+06
etcd_store_writes_total{action="delete"} 1.241512e+06
etcd_store_writes_total{action="set"} 2.307081e+06
# HELP etcd_wal_fsync_durations_microseconds The latency distributions of fsync called by wal.
# TYPE etcd_wal_fsync_durations_microseconds summary
etcd_wal_fsync_durations_microseconds{quantile="0.5"} 5640
etcd_wal_fsync_durations_microseconds{quantile="0.9"} 19466
etcd_wal_fsync_durations_microseconds{quantile="0.99"} 48035
etcd_wal_fsync_durations_microseconds_sum 9.537284487e+10
etcd_wal_fsync_durations_microseconds_count 1.161696e+07
# HELP etcd_wal_last_index_saved The index of the last entry saved by wal.
# TYPE etcd_wal_last_index_saved gauge
etcd_wal_last_index_saved 2.7399061e+08
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.009729904000000001
go_gc_duration_seconds{quantile="0.25"} 0.010158817
go_gc_duration_seconds{quantile="0.5"} 0.010437408
go_gc_duration_seconds{quantile="0.75"} 0.010681896000000001
go_gc_duration_seconds{quantile="1"} 0.019196297
go_gc_duration_seconds_sum 117.93109278
go_gc_duration_seconds_count 11650
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 125
# HELP http_request_duration_microseconds The HTTP request latencies in microseconds.
# TYPE http_request_duration_microseconds summary
http_request_duration_microseconds{handler="prometheus",quantile="0.5"} NaN
http_request_duration_microseconds{handler="prometheus",quantile="0.9"} NaN
http_request_duration_microseconds{handler="prometheus",quantile="0.99"} NaN
http_request_duration_microseconds_sum{handler="prometheus"} 0
http_request_duration_microseconds_count{handler="prometheus"} 0
# HELP http_request_size_bytes The HTTP request sizes in bytes.
# TYPE http_request_size_bytes summary
http_request_size_bytes{handler="prometheus",quantile="0.5"} NaN
http_request_size_bytes{handler="prometheus",quantile="0.9"} NaN
http_request_size_bytes{handler="prometheus",quantile="0.99"} NaN
http_request_size_bytes_sum{handler="prometheus"} 0
http_request_size_bytes_count{handler="prometheus"} 0
# HELP http_response_size_bytes The HTTP response sizes in bytes.
# TYPE http_response_size_bytes summary
http_response_size_bytes{handler="prometheus",quantile="0.5"} NaN
http_response_size_bytes{handler="prometheus",quantile="0.9"} NaN
http_response_size_bytes{handler="prometheus",quantile="0.99"} NaN
http_response_size_bytes_sum{handler="prometheus"} 0
http_response_size_bytes_count{handler="prometheus"} 0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 22798.55
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 46
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.14737152e+08
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.44109174045e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.8796544e+08
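For reference, the mean WAL fsync latency can be derived from the summary's `_sum` and `_count` in the dump above. A minimal sketch (the sample lines are copied from the output above; the `metric_value` helper is purely illustrative, not part of etcd):

```python
# Compute the mean WAL fsync latency from a Prometheus summary's _sum/_count.
# The two sample lines below are copied verbatim from the metrics dump above.
metrics = """\
etcd_wal_fsync_durations_microseconds_sum 9.537284487e+10
etcd_wal_fsync_durations_microseconds_count 1.161696e+07
"""

def metric_value(text, name):
    """Return the value of the first sample whose name matches exactly."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == name:
            return float(parts[1])
    raise KeyError(name)

total = metric_value(metrics, "etcd_wal_fsync_durations_microseconds_sum")
count = metric_value(metrics, "etcd_wal_fsync_durations_microseconds_count")
# Mean comes out to roughly 8.2 ms per fsync on this member.
print("mean fsync latency: %.0f us" % (total / count))
```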
yichengq commented 9 years ago

Maybe it is a problem with fsync on the disk. Its 99th-percentile latency is 48ms, and if its maximum value exceeds 2s (your election timeout), it could be a problem.
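One way to keep an eye on this is to scrape the 99th-percentile fsync quantile from the metrics output and flag it when it becomes a large fraction of the election timeout. A hedged sketch (the sample line is copied from the dump above; fetching it from a live member's `/metrics` endpoint, and the 50% warning threshold, are assumptions):

```python
import re

# Sample line copied from the metrics dump above; in practice the text would
# come from an HTTP GET of the member's metrics endpoint (path assumed).
SAMPLE = 'etcd_wal_fsync_durations_microseconds{quantile="0.99"} 48035'

def summary_quantile(text, metric, q):
    """Pull one quantile sample out of Prometheus text-format output.
    Naive: only handles samples whose sole label is `quantile`."""
    pattern = re.compile(
        r'^%s\{quantile="%s"\}\s+(\S+)$' % (re.escape(metric), re.escape(q)),
        re.MULTILINE,
    )
    m = pattern.search(text)
    return float(m.group(1)) if m else None

p99_us = summary_quantile(SAMPLE, "etcd_wal_fsync_durations_microseconds", "0.99")
ELECTION_TIMEOUT_US = 2000 * 1000  # --election-timeout=2000 (ms) from the report
if p99_us is not None and p99_us > 0.5 * ELECTION_TIMEOUT_US:
    print("warning: fsync p99 is a large fraction of the election timeout")
```

Note that a quantile is not a maximum: even a healthy-looking p99 can hide occasional multi-second fsync stalls, which is exactly the gap discussed below.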

durzo commented 9 years ago

Can I ask what disk types you have in your AWS integration cluster?

yichengq commented 9 years ago

@xiang90 It would be better to have the metrics expose the maximal value that has been observed, so we could know its maximum possible value.

We use n1-standard-1 machines with a 200GB standard persistent disk.
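A Prometheus summary does not keep a running maximum, so the idea above — exposing the largest value ever observed — could be sketched as a small tracker whose current value is exported as a gauge. This is purely illustrative, not etcd code:

```python
import threading

class MaxTracker:
    """Track the largest duration observed so far; a sketch of the
    'expose the maximal observed value' suggestion, not part of etcd."""

    def __init__(self):
        self._lock = threading.Lock()
        self._max = 0.0

    def observe(self, value):
        # Keep only the high-water mark; cheap enough to call per fsync.
        with self._lock:
            if value > self._max:
                self._max = value

    @property
    def max(self):
        with self._lock:
            return self._max

# Usage: feed it each fsync duration (values taken from the quantiles above),
# then export .max as a gauge alongside the existing summary.
fsync_max = MaxTracker()
for d in (5640.0, 48035.0, 19466.0):
    fsync_max.observe(d)
print(fsync_max.max)  # 48035.0
```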

xiang90 commented 9 years ago

@yichengq @durzo Yeah. The cluster is on GCE. It is similar to an m3.large with a spinning disk.