allenporter closed this issue 3 years ago.
I looked closer at the code and see this isn't necessarily a lost TCP connection due to a flaky network, but a failure to parse a message on the wire here: https://github.com/etcd-io/etcd/blob/b7e5f5bc1213c5d6a5ad37d269f1e9ad9086f53a/etcdserver/api/rafthttp/msgappv2_codec.go#L245
io.ReadFull successfully returns data, but the first byte is not a valid message type. When this happened again this morning, I got a different type that was invalid.
Apr 09 02:11:57 ruby etcd[614]: lost the TCP streaming connection with peer a92007feedce6099 (stream MsgApp v2 reader)
Apr 09 02:11:57 ruby etcd[614]: failed to read a92007feedce6099 on stream MsgApp v2 (failed to parse type 210 in msgappv2 stream)
Apr 09 02:11:57 ruby etcd[614]: peer a92007feedce6099 became inactive (message send to peer failed)
Apr 09 02:11:57 ruby etcd[614]: unmarshal should never fail (proto: wrong wireType = 5 for field Dir)
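Roughly, the failing step amounts to the following simplified, standalone sketch (a paraphrase of the linked decoder, with the real payload handling omitted): a single type byte is read with io.ReadFull, and anything outside the few known message types is rejected with the "failed to parse type ..." error seen above.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Simplified stand-ins for the msgappv2 message types; the real decoder in
// rafthttp/msgappv2_codec.go only accepts a handful of valid type bytes.
const (
	msgTypeLinkHeartbeat uint8 = 0
	msgTypeAppEntries    uint8 = 1
	msgTypeApp           uint8 = 2
)

// decodeType mimics the first step of the decoder: read exactly one byte and
// make sure it is a known message type. Corruption of that byte (e.g. 210 or
// 97 as in the logs) surfaces as a parse error and the stream is dropped.
func decodeType(r io.Reader) (uint8, error) {
	var buf [1]byte
	if _, err := io.ReadFull(r, buf[:]); err != nil {
		return 0, err
	}
	switch buf[0] {
	case msgTypeLinkHeartbeat, msgTypeAppEntries, msgTypeApp:
		return buf[0], nil
	default:
		return 0, fmt.Errorf("failed to parse type %d in msgappv2 stream", buf[0])
	}
}

func main() {
	// A corrupted stream whose first byte is not a valid message type.
	_, err := decodeType(bytes.NewReader([]byte{210}))
	fmt.Println(err) // failed to parse type 210 in msgappv2 stream
}
```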
This happened 2 or 3 more times in the last 12 days after removing the member, wiping the database, and re-adding the member. Each time a different member could fail with a similar result. For example, today's crash looks similar:
Apr 22 17:43:56 gilbert etcd[5930]: lost the TCP streaming connection with peer fdb2ed28f2a14137 (stream MsgApp v2 reader)
Apr 22 17:43:56 gilbert etcd[5930]: failed to read fdb2ed28f2a14137 on stream MsgApp v2 (failed to parse type 97 in msgappv2 stream)
Apr 22 17:43:56 gilbert etcd[5930]: peer fdb2ed28f2a14137 became inactive (message send to peer failed)
Apr 22 17:43:56 gilbert etcd[5930]: unmarshal should never fail (proto: wrong wireType = 6 for field Sorted)
Apr 22 17:43:56 gilbert etcd[5930]: panic: unmarshal should never fail (proto: wrong wireType = 6 for field Sorted)
Apr 22 17:43:56 gilbert etcd[5930]: goroutine 197 [running]:
Apr 22 17:43:56 gilbert etcd[5930]: github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc00010b660, 0x10bdffe, 0x20, 0xc001257058, 0x1, 0x1)
Apr 22 17:43:56 gilbert etcd[5930]: /root/go/pkg/mod/github.com/coreos/pkg@v0.0.0-20160727233714-3ac0863d7acf/capnslog/pkg_logger.go:75 +0x135
Apr 22 17:43:56 gilbert etcd[5930]: go.etcd.io/etcd/pkg/pbutil.MustUnmarshal(0x1277b60, 0xc0001be420, 0xc003466000, 0x2ac, 0x2c0)
Apr 22 17:43:56 gilbert etcd[5930]: /tmp/etcd-release-3.4.15/etcd/release/etcd/pkg/pbutil/pbutil.go:42 +0xbb
Apr 22 17:43:56 gilbert etcd[5930]: go.etcd.io/etcd/etcdserver.(*EtcdServer).applyEntryNormal(0xc000200600, 0xc0012574d0)
Apr 22 17:43:56 gilbert etcd[5930]: /tmp/etcd-release-3.4.15/etcd/release/etcd/etcdserver/server.go:2198 +0xaba
Apr 22 17:43:56 gilbert etcd[5930]: go.etcd.io/etcd/etcdserver.(*EtcdServer).apply(0xc000200600, 0xc007007680, 0x2, 0x2, 0xc0001665a0, 0x42f5bf, 0x10f5c20, 0xc00004c500)
Apr 22 17:43:56 gilbert etcd[5930]: /tmp/etcd-release-3.4.15/etcd/release/etcd/etcdserver/server.go:2138 +0x596
Apr 22 17:43:56 gilbert etcd[5930]: go.etcd.io/etcd/etcdserver.(*EtcdServer).applyEntries(0xc000200600, 0xc0001665a0, 0xc0099e1000)
Apr 22 17:43:56 gilbert etcd[5930]: /tmp/etcd-release-3.4.15/etcd/release/etcd/etcdserver/server.go:1390 +0xd4
Apr 22 17:43:56 gilbert etcd[5930]: go.etcd.io/etcd/etcdserver.(*EtcdServer).applyAll(0xc000200600, 0xc0001665a0, 0xc0099e1000)
Apr 22 17:43:56 gilbert etcd[5930]: /tmp/etcd-release-3.4.15/etcd/release/etcd/etcdserver/server.go:1114 +0x88
Apr 22 17:43:56 gilbert etcd[5930]: go.etcd.io/etcd/etcdserver.(*EtcdServer).run.func8(0x12979a0, 0xc00007a640)
Apr 22 17:43:56 gilbert etcd[5930]: /tmp/etcd-release-3.4.15/etcd/release/etcd/etcdserver/server.go:1059 +0x3c
Apr 22 17:43:56 gilbert etcd[5930]: go.etcd.io/etcd/pkg/schedule.(*fifo).run(0xc0001ab140)
Apr 22 17:43:56 gilbert etcd[5930]: /tmp/etcd-release-3.4.15/etcd/release/etcd/pkg/schedule/schedule.go:157 +0xdb
Apr 22 17:43:56 gilbert etcd[5930]: created by go.etcd.io/etcd/pkg/schedule.NewFIFOScheduler
Apr 22 17:43:56 gilbert etcd[5930]: /tmp/etcd-release-3.4.15/etcd/release/etcd/pkg/schedule/schedule.go:70 +0x13d
Apr 22 17:43:56 gilbert systemd[1]: etcd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
I'm considering learning Go and figuring out which debug statements to add to a custom build to aid in debugging, though I'm very open to other suggestions.
If a member crashes, does the problem persist after restarting the member (i.e. does the entry deserialization fail again during server restart), or does it crash and come back healthy?
It's this boolean field: https://github.com/etcd-io/etcd/blob/bd4f8e2b6c6a0bdcd52f4593f68d9f2415ab5293/api/etcdserverpb/etcdserver.proto#L24. Wire types in protos should be 0-5 (https://developers.google.com/protocol-buffers/docs/encoding), so '6' is very surprising.
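To make the 0-5 constraint concrete: the wire type is just the low 3 bits of each field tag, so values 6 and 7 simply aren't defined by the encoding. A minimal standalone illustration (the tag bytes here are hypothetical examples, not taken from the actual data):

```go
package main

import "fmt"

// A protobuf field tag encodes (field_number << 3) | wire_type, so the wire
// type occupies the low 3 bits. Only values 0-5 are defined by the encoding
// (varint, 64-bit, length-delimited, start group, end group, 32-bit); 6 and 7
// are unused, so seeing wire type 6 means the bytes themselves were corrupted.
func splitTag(tag uint64) (fieldNumber, wireType uint64) {
	return tag >> 3, tag & 0x7
}

func main() {
	// 0x28 -> field 5, wire type 0 (a legitimate varint/bool tag);
	// 0x2e -> field 5, wire type 6 (never produced by a correct encoder).
	for _, tag := range []uint64{0x28, 0x2e} {
		field, wt := splitTag(tag)
		fmt.Printf("tag %#x -> field %d, wire type %d\n", tag, field, wt)
	}
}
```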
But maybe the real problem starts earlier, during raft replication:
Apr 22 17:43:56 gilbert etcd[5930]: lost the TCP streaming connection with peer fdb2ed28f2a14137 (stream MsgApp v2 reader)
Apr 22 17:43:56 gilbert etcd[5930]: failed to read fdb2ed28f2a14137 on stream MsgApp v2 (failed to parse type 97 in msgappv2 stream)
https://github.com/etcd-io/etcd/blob/bd4f8e2b6c6a0bdcd52f4593f68d9f2415ab5293/server/etcdserver/api/rafthttp/msgappv2_codec.go#L30-L32 shows there are only 3 supported message types, and 97 is not one of them.
I suspect something is intercepting the over-network communication between peers. It might be worth trying to run the setup with mTLS for peer<->peer communication to guarantee integrity.
For debugging the io.Reader, wrapping https://github.com/etcd-io/etcd/blob/bd4f8e2b6c6a0bdcd52f4593f68d9f2415ab5293/server/etcdserver/api/rafthttp/msgappv2_codec.go#L163 with io.TeeReader (https://golang.org/pkg/io/#TeeReader) might be a way to go.
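A rough sketch of that idea (hypothetical helper name and dump path, not a patch against the actual codec): wrap the stream reader in io.TeeReader so every byte the decoder consumes is also written to a dump file that can be inspected after a parse failure.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"strings"
)

// wrapWithDump wraps an arbitrary stream reader so that every byte read from
// it is also appended to a dump file. Handing the msgappv2 decoder the wrapped
// reader (instead of the raw connection) would leave a byte-for-byte record of
// what the decoder actually saw when a parse error occurs. The dump file is
// intentionally left open for the lifetime of the stream in this sketch.
func wrapWithDump(r io.Reader, path string) (io.Reader, error) {
	f, err := os.Create(path)
	if err != nil {
		return nil, err
	}
	return io.TeeReader(r, f), nil
}

func main() {
	// Stand-in for the peer connection; in a debug build this would be the
	// reader the msgappv2 decoder is constructed with.
	conn := strings.NewReader("raw bytes from the peer stream")

	r, err := wrapWithDump(conn, "/tmp/msgappv2-stream.dump")
	if err != nil {
		log.Fatal(err)
	}

	buf := make([]byte, 8)
	n, _ := io.ReadFull(r, buf)
	fmt.Printf("decoder read %q; the same bytes are now in the dump file\n", buf[:n])
}
```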
The problem persists if I restart the member.
Good point on end-to-end integrity; I hadn't considered that and was only thinking about a bug in sending/receiving within etcd itself. These hosts are VMs, so that's very possible, and I can explore it.
Maybe by running tcpdump I could also capture the corruption as it happens.
I'm only not using TLS because it's a little more complex to automate the setup with Ansible, and I hadn't considered the value from an integrity perspective.
Thank you for the debugging suggestions; I'll give this a try and report back.
I got a packet capture of this happening and was able to confirm your theory about network corruption. I dug into the response, and in the middle of the AppEntries the sending side sees:
0180 02 76 31 7a 00 8a 01 b3 02 0a 07 6b 75 62 65 6c .v1z.......kubel
0190 65 74 12 06 55 70 64 61 74 65 1a 16 63 6f 6f 72 et..Update..coor
01a0 64 69 6e 61 74 69 6f 6e 2e 6b 38 73 2e 69 6f 2f dination.k8s.io/
01b0 76 31 22 08 08 89 a4 ed 81 06 10 00 32 08 46 69 v1".........2.Fi
01c0 65 6c 64 73 56 31 3a f3 01 0a f0 01 7b 22 66 3a eldsV1:.....{"f:
01d0 6d 65 74 61 64 61 74 61 22 3a 7b 22 66 3a 6f 77 metadata":{"f:ow
01e0 6e 65 72 52 65 66 65 72 65 6e 63 65 73 22 3a 7b nerReferences":{
01f0 22 2e 22 3a 7b 7d 2c 22 6b 3a 7b 5c 22 75 69 64 ".":{},"k:{\"uid
0200 5c 22 3a 5c 22 31 33 37 66 32 62 31 33 2d 32 63 \":\"137f2b13-2c
0210 36 31 2d 34 34 65 39 2d 62 33 38 37 2d 64 33 62 61-44e9-b387-d3b
0220 61 64 32 31 32 64 65 65 30 5c 22 7d 22 3a 7b 22 ad212dee0\"}":{"
0230 2e 22 3a 7b 7d 2c 22 66 3a 61 70 69 56 65 72 73 .":{},"f:apiVers
0240 69 6f 6e 22 3a 7b 7d 2c 22 66 3a 6b 69 6e 64 22 ion":{},"f:kind"
0250 3a 7b 7d 2c 22 66 3a 6e 61 6d 65 22 3a 7b 7d 2c :{},"f:name":{},
0260 22 66 3a 75 69 64 22 3a 7b 7d 7d 7d 7d 2c 22 66 "f:uid":{}}}},"f
0270 3a 73 70 65 63 22 3a 7b 22 66 3a 68 6f 6c 64 65 :spec":{"f:holde
0280 72 49 64 65 6e 74 69 74 79 22 3a 7b 7d 2c 22 66 rIdentity":{},"f
0290 3a 6c 65 61 73 65 44 75 72 61 74 69 6f 6e 53 65 :leaseDurationSe
02a0 63 6f 6e 64 73 22 3a 7b 7d 2c 22 66 3a 72 65 6e conds":{},"f:ren
02b0 65 77 54 69 6d 65 22 3a 7b 7d 7d 7d 12 16 0a 05 ewTime":{}}}....
02c0 6f 72 62 69 74 10 28 22 0b 08 ba ec b1 84 06 10 orbit.("........
02d0 9e 8e db 1d 1a 00 22 00 1a 2a 0a 28 0a 26 2f 72 ......"..*.(.&/r
02e0 65 67 69 73 74 72 79 2f 6c 65 61 73 65 73 2f 6b egistry/leases/k
02f0 75 62 65 2d 6e 6f 64 65 2d 6c 65 61 73 65 2f 6f ube-node-lease/o
0300 72 62 69 74 a2 06 0a 08 c1 b1 d0 b0 cd 9f de b0 rbit............
0310 3c 00 00 00 00 01 6a 6b 96 01 00 00 00 00 00 00 <.....jk........
0320 00 01 00 00 00 00 00 00 02 b9 08 00 10 cf 33 18 ..............3.
0330 9b d7 a9 0b 22 ac 05 32 9c 05 0a 2f 10 02 1a 26 ...."..2.../...&
0340 2f 72 65 67 69 73 74 72 79 2f 6c 65 61 73 65 73 /registry/leases
And the receiving side sees:
0180 02 76 31 7a 00 8a 01 b3 02 0a 07 6b 75 62 65 6c .v1z.......kubel
0190 65 74 12 06 55 70 64 61 74 65 1a 16 63 6f 6f 72 et..Update..coor
01a0 64 69 6e 61 74 69 6f 6e 2e 6b 38 73 2e 69 6f 2f dination.k8s.io/
01b0 76 31 22 08 08 89 a4 ed 81 06 10 00 32 08 46 69 v1".........2.Fi
01c0 65 6c 64 73 56 31 3a f3 01 0a f0 01 7b 22 66 3a eldsV1:.....{"f:
01d0 6d 65 74 61 64 61 74 61 22 3a 7b 22 66 3a 6f 77 metadata":{"f:ow
01e0 6e 65 72 52 65 66 65 72 65 6e 63 65 73 22 3a 7b nerReferences":{
01f0 22 2e 22 3a 7b 7d 2c 22 6b 3a 7b 5c 22 75 69 64 ".":{},"k:{\"uid
0200 5c 22 3a 5c 22 31 33 37 66 32 62 31 33 2d 32 63 \":\"137f2b13-2c
0210 36 31 2d 34 34 65 39 2d 62 33 38 37 2d 64 33 62 61-44e9-b387-d3b
0220 61 64 32 31 32 64 65 65 30 5c 22 7d 22 3a 7b 22 ad212dee0\"}":{"
0230 2e 22 3a 7b 7d 2c 22 66 3a 61 70 69 56 65 72 73 .":{},"f:apiVers
0240 65 6c 64 73 56 31 3a f3 01 0a f0 01 7b 22 66 3a eldsV1:.....{"f:
0250 6d 65 74 61 64 61 74 61 22 3a 7b 22 66 3a 6f 77 metadata":{"f:ow
0260 6e 65 72 52 65 66 65 72 65 6e 63 65 73 22 3a 7b nerReferences":{
0270 22 2e 22 3a 7b 7d 2c 22 6b 3a 7b 5c 22 75 69 64 ".":{},"k:{\"uid
0280 5c 22 3a 5c 22 31 33 37 66 32 62 31 33 2d 32 63 \":\"137f2b13-2c
0290 36 31 2d 34 34 65 39 2d 62 33 38 37 2d 64 33 62 61-44e9-b387-d3b
02a0 61 64 32 31 32 64 65 65 30 5c 22 7d 22 3a 7b 22 ad212dee0\"}":{"
02b0 2e 22 3a 7b 7d 2c 22 66 3a 61 70 69 56 65 72 73 .":{},"f:apiVers
02c0 69 6f 6e 22 3a 7b 7d 2c 22 66 3a 6b 69 6e 64 22 ion":{},"f:kind"
02d0 3a 7b 7d 2c 22 66 3a 6e 61 6d 65 22 3a 7b 7d 2c :{},"f:name":{},
02e0 22 66 3a 75 69 64 22 3a 7b 7d 7d 7d 7d 2c 22 66 "f:uid":{}}}},"f
02f0 3a 73 70 65 63 22 3a 7b 22 66 3a 68 6f 6c 64 65 :spec":{"f:holde
0300 72 49 64 65 6e 74 69 74 79 22 3a 7b 7d 2c 22 66 rIdentity":{},"f
0310 3a 6c 65 61 73 65 44 75 72 61 74 69 6f 6e 53 65 :leaseDurationSe
0320 63 6f 6e 64 73 22 3a 7b 7d 2c 22 66 3a 72 65 6e conds":{},"f:ren
0330 65 77 54 69 6d 65 22 3a 7b 7d 7d 7d 12 16 0a 05 ewTime":{}}}....
0340 6f 72 62 69 74 10 28 22 0b 08 ba ec b1 84 06 10 orbit.("........
The chunk starting at 0240 on the receiving side was a repeat of data from earlier in the stream (it matches the bytes starting at 01c0 above).
Closing, since I don't think etcd can be expected to handle this kind of corruption gracefully. Thanks for your assistance.
I have a 3-node etcd setup and see a panic every 2-5 days, which moves between nodes. I typically rebuild the node, and then it happens again after a few days. I can reproduce this in my environment, but I'm not sure I have been able to reproduce it with a fresh setup -- very likely a misconfiguration or problem on my end, and I would love any help or suggestions for further diagnosis.
It seems like, after a read failure (more on that below), a partially written proto causes a crash. Note that the invalid proto field may be different each time this happens.
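The stack traces show the panic coming from pkg/pbutil.MustUnmarshal, which treats any decode failure on an applied entry as fatal. A simplified, standalone sketch of that pattern (stdlib stand-ins only, not the actual etcd code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unmarshaler matches the shape pbutil.MustUnmarshal works with: anything
// exposing Unmarshal([]byte) error (protobuf-generated messages in etcd).
type unmarshaler interface {
	Unmarshal(data []byte) error
}

// mustUnmarshal mirrors the "must" pattern seen in the stack traces: an
// applied raft entry is expected to always decode, so any unmarshal error is
// fatal. If corrupted bytes make it into the replicated log, this is where the
// process panics with "unmarshal should never fail".
func mustUnmarshal(m unmarshaler, data []byte) {
	if err := m.Unmarshal(data); err != nil {
		panic(fmt.Sprintf("unmarshal should never fail (%v)", err))
	}
}

// jsonEntry is a stdlib stand-in for a protobuf message, only so this sketch
// runs without extra dependencies.
type jsonEntry struct{ Fields map[string]interface{} }

func (e *jsonEntry) Unmarshal(data []byte) error {
	return json.Unmarshal(data, &e.Fields)
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	// A corrupted payload that cannot be decoded triggers the panic path.
	mustUnmarshal(&jsonEntry{}, []byte("\x00\x01 not a valid entry"))
}
```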
Version information is the same for all nodes:
Here is an example from the host that crashed:
This is what the cluster looks like, with the status after the crash:
Each node is an Ubuntu VM running on a separate physical host. Below are the config files with environment variables and the systemd config.
cfg01
cfg02
cfg03
The log files for each of these hosts are attached. You may also notice that ETCD_ELECTION_TIMEOUT and ETCD_HEARTBEAT_INTERVAL were added as an attempt to keep the connections alive longer, which didn't change anything.
I also notice that the cluster logs many warnings, probably due to I/O delay, which may be a catalyst for this:
Each time this happens, the crash looks very similar: lost the TCP streaming connection with peer, then failed to read XXX on stream, then panic: unmarshal should never fail. The VM disks run on SSD, though they don't have tons of free disk I/O and may be a bit constrained -- there are also warnings about slow reads, which may be why requests time out, though I didn't see an obvious correlation at the time of the crash. My assumption is that my network/hosts are flaky and/or slow, which should be addressed, but also that etcd should be resilient to this type of failure unless it is a misconfiguration. I would love any help on further ways to understand the internals of what is happening here (e.g. I'm happy to run commands against the db to inspect the database, and I'm not an expert at diagnosing disk I/O problems), or if it makes sense I can run a custom build with additional logging, but I haven't tried that yet.
Full logs attached. cfg01.prod.log.txt cfg02.prod.log.txt cfg03.prod.log.txt