etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

grpc-proxy stops sending watch events #18231

Open pzhuchkov opened 4 days ago

pzhuchkov commented 4 days ago


What happened?

The setup is client <-> grpc-proxy <-> etcd. In some cases the grpc-proxy stops sending watch events to the client while the client's stream stays open. This can happen when the etcd server sends a Canceled status on the watch stream.

What did you expect to happen?

The grpc-proxy should reopen its connection to etcd.

How can we reproduce it (as minimally and precisely as possible)?

Patch etcd with a timer so that a Canceled status is sent on the watch stream a few seconds after it starts:

    errc := make(chan error, 2)
    go func() {
        time.Sleep(2 * time.Second)
        errc <- status.Error(codes.Canceled, "my custom error!!!")
        fmt.Println("sent my custom error!!!")
    }()

Start etcd server

./etcd \
--name testing \
--listen-client-urls https://127.0.0.1:2379 \
--advertise-client-urls https://127.0.0.1:2379 \
--client-cert-auth=1 \
--cert-file ./cfssl/server.pem \
--key-file ./cfssl/server-key.pem \
--trusted-ca-file ./cfssl/ca.pem \
--debug

Start grpc-proxy

etcd grpc-proxy start \
--endpoints https://127.0.0.1:2379 \
--listen-addr 127.0.0.1:12379 \
--advertise-client-urls https://127.0.0.1:2379 \
--cert /cfssl/ClientCert.crt \
--key /cfssl/ClientKey.key \
--cacert /cfssl/CaCertChain.crt \
--cert-file ./cfssl/server.pem \
--key-file ./cfssl/server-key.pem \
--trusted-ca-file ./cfssl/ca.pem \
--debug

Start client

etcdctl --endpoints="127.0.0.1:12379" watch /test --prefix --debug

After 20 seconds, the proxy closes its connection to the server, but the client keeps waiting for events.

Anything else we need to know?

In this code path, the watch client closes and the function returns. The deferred call closes the channel, but nothing else happens.

Maybe try to reopen the watch?

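    for {
        // Stop once the watcher's context has been cancelled.
        if cctx.Err() != nil {
            return
        }
        // (Re)open the watch against etcd.
        wch := wp.cw.Watch(cctx, w.wr.key, opts...)

        // Forward responses until the channel closes, then loop and reopen.
        for wr := range wch {
            wb.bcast(wr)
            update(wb)
        }
    }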

Etcd version (please run commands below)

```console
$ etcd --version
# paste output here
$ etcdctl version
# paste output here
```

Etcd configuration (command line flags or environment variables)

default

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

```console
$ etcd --version
etcd Version: 3.4.28
Git SHA: Not provided (use ./build instead of go build)
Go Version: go1.22.0
Go OS/Arch: darwin/arm64
$ etcdctl version
etcdctl version: 3.4.19
API version: 3.4
```

Relevant log output

No response