Open: lavacat opened this issue 1 year ago
this is related to auth implementation cc @mitake @ahrtr
cc @ptabor
Thanks @lavacat , probably I'll be able to check sometime Wednesday, sorry for keeping you waiting.
What happened?
I discovered this issue while debugging a watch that was not retrying in `openWatchClient`: `isHaltErr` was returning true on `ErrLeaderChanged`.
This only applies when auth is enabled. The call chain is `w.remote.Watch(w.ctx, w.callOpts...)` -> `streamClientInterceptor` -> `getToken` -> `Auth.Authenticate` -> `toErr`.
`toErr` converts `ErrGRPCLeaderChanged` to `ErrLeaderChanged`, and `isHaltErr(ErrLeaderChanged)` returns `true`.
When auth is disabled, `Auth.Authenticate` isn't called and the `toErr` conversion doesn't happen.
What did you expect to happen?
I expected the watch to retry.
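To make the mechanism described above concrete, here is a minimal, self-contained sketch of why the conversion changes the retry decision. It does not use etcd's or gRPC's real code: `code`, `statusErr`, `typedErr`, `codeOf`, `convert`, and `isHalt` are all hypothetical stand-ins. It assumes that the leader-change error arrives with an Unavailable-style status code, and that the halt check treats only Unavailable and Internal as retryable.

```go
package main

import (
	"errors"
	"fmt"
)

// code is a hypothetical stand-in for a gRPC status code.
type code int

const (
	codeUnknown code = iota
	codeUnavailable
	codeInternal
)

// statusErr models an error that still carries its status code
// (the shape the watch sees when no conversion has happened).
type statusErr struct {
	c   code
	msg string
}

func (e *statusErr) Error() string { return e.msg }

// typedErr models a client-side typed error produced by a
// toErr-style conversion; the status code is no longer visible.
type typedErr struct{ msg string }

func (e *typedErr) Error() string { return e.msg }

// codeOf returns the status code, or codeUnknown for any error
// that is not a statusErr.
func codeOf(err error) code {
	var se *statusErr
	if errors.As(err, &se) {
		return se.c
	}
	return codeUnknown
}

// convert models the conversion step on the auth path.
func convert(err error) error {
	var se *statusErr
	if errors.As(err, &se) {
		return &typedErr{msg: se.msg}
	}
	return err
}

// isHalt models the halt check: only Unavailable and Internal
// are considered worth retrying (an assumption of this sketch).
func isHalt(err error) bool {
	if err == nil {
		return false
	}
	c := codeOf(err)
	return c != codeUnavailable && c != codeInternal
}

func main() {
	raw := &statusErr{c: codeUnavailable, msg: "etcdserver: leader changed"}
	fmt.Println("without conversion, halt:", isHalt(raw))       // false: retry
	fmt.Println("with conversion, halt:", isHalt(convert(raw))) // true: give up
}
```

In this model the converted error is classified as Unknown, which would also be consistent with the metrics side effect mentioned under "Anything else we need to know?".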
How can we reproduce it (as minimally and precisely as possible)?
See unit and integration test in the PR.
Note: the test uses `ErrGRPCNoLeader`, but in production we observed `ErrLeaderChanged`.
Anything else we need to know?
Another side effect of this issue is that, on error, the Prometheus interceptor only increments the Unknown gRPC code in its metrics.
I suspect this might also affect the behavior of `retry_interceptor` and `client/v3/leasing`, which retry on transient errors.
Etcd version (please run commands below)
etcd 3.4, 3.5 and main
Etcd configuration (command line flags or environment variables)
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output
No response