etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

Enable client side gRPC health check by default #18882

Open ahrtr opened 1 week ago

ahrtr commented 1 week ago

Followup to https://github.com/etcd-io/etcd/pull/16278

Previously, the default gRPC service config for the resolver was `{"loadBalancingPolicy": "round_robin"}`.

https://github.com/etcd-io/etcd/blob/7ab761246cc449ddfc630001fb2caff160eb4ee3/client/v3/internal/resolver/resolver.go#L44

Now I propose to change the default service config to `{"loadBalancingPolicy": "round_robin", "healthCheckConfig": {"serviceName": ""}}`. The benefit is that
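To make the shape of the two service configs concrete, here is a minimal sketch that parses each one with the standard library and reports whether health checking is configured. The `serviceConfig` type and the `hasHealthCheck` helper are hypothetical names used only for illustration; they are not part of etcd or grpc-go. The point it shows is that `healthCheckConfig` has to sit inside the same top-level JSON object as `loadBalancingPolicy`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceConfig models only the two service-config fields discussed in this PR.
// It is a hypothetical helper type, not a grpc-go or etcd type.
type serviceConfig struct {
	LoadBalancingPolicy string `json:"loadBalancingPolicy"`
	HealthCheckConfig   *struct {
		ServiceName string `json:"serviceName"`
	} `json:"healthCheckConfig"`
}

const (
	// Default before this change.
	oldConfig = `{"loadBalancingPolicy": "round_robin"}`
	// Proposed default: same policy plus a health check for the empty service name.
	newConfig = `{"loadBalancingPolicy": "round_robin", "healthCheckConfig": {"serviceName": ""}}`
)

// hasHealthCheck reports whether raw is valid JSON that contains a
// healthCheckConfig object at the top level.
func hasHealthCheck(raw string) (bool, error) {
	var cfg serviceConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return false, err
	}
	return cfg.HealthCheckConfig != nil, nil
}

func main() {
	for _, raw := range []string{oldConfig, newConfig} {
		ok, err := hasHealthCheck(raw)
		fmt.Println(ok, err)
	}
}
```

An empty `serviceName` is still a present `healthCheckConfig` object, so the second config is reported as having health checking configured.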

k8s-ci-robot commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/etcd-io/etcd/blob/main/OWNERS)~~ [ahrtr]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

ahrtr commented 1 week ago

Please anyone feel free to work on this on top of this PR.

codecov-commenter commented 1 week ago


Codecov Report

Attention: Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.

Project coverage is 68.73%. Comparing base (7ab7612) to head (fa2078f). Report is 4 commits behind head on main.

:exclamation: Current head fa2078f differs from pull request most recent head a9f846b

Please upload reports for the commit a9f846b to get more accurate results.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| server/etcdmain/grpc_proxy.go | 0.00% | 3 Missing :warning: |


Additional details and impacted files

| [Files with missing lines](https://app.codecov.io/gh/etcd-io/etcd/pull/18882?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | Coverage Δ | |
|---|---|---|
| [client/v3/internal/resolver/resolver.go](https://app.codecov.io/gh/etcd-io/etcd/pull/18882?src=pr&el=tree&filepath=client%2Fv3%2Finternal%2Fresolver%2Fresolver.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-Y2xpZW50L3YzL2ludGVybmFsL3Jlc29sdmVyL3Jlc29sdmVyLmdv) | `84.00% <100.00%> (ø)` | |
| [server/etcdserver/api/v3rpc/interceptor.go](https://app.codecov.io/gh/etcd-io/etcd/pull/18882?src=pr&el=tree&filepath=server%2Fetcdserver%2Fapi%2Fv3rpc%2Finterceptor.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c2VydmVyL2V0Y2RzZXJ2ZXIvYXBpL3YzcnBjL2ludGVyY2VwdG9yLmdv) | `73.71% <100.00%> (+1.31%)` | :arrow_up: |
| [server/etcdmain/grpc\_proxy.go](https://app.codecov.io/gh/etcd-io/etcd/pull/18882?src=pr&el=tree&filepath=server%2Fetcdmain%2Fgrpc_proxy.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c2VydmVyL2V0Y2RtYWluL2dycGNfcHJveHkuZ28=) | `14.91% <0.00%> (-0.14%)` | :arrow_down: |

... and [20 files with indirect coverage changes](https://app.codecov.io/gh/etcd-io/etcd/pull/18882/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None)

```diff
@@            Coverage Diff             @@
##             main   #18882      +/-   ##
==========================================
+ Coverage   68.72%   68.73%   +0.01%
==========================================
  Files         420      420
  Lines       35532    35537       +5
==========================================
+ Hits        24418    24428      +10
- Misses       9681     9687       +6
+ Partials     1433     1422      -11
```

------

[Continue to review full report in Codecov by Sentry](https://app.codecov.io/gh/etcd-io/etcd/pull/18882?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None).

> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://app.codecov.io/gh/etcd-io/etcd/pull/18882?dropdown=coverage&src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None). Last update [7ab7612...a9f846b](https://app.codecov.io/gh/etcd-io/etcd/pull/18882?dropdown=coverage&src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None).

ahrtr commented 1 week ago

The client side sees a grpc error, not sure why.

$ ./bin/etcdctl get k1
2024/11/12 13:49:23 ERROR: [core] [Channel #1 SubChannel #2]Health check is requested but health check function is not set.

I am pretty sure that we have registered the health service and set the healthpb.HealthCheckResponse_SERVING status. @dfawley @easwars @aranjans can you please share some thoughts on this?

https://github.com/etcd-io/etcd/blob/7ab761246cc449ddfc630001fb2caff160eb4ee3/server/etcdserver/api/v3rpc/grpc.go#L77-L79

https://github.com/etcd-io/etcd/blob/7ab761246cc449ddfc630001fb2caff160eb4ee3/server/etcdserver/api/v3rpc/health.go#L68

chaochn47 commented 1 week ago

> $ ./bin/etcdctl get k1
> 2024/11/12 13:49:23 ERROR: [core] [Channel #1 SubChannel #2]Health check is requested but health check function is not set.

@ahrtr A side effect of the blank import `_ "google.golang.org/grpc/health"` is that it registers the health check function, as mentioned in the feature example.
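The registration-by-import mechanism can be illustrated with a standard-library-only simulation. This is a sketch under stated assumptions, not grpc-go code: `healthCheckFunc`, `registerHealth`, and `startSubchannel` are hypothetical stand-ins for the internal hook, the `init()` side effect of the health package, and the channel's startup check respectively.

```go
package main

import "fmt"

// healthCheckFunc stands in for the hook that grpc-go keeps internally.
// It stays nil until something registers an implementation.
var healthCheckFunc func(service string) string

// registerHealth stands in for the init() side effect of importing
// google.golang.org/grpc/health (assumption: that package sets the hook on import).
func registerHealth() {
	healthCheckFunc = func(service string) string { return "SERVING" }
}

// startSubchannel mimics the check made when the service config requests
// health checking: without a registered hook it can only report an error.
func startSubchannel(healthCheckRequested bool) string {
	if !healthCheckRequested {
		return "READY (health check not requested)"
	}
	if healthCheckFunc == nil {
		return "ERROR: Health check is requested but health check function is not set."
	}
	return "READY (health status: " + healthCheckFunc("") + ")"
}

func main() {
	fmt.Println(startSubchannel(true)) // hook not registered yet
	registerHealth()                   // what the blank import would trigger
	fmt.Println(startSubchannel(true))
}
```

In real client code nothing calls a function like `registerHealth` explicitly; the blank import alone causes the registration, which is why the import looks unused and why the hook is easy to miss.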

lavacat commented 1 week ago

@ahrtr

> The client side sees a grpc error, not sure why.

Maybe you need an import that sets internal.HealthCheckFunc

Is there any danger in rolling this out enabled by default? Do we need a config option? I think so. The change is minimal, but if there are any bugs in the gRPC health implementation, we might need a way to disable it.

ahrtr commented 1 week ago

Thanks, both. It's a bad pattern to have a blank import in a non-main package:

internal/resolver/resolver.go:18:2: blank-imports: a blank import should be only in a main or test package, or have a comment justifying it (revive)
    _ "google.golang.org/grpc/health"

ahrtr commented 1 week ago

/retest

k8s-ci-robot commented 1 week ago

@ahrtr: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-etcd-integration-1-cpu-amd64 | a9f846b650452ce9621758919f685eae05aad76a | link | true | `/test pull-etcd-integration-1-cpu-amd64` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).

easwars commented 1 week ago

@arjan-bal

ahrtr commented 1 week ago

> Any danger to roll this out enabled by default? Do we need a config option?

YES, we definitely need a config option for this; otherwise it will be a breaking change. If the server-side health check isn't enabled, the client will get an error like the one below:

2024/11/13 07:39:59 ERROR: [core] [Channel #1 SubChannel #2]Subchannel health check is unimplemented at server side, thus health check is disabled
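One way such a config option could look is sketched below. This is only an assumption about a possible design: `clientOptions` and its `EnableGRPCHealthCheck` field are hypothetical names, not existing etcd client settings. The sketch makes the health check opt-in, so the zero value keeps the existing service config and clients talking to servers without the health service see no change.

```go
package main

import "fmt"

// clientOptions is a hypothetical stand-in for client configuration.
// EnableGRPCHealthCheck is an illustrative field name, not an existing etcd option.
type clientOptions struct {
	EnableGRPCHealthCheck bool
}

// defaultServiceConfig returns the service config JSON for the resolver.
// The health check section is added only when the option is set, so the
// zero value of clientOptions preserves the existing behavior.
func defaultServiceConfig(o clientOptions) string {
	if o.EnableGRPCHealthCheck {
		return `{"loadBalancingPolicy": "round_robin", "healthCheckConfig": {"serviceName": ""}}`
	}
	return `{"loadBalancingPolicy": "round_robin"}`
}

func main() {
	fmt.Println(defaultServiceConfig(clientOptions{}))
	fmt.Println(defaultServiceConfig(clientOptions{EnableGRPCHealthCheck: true}))
}
```

Whether the option should default to on or off is exactly the open question in this thread; the sketch picks off-by-default only because that avoids the breaking change described above.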

Please anyone feel free to continue to work on this task on top of this PR.

lavacat commented 1 week ago

> Please anyone feel free to continue to work on this task on top of this PR

assigned to myself

ahrtr commented 1 week ago

> assigned to myself

thx