etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0
47.89k stars 9.78k forks source link

fix runtime error: comparing uncomparable type #18893

Open ktalg opened 1 week ago

ktalg commented 1 week ago

Fixes a runtime panic that occurs when KeepAlive is called with a Context implemented by an uncomparable type, which is later canceled. The panic message is:

panic: runtime error: comparing uncomparable type lease_test.uncomparableCtx

goroutine 380 [running]:
go.etcd.io/etcd/client/v3.(*lessor).keepAliveCtxCloser(0xc000327720, {0x13416c8, 0xc00011b9b0}, 0x14c4932ade455905, 0xc002291420)
    /home/kevin/IdeaProjects/etcd/client/v3/lease.go:351 +0x1d6
created by go.etcd.io/etcd/client/v3.(*lessor).KeepAlive in goroutine 156
    /home/kevin/IdeaProjects/etcd/client/v3/lease.go:299 +0x554

To reproduce the issue, the existing test has been updated to include a custom uncomparable Context implementation. Previously, this test consistently caused a panic due to the uncomparable Context type. The updated implementation modifies keepAliveCtxCloser to properly handle such cases.

k8s-ci-robot commented 1 week ago

Hi @ktalg. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
k8s-ci-robot commented 1 week ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ktalg Once this PR has been reviewed and has the lgtm label, please assign serathius for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/etcd-io/etcd/blob/main/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
ktalg commented 6 days ago

Could you kindly review this? @serathius

ktalg commented 3 days ago

Just a friendly reminder to take a look at this PR. Let me know if there’s anything I need to adjust. @ahrtr @jmhbnz @ivanvc @fuweid

ahrtr commented 2 days ago

Thanks for raising this PR. The root cause is that the etcd client SDK tries to compare two context intances, which might not be comparable.

https://github.com/etcd-io/etcd/blob/41862836f9bdf35e7c3ab06b31510e846aa76ee7/client/v3/lease.go#L351

Is this your a real use case in your production code? The easiest workaround is to pass in a pointer of your customized context intance as mentioned in https://github.com/etcd-io/etcd/pull/18893#discussion_r1848648057.

We can attach an unique ID for each context passed to KeepAlive something like below, and compare the IDs when we need to identify the context,

const uniqueContextKey = "uniqueContextKey"
func withUniqueID(ctx context.Context, id string) context.Context {
    return context.WithValue(ctx, uniqueContextKey, id)
}

func getUniqueID(ctx context.Context) (string, bool) {
    id, ok := ctx.Value(uniqueContextKey).(string)
    return id, ok
}

We also need to review the source code to check if there are other places which compare the context as well.

ahrtr commented 2 days ago

Or we can add a function something like below,

func CompareContexts(ctx1, ctx2 context.Context, key string) bool {
    return ctx1.Value(key) == ctx2.Value(key)
}

I am not sure whether there is an existing linter which can detect direct context comparison like below case. Probably we can add a such linter and integrate into golangci-lint if possible? @ivanvc @mmorel-35 @ldez

https://github.com/etcd-io/etcd/blob/41862836f9bdf35e7c3ab06b31510e846aa76ee7/client/v3/lease.go#L351

ldez commented 2 days ago

Hello,

I am not sure whether there is an existing linter which can detect direct context comparison like below case.

There is no linter on this specific case.

Probably we can add a such linter and integrate into golangci-lint if possible?

The contexts are just structs so they are technically comparable.

https://go.dev/play/p/dyfw2rkHfTi

I understand your current issue, I need to investigate to kown if there are "valid" use cases behind this kind of comparison.

ahrtr commented 2 days ago

There is no linter on this specific case.

Thanks for the confirmation.

The contexts are just structs so they are technically comparable.

Not really.

Firstly the applications (including etcd) interact with with interface context.Context https://github.com/golang/go/blob/ed07b321aef7632f956ce991dd10fdd7e1abd827/src/context/context.go#L68

Secondly, struct types are comparable if all their field types are comparable. In other words, if any field isn't comparable, then the struct isn't comparable.

Usually the default implementations of contexts included golang std lib are comparable. But users' customized implementation might not be comparable.

ldez commented 2 days ago

I'm aware of the interface context.Context, I was talking about the standard context. Yes, you can create any kind of implementation, I think it's a bit niche, but anyway.

I still need to investigate the feasibility and the relevance of a linter on this topic.

ahrtr commented 2 days ago

It might not be reasonable to create a linter for this. Comparing structs or interfaces using an comparison operators (i.e. ==) seems not an anti-pattern. It's totally up to applications.

The problem in this case is that users may pass in a customized implementation of context.Context instead of using the existing implementations in golang std lib.

The simplest solution is as I mentioned above in https://github.com/etcd-io/etcd/pull/18893#issuecomment-2486156823 and https://github.com/etcd-io/etcd/pull/18893#issuecomment-2486198060. But it's the first time we see such issue, and most likely it isn't a real production use case. So it's low priority to me.

ldez commented 2 days ago

FYI, I created a quick linter about context comparison to evaluate the relevance.

I analyzed several large projects, and I found no context comparison.

But I found 2 comparisons inside Go itself:

And as you said, struct comparison is not an anti-pattern.

For now, I think it's not worth adding a linter for this.

ahrtr commented 2 days ago

Thanks @ldez

@ktalg Could you please confirm if this is a real use case from your production environment or a hypothetical scenario?

The workaround is to pass in a pointer as mentioned in https://github.com/etcd-io/etcd/pull/18893#discussion_r1848648057

ktalg commented 22 hours ago

@ahrtr

Thank you very much for reviewing this PR and providing detailed feedback!

Could you please confirm if this is a real use case from your production environment or a hypothetical scenario?

I encountered this issue while working on production-grade project code. The problem arose not from creating a new Context implementation but from passing a struct embedding a Context into the KeepAlive function, as illustrated in the test case I added.

This situation highlights two key points:

  1. When a struct embeds an interface type, it automatically becomes an implementation of that interface. https://go.dev/doc/effective_go#embedding
  2. The == operator is not a guaranteed valid operation for interface types; its behavior depends on the specific implementation. https://go.dev/ref/spec#Comparison_operators

I didn't anticipate that KeepAlive would internally use == to compare the type I passed in. My perspective is that robust code should rely solely on the declared contract of an interface, not on implicit assumptions. In this case, it seems to "assume" that the interface implementation is always comparable, which isn't guaranteed.

I believe this change improves the robustness of the code, ensuring it adheres to these principles.

codecov[bot] commented 2 hours ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 68.74%. Comparing base (19aa0db) to head (96cab86). Report is 44 commits behind head on main.

:exclamation: Current head 96cab86 differs from pull request most recent head 5617580

Please upload reports for the commit 5617580 to get more accurate results.

Additional details and impacted files | [Files with missing lines](https://app.codecov.io/gh/etcd-io/etcd/pull/18893?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io) | Coverage Δ | | |---|---|---| | [client/v3/lease.go](https://app.codecov.io/gh/etcd-io/etcd/pull/18893?src=pr&el=tree&filepath=client%2Fv3%2Flease.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io#diff-Y2xpZW50L3YzL2xlYXNlLmdv) | `90.94% <100.00%> (+0.06%)` | :arrow_up: | ... and [19 files with indirect coverage changes](https://app.codecov.io/gh/etcd-io/etcd/pull/18893/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io) ```diff @@ Coverage Diff @@ ## main #18893 +/- ## ========================================== - Coverage 68.81% 68.74% -0.07% ========================================== Files 420 420 Lines 35532 35560 +28 ========================================== - Hits 24450 24445 -5 - Misses 9658 9691 +33 Partials 1424 1424 ``` ------ [Continue to review full report in Codecov by Sentry](https://app.codecov.io/gh/etcd-io/etcd/pull/18893?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://app.codecov.io/gh/etcd-io/etcd/pull/18893?dropdown=coverage&src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io). Last update [19aa0db...5617580](https://app.codecov.io/gh/etcd-io/etcd/pull/18893?dropdown=coverage&src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io).

🚨 Try these New Features:

ahrtr commented 2 hours ago

Link to https://github.com/etcd-io/etcd/issues/18935

Please also squash the commits. Also do you have time to backport this to 3.5 and 3.4?

ahrtr commented 9 minutes ago

/ok-to-test