cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

kvserver: gossip_alerts sometimes uses zero TTL #128793

Open tbg opened 2 months ago

tbg commented 2 months ago

Describe the problem

A customer saw gossip_alerts entries originating from a node that has not been around for three years. Entries in this table come from gossip and should have a (short) TTL.

Manual inspection showed two callers that do not specify a TTL:

https://github.com/cockroachdb/cockroach/blob/8fef7ac4072882fcb9a4cde37fcaf15527e18519/pkg/server/node.go#L757

and

https://github.com/cockroachdb/cockroach/blob/8fef7ac4072882fcb9a4cde37fcaf15527e18519/pkg/server/node.go#L1033

Either these callers should not use a zero TTL, or a zero TTL should disable gossiping of alerts.

Affects all versions, including master at the time of writing. I'd say this isn't super severe, but it is annoying to users who do look at that table, which technically should be none of them because it's in crdb_internal. But we know everything out there gets used by someone.

Expected behavior

Gossip alerts from a node should expire after a reasonable amount of time.

Additional data / screenshots

More likely with multiple stores per node, or when a node is terminated during its start-up sequence and then removed from the cluster in absentia.

Environment:

affects all versions, specific version irrelevant

Additional context

Customer was concerned about entries in this table.

Jira issue: CRDB-41210

Epic CRDB-37617

blathers-crl[bot] commented 2 months ago

Hi @tbg, please add a C-ategory label to your issue. Check out the label system docs.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] commented 2 months ago

Hi @tbg, please add branch-* labels to identify which branch(es) this C-bug affects.