cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com

rangefeed: registrations metric is not always drained on processor termination #106126

Open aliher1911 opened 1 year ago

aliher1911 commented 1 year ago

Rangefeeds maintain a metric, kv.rangefeed.registrations, which shows how many rangefeeds are active. This metric is a gauge that is incremented when a registration is successfully created and must be decremented when the registration is removed.

In practice, a registration can be terminated by the client (when the stream is closed from the kv client side) or by the server (when the replica is removed due to rebalancing or split/merge operations). In the first case, the registration terminates its output loop, which triggers an unregistration request to the processor, and the processor performs the cleanup as part of its work loop. The processor then winds itself down if that was the last registration. However, if the replica decides to terminate rangefeeds, it sends a stop request to the processor, which in turn terminates its registrations. The registrations update their state and close their output loops, which triggers unregistration requests to the processor, but those requests are never processed because the processor's work loop has already terminated. As a result, the gauge is never decremented for those registrations.
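To make the two termination paths concrete, here is a minimal, self-contained Go sketch of the leak. It is a simplified model, not the actual CockroachDB code: the gauge, processor, register, unregister, stop and simulate names are all hypothetical, and the drainOnStop flag only illustrates one possible remedy (decrementing the gauge for remaining registrations when the processor stops), not necessarily how the real processor should be fixed.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// gauge is a hypothetical stand-in for the kv.rangefeed.registrations metric.
type gauge struct{ v atomic.Int64 }

func (g *gauge) Inc()         { g.v.Add(1) }
func (g *gauge) Dec()         { g.v.Add(-1) }
func (g *gauge) Value() int64 { return g.v.Load() }

// processor is a simplified model: all bookkeeping happens in its work loop.
type processor struct {
	metric      *gauge
	regs        map[int]struct{}
	requests    chan func()   // requests handled by the work loop
	stopC       chan struct{} // closed to request a server-side stop
	done        chan struct{} // closed when the work loop has exited
	drainOnStop bool          // assumption: one possible remedy
}

func newProcessor(m *gauge, drainOnStop bool) *processor {
	p := &processor{
		metric:      m,
		regs:        map[int]struct{}{},
		requests:    make(chan func()),
		stopC:       make(chan struct{}),
		done:        make(chan struct{}),
		drainOnStop: drainOnStop,
	}
	go p.run()
	return p
}

// run is the work loop. Once it returns, no further requests are processed.
func (p *processor) run() {
	defer close(p.done)
	for {
		select {
		case f := <-p.requests:
			f()
		case <-p.stopC:
			if p.drainOnStop {
				// Account for registrations that can no longer unregister.
				for id := range p.regs {
					delete(p.regs, id)
					p.metric.Dec()
				}
			}
			return
		}
	}
}

// do hands f to the work loop and waits for it to run.
// It returns false if the loop has already terminated and f was dropped.
func (p *processor) do(f func()) bool {
	ack := make(chan struct{})
	req := func() {
		f()
		close(ack)
	}
	select {
	case p.requests <- req:
		<-ack
		return true
	case <-p.done:
		return false
	}
}

func (p *processor) register(id int) bool {
	return p.do(func() {
		p.regs[id] = struct{}{}
		p.metric.Inc()
	})
}

func (p *processor) unregister(id int) bool {
	return p.do(func() {
		if _, ok := p.regs[id]; ok {
			delete(p.regs, id)
			p.metric.Dec()
		}
	})
}

// stop models the replica-initiated termination and waits for the loop to exit.
func (p *processor) stop() {
	close(p.stopC)
	<-p.done
}

// simulate registers three feeds, closes one from the client side, stops the
// processor, and then lets the remaining two attempt to unregister.
func simulate(drainOnStop bool) int64 {
	m := &gauge{}
	p := newProcessor(m, drainOnStop)
	for id := 1; id <= 3; id++ {
		p.register(id)
	}
	p.unregister(1) // client-side close: handled by the work loop
	p.stop()        // server-side stop: work loop exits
	p.unregister(2) // dropped: nobody is listening
	p.unregister(3) // dropped: nobody is listening
	return m.Value()
}

func main() {
	fmt.Println("gauge without drain on stop:", simulate(false)) // 2 (leaked)
	fmt.Println("gauge with drain on stop:", simulate(true))     // 0
}
```

In this model the client-side path keeps the gauge accurate, while the server-side path leaves it at 2 unless the processor accounts for outstanding registrations before its loop exits.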

Additional context: the inaccurate metric makes investigations problematic.

Jira issue: CRDB-29415

Epic CRDB-39959

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/replication

aliher1911 commented 1 year ago

There's a similar issue with the memory budget, but the processor releases all of the budget unconditionally on termination, without waiting for registrations to drain.

erikgrinaker commented 9 months ago

#110959 did not appear to fully fix this: restarting rangefeeds on 23.2 (e.g. by flipping kv.rangefeed.scheduler.enabled) still leaks kv.rangefeed.registrations (the gauge shows 125, while only 12 outputLoop goroutines are running).