Open aliher1911 opened 1 year ago
cc @cockroachdb/replication
There's a similar issue with the memory budget, but the processor releases all budget unconditionally on termination, without waiting for registrations to drain.
`kv.rangefeed.scheduler.enabled`) leaks `kv.rangefeed.registrations` (shows 125, while there are only 12 `outputLoop` goroutines running).
Rangefeeds maintain a metric, `kv.rangefeed.registrations`, which shows how many rangefeeds are active. This metric is a gauge that is incremented when a registration is successfully created and must be decremented when the registration is removed.

In practice, a registration can be terminated by the client (when the stream is closed from the kv client side) or by the server (when the replica is removed due to rebalancing or split/merge operations).

In the first case, the registration terminates its output loop, which triggers an unregistration request to the processor, and the processor performs the cleanup as part of its work loop. The processor then winds itself down if that was the last registration.

However, if the replica decides to terminate rangefeeds, it sends a stop request to the processor, which in turn terminates its registrations. The registrations update their state and close their output loops, which triggers unregistration requests to the processor, but these are never processed because the processor's work loop has already terminated. As a result, the gauge is never decremented for those registrations.
Additional context: The metrics issue makes investigations difficult.
Jira issue: CRDB-29415
Epic CRDB-39959