divviup / janus

Experimental implementation of the Distributed Aggregation Protocol (DAP) specification.
Mozilla Public License 2.0
53 stars 14 forks source link

Helper task aggregation counters: implement batched background updates #3443

Open branlwyd opened 1 week ago

branlwyd commented 1 week ago

Helper task aggregation counters are currently updated in a separate DB transaction from the rest of the aggregation writes, in order to avoid unnecessary write contention. Each update is started in its own background (tokio) task, and will use its own DB transaction.

During a recent incident, we saw this transaction failing due to failure to retrieve a DB connection from the pool. In the event of overload, we could queue up an arbitrary number of these tasks, each awaiting a DB transaction -- there is no limiting factor to the number of background tasks we would spawn.

Instead, we should switch to a model where we have a single background task that receives & occasionally writes updates to the task aggregation counters, similar to how application-level metrics are handled for report uploads. This would cap the number of tasks & DB transactions used to update to 1. And this background task would also be able to batch writes to the counters, as well as handling retries in the case that the counters cannot be written.

inahga commented 4 days ago

similar to how application-level metrics are handled for report uploads

FWIW report upload metrics don't run in their own task and are handled as part of report uploading. (In that sense, they're more in-line than even aggregation job handling at the moment). So for this issue we may also want to align align report uploads with this model.

branlwyd commented 4 days ago

Well, report upload metrics are handled in a batched fashion, in a single background task, with a controlled amount of concurrency. You are correct that the report uploads happen in that background task as well -- I'm less confident we should adopt a separate task for report upload metrics just to separate these, though it wouldn't be too hard to implement.

inahga commented 4 days ago

Ah, you're right, I forget that report uploads are batched themselves. Yeah then a separate background task seems less necessary.