Open divergentdave opened 1 month ago
Implementation idea, based on off-issue discussion:
I think we'd implement it as follows: after "normal" creation of aggregation jobs, we might have a few "straggler" reports left in hand that aren't numerous enough to permit creation of another aggregation job. Check for existing collection jobs covering the time windows associated with these straggler reports, and create an aggregation job using only the reports whose time windows have a collection job.
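To make the idea concrete, here is a rough sketch in Python (chosen for brevity). Everything in it is hypothetical: `Report`, `Interval`, `plan_aggregation_jobs`, and the `min_size`/`max_size` parameters are illustrative stand-ins, not the actual aggregation job creator's types or API, and timestamps are assumed to be plain integer seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    report_id: str
    timestamp: int  # assumption: seconds since some epoch


@dataclass(frozen=True)
class Interval:
    start: int
    duration: int

    def contains(self, timestamp: int) -> bool:
        # Half-open interval: [start, start + duration)
        return self.start <= timestamp < self.start + self.duration


def plan_aggregation_jobs(reports, collection_intervals, min_size, max_size):
    """Return (jobs, leftover) for a batch of unaggregated reports.

    Hypothetical sketch; assumes 1 <= min_size <= max_size.
    """
    ordered = sorted(reports, key=lambda r: r.timestamp)
    jobs = []
    i = 0
    # "Normal" creation: keep cutting jobs while enough reports remain.
    while len(ordered) - i >= min_size:
        chunk = ordered[i:i + max_size]
        jobs.append(chunk)
        i += len(chunk)

    # Whatever is left is too small for a normal job: the stragglers.
    stragglers = ordered[i:]
    covered = [
        r for r in stragglers
        if any(iv.contains(r.timestamp) for iv in collection_intervals)
    ]
    leftover = [r for r in stragglers if r not in covered]

    # Only stragglers that a collection job is waiting on get an
    # under-sized aggregation job; the rest wait for more uploads.
    if covered:
        jobs.append(covered)
    return jobs, leftover
```

With seven reports, `min_size=3`, `max_size=5`, and a single collection interval covering only the last report, this yields one normal job of five reports, one under-sized job containing the covered straggler, and one report left waiting.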
Things I'd want to think about more deeply before implementing:
(These two points are in contention with one another.)
Currently, if a time interval task has some number of reports uploaded, and then report uploads stop, it's possible for aggregation and collection of the existing reports to get stuck, at least until clients upload more reports.
When the report uploads stop, if there are fewer unaggregated reports than `min_aggregation_job_size`, then the aggregation job creator will not create any aggregation jobs. Thus, these reports will remain unaggregated. If a collection job is submitted with an interval that includes any such unaggregated report, the collection job driver will not process the job until all unaggregated reports in the batch interval have been processed (and all outstanding aggregation jobs have been finished or abandoned). Taken together, this means it's possible for a collection job to get stuck, even if we have sufficient valid reports to complete it. Getting into this state depends on race conditions between the clients' uploads and the aggregation job creator. We expect that tasks using the time interval query type will typically be for continuous metrics tasks, so extended periods with zero uploaded reports may be unusual.

We could fix this with new heuristics or conditions to allow creating an under-sized aggregation job, though how we do so may impact overhead from more, smaller aggregation jobs and write contention during the ensuing aggregation. Thus, we'll want to only create under-sized aggregation jobs in limited situations.
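The stuck state described above can be modeled with two small predicates. This is only an illustrative Python sketch under simplifying assumptions: the function names are hypothetical, timestamps are integer seconds, the batch interval is half-open, and no further uploads arrive.

```python
def creator_would_create_job(unaggregated_count, min_aggregation_job_size):
    # Modeled rule: the creator only acts once enough reports are waiting.
    return unaggregated_count >= min_aggregation_job_size


def collection_job_can_proceed(batch_start, batch_end,
                               unaggregated_timestamps, outstanding_jobs):
    # Modeled rule: the driver waits until no unaggregated report falls in
    # the batch interval and no aggregation job is still outstanding.
    pending = [t for t in unaggregated_timestamps
               if batch_start <= t < batch_end]
    return not pending and outstanding_jobs == 0


def is_stuck(batch_start, batch_end, unaggregated_timestamps,
             min_aggregation_job_size):
    # Stuck (absent new uploads): the collection job is blocked on reports
    # that the creator will never place into an aggregation job.
    blocked = not collection_job_can_proceed(
        batch_start, batch_end, unaggregated_timestamps, outstanding_jobs=0)
    return blocked and not creator_would_create_job(
        len(unaggregated_timestamps), min_aggregation_job_size)
```

For example, two unaggregated reports inside the batch interval with a minimum job size of 10 leave the collection job stuck, while the same two reports with a minimum job size of 2 do not.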