Open branlwyd opened 1 year ago
Thoughts/caveats:
`max_batch_size`, so determining a lower bound on the report count doesn't validate that we fulfill the requirements. However, the fixed-size query type generates batches to the correct size with no ability for the Collector to control batch size, so this practically doesn't matter much.

Given these caveats (especially the first one), a better change might be to drop this check entirely from this section of the specification & instead perform a check once aggregation is complete, just before sending the `AggregateShareReq` to the helper -- at that point the Leader will know exactly how many reports are included in the batch & can accurately verify batch-size checks. But this would require a change in DAP.
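For illustration, a minimal sketch of what a precise check just before sending the `AggregateShareReq` could look like. This is not Janus code: `check_batch_size`, `BatchSizeError`, and the parameter shapes are hypothetical, and it assumes the exact count of successfully aggregated reports is already known at that point, with `max_batch_size` only present for fixed-size tasks.

```rust
/// Hypothetical error type for this sketch; not a type from Janus.
#[derive(Debug, PartialEq, Eq)]
pub enum BatchSizeError {
    /// Fewer reports were aggregated than the task's minimum batch size.
    TooSmall { report_count: u64, min_batch_size: u64 },
    /// More reports were aggregated than the task's maximum batch size.
    TooLarge { report_count: u64, max_batch_size: u64 },
}

/// Validates the exact number of successfully aggregated reports against the
/// task's batch-size bounds. Assumption: `max_batch_size` is `Some` only for
/// fixed-size tasks and `None` for time-interval tasks.
pub fn check_batch_size(
    report_count: u64,
    min_batch_size: u64,
    max_batch_size: Option<u64>,
) -> Result<(), BatchSizeError> {
    if report_count < min_batch_size {
        return Err(BatchSizeError::TooSmall { report_count, min_batch_size });
    }
    if let Some(max_batch_size) = max_batch_size {
        if report_count > max_batch_size {
            return Err(BatchSizeError::TooLarge { report_count, max_batch_size });
        }
    }
    Ok(())
}
```

Because the input is an exact count rather than a lower bound, both the lower and the upper limit can be checked in one place.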
Tim pointed out that if we don't accept collection requests until enough reports are successfully aggregated, VDAFs with nontrivial aggregation parameters (e.g. Poplar1) will be broken since aggregation jobs aren't created until a collection request is received for such VDAFs.
This causes me to lean more towards moving the Leader's batch-size check to "just before issuing an `AggregateShareReq`", which would allow the check to be precise (i.e. the exact report count is known) and not impede Poplar1.
This seems funky & has not been addressed, but the right approach requires more thought than a bug-scrub session allows for.
The requirement to check batch size at time of collection job creation was removed in https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap/pull/484. IMO, we should remove this check altogether -- it is inaccurate, and the "real" check is in the collection job driver (which looks to the number of successfully-aggregated reports).
DAP-04 says:
This check is implemented in Janus here: https://github.com/divviup/janus/blob/eab56d09fb4273e1b24244745b5a5fd45c283098/aggregator/src/aggregator.rs#L1812-L1819
`report_count` comes from a query-type-specific function. In the time-interval case, this is a count on the `client_reports` table; but this count is incorrect, since it counts reports which have failed or not begun aggregation. In the fixed-size case, this is a count on the `report_aggregations` table; but this count is also incorrect, since it counts reports which have failed aggregation.

I suspect we want this count to be: look up all the relevant `batch_aggregations` rows based on the incoming collection identifier, and sum their `report_count`s. This would give an accurate lower bound on the count of reports which will be included in this collection.
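A rough sketch of the proposed computation, over in-memory rows rather than the real datastore. `BatchAggregationRow`, its string `batch_id`, and `aggregated_report_count` are hypothetical simplifications for illustration; the actual `batch_aggregations` schema and the way a collection identifier maps to batches are not reproduced here.

```rust
/// Hypothetical, simplified stand-in for one row of the `batch_aggregations`
/// table; the real schema has more columns and different key types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BatchAggregationRow {
    /// Identifier of the batch this row contributes to (assumed to be a string here).
    pub batch_id: String,
    /// Number of reports successfully aggregated into this row.
    pub report_count: u64,
}

/// Sums `report_count` over all rows belonging to the batches selected by the
/// collection identifier. Assumption: the caller has already resolved the
/// collection identifier into the list of relevant batch identifiers.
pub fn aggregated_report_count(rows: &[BatchAggregationRow], batch_ids: &[&str]) -> u64 {
    rows.iter()
        .filter(|row| batch_ids.iter().any(|id| row.batch_id == *id))
        .map(|row| row.report_count)
        .sum()
}
```

Since only successfully aggregated reports contribute to these rows, the sum excludes reports that failed or have not begun aggregation, which is what makes it a usable lower bound.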