Time interval: task-level option to assign reports to batches based on time of arrival at leader

tgeoghegan commented 1 year ago

In DAP, clients insert a timestamp into reports, which could be the time at which the measurement was taken, or the time at which the upload was performed, or any arbitrary time, really. This timestamp is then used for various decisions like determining whether a report has arrived too late to be processed, or whether a report should be included in an aggregation over some time interval.

We've heard from some DAP users that their existing telemetry systems prefer to make decisions based on the time at which a datum arrives at the server rather than a client-chosen timestamp, as client clocks can be wrong in ways that subvert data analysis. In particular, those users are concerned about their ability to join aggregates from DAP (in the client time domain) with aggregates from these other systems (in the server time domain).

For this reason, it'd be interesting if DAP could assign reports to batches based on the time at which they arrived at the leader instead of the client timestamp. That would require DAP changes, because the leader would need to somehow inform the helper of the arrival time of each report in an aggregation job.
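To make the concern concrete, here is a minimal sketch of how a skewed client clock moves a report into a different batch interval. It assumes batch intervals are aligned to multiples of the task's time precision; `batch_interval_start`, `TIME_PRECISION` and the timestamps are hypothetical names and values chosen for illustration, not Janus code.

```python
def batch_interval_start(timestamp: int, time_precision: int) -> int:
    """Round a Unix timestamp (seconds) down to the start of its batch interval."""
    if time_precision <= 0:
        raise ValueError("time_precision must be positive")
    return timestamp - (timestamp % time_precision)


TIME_PRECISION = 3600  # hypothetical task parameter: one-hour intervals

leader_arrival_time = 1_700_000_000            # when the leader received the upload
client_timestamp = leader_arrival_time - 7200  # client clock running two hours slow

# The same report lands in different intervals depending on which clock is used.
client_bucket = batch_interval_start(client_timestamp, TIME_PRECISION)
arrival_bucket = batch_interval_start(leader_arrival_time, TIME_PRECISION)
```

With these values the two buckets are two intervals apart, so an aggregate keyed on the client timestamp would not line up with a server-side aggregate for the same hour.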

Ahead of potential protocol changes, Janus can implement an extension for using leader arrival time and use it in deployments where both aggregators are Janus. In the presence of a special task parameter, we'd extend DAP's struct ReportShare:

struct {
    ReportMetadata report_metadata;
    opaque public_share<0..2^32-1>;
    HpkeCiphertext encrypted_input_share;
    Time leader_arrival_time;
} ReportShare;

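When handling an upload, the leader would record the time of arrival, and would later insert it as leader_arrival_time into the ReportShare it sends to the helper. The helper would then use that value, rather than the client timestamp in report_metadata, to assign reports to batches.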
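A rough sketch of what the extended message could look like on the wire, assuming the usual TLS presentation-language encoding (big-endian integers, length-prefixed opaque vectors). The field layout of ReportMetadata (16-byte report ID plus a 64-bit time) and HpkeCiphertext (config ID, enc, payload) is my reading of the DAP draft and should be checked against the version in use; the class and field names are hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass
class ExtendedReportShare:
    report_id: bytes          # assumed: 16-byte ReportID
    client_time: int          # assumed: Time as uint64 seconds since the epoch
    public_share: bytes       # opaque<0..2^32-1>
    hpke_config_id: int       # assumed: uint8
    hpke_enc: bytes           # assumed: opaque<1..2^16-1>
    hpke_payload: bytes       # assumed: opaque<1..2^32-1>
    leader_arrival_time: int  # the proposed extra field, uint64

    def encode(self) -> bytes:
        return b"".join([
            self.report_id,
            struct.pack(">Q", self.client_time),
            struct.pack(">I", len(self.public_share)), self.public_share,
            struct.pack(">B", self.hpke_config_id),
            struct.pack(">H", len(self.hpke_enc)), self.hpke_enc,
            struct.pack(">I", len(self.hpke_payload)), self.hpke_payload,
            struct.pack(">Q", self.leader_arrival_time),
        ])

    @classmethod
    def decode(cls, data: bytes) -> "ExtendedReportShare":
        pos = 0

        def take(n: int) -> bytes:
            # Consume exactly n bytes, failing loudly on truncated input.
            nonlocal pos
            if pos + n > len(data):
                raise ValueError("truncated ReportShare")
            chunk = data[pos:pos + n]
            pos += n
            return chunk

        report_id = take(16)
        (client_time,) = struct.unpack(">Q", take(8))
        (n,) = struct.unpack(">I", take(4))
        public_share = take(n)
        (config_id,) = struct.unpack(">B", take(1))
        (n,) = struct.unpack(">H", take(2))
        enc = take(n)
        (n,) = struct.unpack(">I", take(4))
        payload = take(n)
        (arrival,) = struct.unpack(">Q", take(8))
        if pos != len(data):
            raise ValueError("trailing bytes after ReportShare")
        return cls(report_id, client_time, public_share, config_id, enc, payload, arrival)


share = ExtendedReportShare(b"\x02" * 16, 1_699_992_800, b"", 7, b"enc", b"payload", 1_700_000_000)
encoded = share.encode()
decoded = ExtendedReportShare.decode(encoded)
```

Because the new field is appended at the end, the leading bytes are identical to a standard ReportShare, but a helper that doesn't know about the extension would still reject the message, which is why the task parameter has to gate it on both sides.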

tgeoghegan commented 1 year ago

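It'd be more elegant for the leader to simply rewrite the timestamps in the ReportMetadatas it sends to the helper, but the report metadata is part of the AAD the client uses when encrypting the input shares, so rewriting it would invalidate that AAD and the helper could no longer decrypt its share. So we can't do that.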
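To illustrate why the rewrite fails, here is a small sketch of associated-data binding. It does not use HPKE: an HMAC over the AAD and ciphertext stands in for the AEAD's authentication check, which is a deliberate simplification. The AAD layout (task ID, report ID, timestamp, public share) is my understanding of the DAP draft, and all function names and values are hypothetical.

```python
import hashlib
import hmac
import struct


def input_share_aad(task_id: bytes, report_id: bytes, timestamp: int, public_share: bytes) -> bytes:
    """Assumed AAD layout: task ID, report metadata (ID + time), public share."""
    return (task_id + report_id + struct.pack(">Q", timestamp)
            + struct.pack(">I", len(public_share)) + public_share)


def tag(key: bytes, aad: bytes, ciphertext: bytes) -> bytes:
    # Stand-in for AEAD authentication: MAC over length-prefixed AAD plus ciphertext.
    msg = struct.pack(">I", len(aad)) + aad + ciphertext
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify(key: bytes, aad: bytes, ciphertext: bytes, expected_tag: bytes) -> bool:
    return hmac.compare_digest(tag(key, aad, ciphertext), expected_tag)


key = b"k" * 32                      # placeholder for the HPKE-derived key
task_id = b"\x01" * 32
report_id = b"\x02" * 16
ciphertext = b"encrypted input share"
client_time = 1_699_992_800
arrival_time = 1_700_000_000

# The client authenticates the share against the metadata it chose.
client_tag = tag(key, input_share_aad(task_id, report_id, client_time, b""), ciphertext)

# The helper rebuilds the AAD from whatever metadata the leader forwards.
original_ok = verify(key, input_share_aad(task_id, report_id, client_time, b""), ciphertext, client_tag)
rewritten_ok = verify(key, input_share_aad(task_id, report_id, arrival_time, b""), ciphertext, client_tag)
```

Here `original_ok` is true and `rewritten_ok` is false: once the leader swaps in its own timestamp, the helper reconstructs a different AAD and authentication fails.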

branlwyd commented 1 year ago

I think this idea would fit better into the fixed-size model, as another way for the Leader to choose batches: 1) On arrival, Leader notes the server timestamp of arrival. 2) When generating batches, batching is done via this server timestamp (similar to our current "time-bucketed fixed-size" implementation, but based on the server timestamp instead of the client timestamp).
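Roughly, the two steps could look like the sketch below. This is not the Janus implementation: `assign_batches`, `bucket_seconds` and `max_batch_size` are hypothetical names, and the sketch ignores minimum batch size and any persistence concerns.

```python
from collections import defaultdict


def assign_batches(arrivals, bucket_seconds, max_batch_size):
    """Group (report_id, arrival_time) pairs into batches.

    Reports are first bucketed by the leader's arrival timestamp, then each
    bucket is split into batches of at most max_batch_size reports.
    Returns a list of (bucket_start, [report_ids]) tuples.
    """
    buckets = defaultdict(list)
    for report_id, arrival_time in arrivals:
        bucket_start = arrival_time - (arrival_time % bucket_seconds)
        buckets[bucket_start].append(report_id)

    batches = []
    for bucket_start in sorted(buckets):
        ids = buckets[bucket_start]
        for i in range(0, len(ids), max_batch_size):
            batches.append((bucket_start, ids[i:i + max_batch_size]))
    return batches


# Step 1: the leader notes the server timestamp when each report arrives.
arrivals = [("a", 100), ("b", 110), ("c", 3599), ("d", 3600)]

# Step 2: batches are formed from the server timestamps only.
batches = assign_batches(arrivals, bucket_seconds=3600, max_batch_size=2)
```

Note that the first hour yields two batches here, which is something time-interval can't express, and nothing about the arrival timestamps needs to be sent to the Helper.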

With this model, the Helper does not need to know about or support the batching strategy in use. (As you note, adapting time-interval to use server timestamps would require either DAP protocol support or nonstandard implementation support from both aggregators.) We also get the general advantages of fixed-size over time-interval, primarily being able to support multiple batches for a single time window. And, in this particular case, fixed-size doesn't have its primary disadvantage over time-interval -- namely that the Leader can choose batches arbitrarily -- since the proposed time-interval solution would also allow the Leader to choose batches arbitrarily by making up server timestamps.

divviup / janus

Time interval: task-level option to assign reports to batches based on time of arrival at leader #1692