divviup / janus

Experimental implementation of the Distributed Aggregation Protocol (DAP) specification.
Mozilla Public License 2.0
54 stars 15 forks source link

Batch misalignment bug #2270

Closed divergentdave closed 1 year ago

divergentdave commented 1 year ago

I noticed some CI test failures, and I was able to reproduce one locally. (The CI logs were unhelpful because we don't set RUST_LOG=info on cargo test, and the failure happened to be during an in-process integration test rather than a containerized one, which logs by default)

Logs
---- janus::janus_in_process_histogram_4_buckets stdout ----
  2023-11-16T17:19:39.746918Z  INFO testcontainers::core::logs: Found message after comparing 22 lines
    at /home/david/.cargo/registry/src/index.crates.io-6f17d22bba15001f/testcontainers-0.15.0/src/core/logs.rs:72 on ThreadId(11)

  2023-11-16T17:19:39.914748Z  INFO janus_aggregator::binaries::aggregator: Running aggregator, aggregator_bound_address: 127.0.0.1:36885
    at aggregator/src/binaries/aggregator.rs:141 on ThreadId(9)

  2023-11-16T17:19:40.059033Z  INFO janus_aggregator::aggregator::aggregation_job_creator: Starting job creation worker, task_id: YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA
    at aggregator/src/aggregator/aggregation_job_creator.rs:193 on ThreadId(4)
    in janus_aggregator::aggregator::aggregation_job_creator::update_tasks

  2023-11-16T17:19:40.066571Z  INFO janus_aggregator::binaries::aggregator: Running aggregator, aggregator_bound_address: 127.0.0.1:37021
    at aggregator/src/binaries/aggregator.rs:141 on ThreadId(3)

  2023-11-16T17:19:40.069259Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/hpke_config", method: GET

  2023-11-16T17:19:40.071421Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/hpke_config", method: GET

  2023-11-16T17:19:40.200284Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:40.326959Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:40.452924Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:40.566016Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:40.692635Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:40.811502Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:40.929997Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(3)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:41.049753Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:41.101860Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregation_jobs/:aggregation_job_id", method: PUT

  2023-11-16T17:19:41.102156Z ERROR janus_aggregator_core::datastore: error: DB error: db error: ERROR: could not serialize access due to concurrent update
    at aggregator_core/src/datastore.rs:3904 on ThreadId(9)
    in janus_aggregator_core::datastore::put_batch with batch: Batch { task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA), batch_identifier: Interval { start: Time(1700150400), duration: Duration(28800) }, state: Open, outstanding_aggregation_jobs: 0, client_timestamp_interval: Interval { start: Time(1700150400), duration: Duration(1) } }
    in janus_aggregator::aggregator::handle_aggregate_init with batch_aggregation_shard_count: 32, aggregation_job_id: AggregationJobId(y6WjitlZWzgNoIx8hRPIUg), task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregation_jobs/:aggregation_job_id", method: PUT

  2023-11-16T17:19:41.105126Z ERROR janus_aggregator_core::datastore: error: not found in datastore
    at aggregator_core/src/datastore.rs:1763 on ThreadId(7)
    in janus_aggregator_core::datastore::release_aggregation_job with lease: Lease { leased: AcquiredAggregationJob { task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA), aggregation_job_id: AggregationJobId(-NwfB10EcZ_xnqEwowRnFw), query_type: TimeInterval, vdaf: Prio3Histogram { length: 4, chunk_length: 2 } }, lease_expiry_time: 2023-11-16T17:19:51, lease_token: LeaseToken(7ewx9oDxk6znUbJ1YsEkng), lease_attempts: 1 }
    in janus_aggregator::binary_utils::job_driver::Job stepper with acquired_job: AcquiredAggregationJob { task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA), aggregation_job_id: AggregationJobId(-NwfB10EcZ_xnqEwowRnFw), query_type: TimeInterval, vdaf: Prio3Histogram { length: 4, chunk_length: 2 } }

  2023-11-16T17:19:41.105897Z ERROR janus_aggregator::binary_utils::job_driver: Couldn't step job, error: not found in datastore
    at aggregator/src/binary_utils/job_driver.rs:199 on ThreadId(7)
    in janus_aggregator::binary_utils::job_driver::Job stepper with acquired_job: AcquiredAggregationJob { task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA), aggregation_job_id: AggregationJobId(-NwfB10EcZ_xnqEwowRnFw), query_type: TimeInterval, vdaf: Prio3Histogram { length: 4, chunk_length: 2 } }

  2023-11-16T17:19:41.115472Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregation_jobs/:aggregation_job_id", method: PUT

  2023-11-16T17:19:41.165331Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:41.284147Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:41.403399Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:41.521914Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:41.640781Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:41.760185Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:41.879415Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:41.998731Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:42.121150Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:42.130343Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregation_jobs/:aggregation_job_id", method: PUT

  2023-11-16T17:19:42.232152Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:42.351672Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:42.471498Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:42.590611Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:42.708876Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:42.828307Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:42.946067Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:43.063733Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:43.151129Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregation_jobs/:aggregation_job_id", method: PUT

  2023-11-16T17:19:43.182798Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:43.300721Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:43.420302Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(9)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:43.536479Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:43.654051Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:43.772044Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:43.891549Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:44.007273Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:44.123091Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:44.169749Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregation_jobs/:aggregation_job_id", method: PUT

  2023-11-16T17:19:44.240771Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:44.359518Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:44.476760Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:44.594862Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:44.713263Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:44.831002Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:44.949526Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:45.067587Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:45.171550Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregation_jobs/:aggregation_job_id", method: PUT

  2023-11-16T17:19:45.182077Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:45.292334Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:45.407990Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:45.526158Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/reports", method: PUT

  2023-11-16T17:19:45.543657Z  INFO janus_aggregator_core: Finished handling request, status: "Created"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: PUT

  2023-11-16T17:19:45.554439Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:46.180187Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:46.188353Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregation_jobs/:aggregation_job_id", method: PUT

  2023-11-16T17:19:46.924042Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:47.541983Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:47.991607Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:48.267411Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:48.717190Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(4)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:49.140564Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:49.539732Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:50.190662Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:50.853313Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:51.184941Z ERROR janus_aggregator_core::datastore: error: already in datastore
    at aggregator_core/src/datastore.rs:1804 on ThreadId(5)
    in janus_aggregator_core::datastore::put_aggregation_job with aggregation_job: AggregationJob { task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA), aggregation_job_id: AggregationJobId(-NwfB10EcZ_xnqEwowRnFw), batch_id: (), client_timestamp_interval: Interval { start: Time(1700150400), duration: Duration(1) }, state: Finished, step: AggregationJobStep(0), last_request_hash: Some([172, 44, 166, 244, 129, 119, 143, 99, 59, 86, 83, 168, 237, 112, 135, 58, 162, 178, 124, 241, 193, 11, 8, 25, 181, 165, 147, 26, 63, 148, 148, 85]) }
    in janus_aggregator::aggregator::handle_aggregate_init with batch_aggregation_shard_count: 32, aggregation_job_id: AggregationJobId(-NwfB10EcZ_xnqEwowRnFw), task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregation_jobs/:aggregation_job_id", method: PUT

  2023-11-16T17:19:51.186896Z  INFO janus_aggregator_core: Finished handling request, status: "OK"
    at aggregator_core/src/lib.rs:82 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregation_jobs/:aggregation_job_id", method: PUT

  2023-11-16T17:19:51.539102Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(7)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:52.119156Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:52.850883Z  INFO janus_aggregator_core: Finished handling request, status: "Accepted"
    at aggregator_core/src/lib.rs:82 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/collection_jobs/:collection_job_id", method: POST

  2023-11-16T17:19:53.176597Z ERROR janus_aggregator::aggregator: error: task YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA: batch misalignment (own checksum = ReportIdChecksum([55, 52, 200, 68, 44, 251, 124, 146, 5, 146, 139, 52, 201, 200, 185, 207, 142, 196, 153, 202, 219, 239, 182, 103, 51, 196, 47, 203, 6, 115, 13, 36]), own report count = 47, peer checksum = ReportIdChecksum([73, 58, 169, 242, 154, 216, 202, 102, 118, 229, 97, 37, 93, 200, 97, 188, 42, 175, 175, 27, 162, 200, 117, 173, 29, 116, 22, 87, 82, 220, 255, 131]), peer report count = 46)
    at aggregator/src/aggregator.rs:2786 on ThreadId(5)
    in janus_aggregator::aggregator::handle_aggregate_share with batch_aggregation_shard_count: 32, collector_hpke_config: HpkeConfig { id: HpkeConfigId(233), kem_id: X25519HkdfSha256, kdf_id: HkdfSha256, aead_id: Aes128Gcm, public_key: HpkePublicKey(HVD0Fd2w0eroNalQzjFqU83ST26o2VFcuAXWgaGpzUg) }, task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregate_shares", method: POST

  2023-11-16T17:19:53.176689Z  WARN janus_aggregator::aggregator::http_handlers: Error handling endpoint, error_code: "batch_mismatch", error: BatchMismatch(BatchMismatch { task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA), own_checksum: ReportIdChecksum([55, 52, 200, 68, 44, 251, 124, 146, 5, 146, 139, 52, 201, 200, 185, 207, 142, 196, 153, 202, 219, 239, 182, 103, 51, 196, 47, 203, 6, 115, 13, 36]), own_report_count: 47, peer_checksum: ReportIdChecksum([73, 58, 169, 242, 154, 216, 202, 102, 118, 229, 97, 37, 93, 200, 97, 188, 42, 175, 175, 27, 162, 200, 117, 173, 29, 116, 22, 87, 82, 220, 255, 131]), peer_report_count: 46 })
    at aggregator/src/aggregator/http_handlers.rs:43 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregate_shares", method: POST

  2023-11-16T17:19:53.176810Z  INFO janus_aggregator_core: Finished handling request, status: "Bad Request"
    at aggregator_core/src/lib.rs:82 on ThreadId(5)
    in janus_aggregator_core::endpoint with route: "/tasks/:task_id/aggregate_shares", method: POST

  2023-11-16T17:19:53.177243Z ERROR janus_aggregator::aggregator: error: HTTP response status 400 Bad Request - Leader and helper disagree on reports aggregated in a batch.
    at aggregator/src/aggregator.rs:3078 on ThreadId(5)
    in janus_aggregator::aggregator::send_request_to_helper with method: POST, route_label: "tasks/:task_id/aggregate_shares", content_type: "application/dap-aggregate-share-req", url: http://127.0.0.1:36885/tasks/YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA/aggregate_shares
    in janus_aggregator::aggregator::collection_job_driver::step_collection_job with lease: Lease { leased: AcquiredCollectionJob { task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA), collection_job_id: CollectionId(U7OO9tbwhcYwxYy7mqcHdQ), query_type: TimeInterval, vdaf: Prio3Histogram { length: 4, chunk_length: 2 } }, lease_expiry_time: 2023-11-16T17:20:03, lease_token: LeaseToken(makpgu7jnBl4LcZcQY0AIg), lease_attempts: 1 }
    in janus_aggregator::binary_utils::job_driver::Job stepper with acquired_job: AcquiredCollectionJob { task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA), collection_job_id: CollectionId(U7OO9tbwhcYwxYy7mqcHdQ), query_type: TimeInterval, vdaf: Prio3Histogram { length: 4, chunk_length: 2 } }

  2023-11-16T17:19:53.177294Z ERROR janus_aggregator::aggregator::collection_job_driver: error: HTTP response status 400 Bad Request - Leader and helper disagree on reports aggregated in a batch.
    at aggregator/src/aggregator/collection_job_driver.rs:77 on ThreadId(5)
    in janus_aggregator::aggregator::collection_job_driver::step_collection_job with lease: Lease { leased: AcquiredCollectionJob { task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA), collection_job_id: CollectionId(U7OO9tbwhcYwxYy7mqcHdQ), query_type: TimeInterval, vdaf: Prio3Histogram { length: 4, chunk_length: 2 } }, lease_expiry_time: 2023-11-16T17:20:03, lease_token: LeaseToken(makpgu7jnBl4LcZcQY0AIg), lease_attempts: 1 }
    in janus_aggregator::binary_utils::job_driver::Job stepper with acquired_job: AcquiredCollectionJob { task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA), collection_job_id: CollectionId(U7OO9tbwhcYwxYy7mqcHdQ), query_type: TimeInterval, vdaf: Prio3Histogram { length: 4, chunk_length: 2 } }

  2023-11-16T17:19:53.177318Z ERROR janus_aggregator::binary_utils::job_driver: Couldn't step job, error: Http(HttpErrorResponse { problem_details: HttpApiProblem { type_url: Some("urn:ietf:params:ppm:dap:error:batchMismatch"), status: Some(400), title: Some("Leader and helper disagree on reports aggregated in a batch."), detail: None, instance: None, additional_fields: {"taskid": String("YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA")} }, dap_problem_type: Some(BatchMismatch) })
    at aggregator/src/binary_utils/job_driver.rs:199 on ThreadId(5)
    in janus_aggregator::binary_utils::job_driver::Job stepper with acquired_job: AcquiredCollectionJob { task_id: TaskId(YwgbExQR41vvA1vY8mGTDPJW8jhSES0tLWWiC9yPZdA), collection_job_id: CollectionId(U7OO9tbwhcYwxYy7mqcHdQ), query_type: TimeInterval, vdaf: Prio3Histogram { length: 4, chunk_length: 2 } }
divergentdave commented 1 year ago

I bisected and confirmed that "c28be440a143f4cb60d2cf4f017ddf321119344d is the first bad commit". This was #2235, and #2272 will revert it.