divviup / janus

Experimental implementation of the Distributed Aggregation Protocol (DAP) specification.
Mozilla Public License 2.0

Provisioning of tasks into the database #44

Closed tgeoghegan closed 1 year ago

tgeoghegan commented 2 years ago

David raised an interesting question in #37, which is whether an aggregator instance would receive the list of task IDs for which it should handle requests and work from configuration or just look up all the tasks defined in the database.

Either way, this raises the question: how do the tasks get written into the database to begin with? And how do the secrets (like HPKE private keys) get into Kubernetes secrets? We will need tools and a process for doing this.

tgeoghegan commented 2 years ago

My preference would be to avoid doing this in something like the old prio-server deploy-tool. Principally, I think we should avoid ever having to expose secrets like this to operator machines. Maybe we could stand up a service that acts as the Divvi Up control plane, so that creating a new task would be a matter of making an authenticated API request asking it to create the task and perform the necessary key generation. Then we could either have human operators do that, or rig up something in GitHub Actions that would automatically make the task provisioning requests based on a config file checked into janus-ops.

This control plane service could eventually grow into the service that handles customer accounts and self-service task onboarding.

divergentdave commented 2 years ago

An alternate intermediate solution would be to write a standalone binary that provisions a task in the database, but then run it inside the cluster as a Job. We could write a template job manifest file, fill in arguments with parameters like TaskId, and then apply the completed manifest by hand. This would be easy to get up and running, and still generate and handle secrets on the cluster.
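A minimal sketch of what such a template Job manifest might look like; the image name, binary, and flags are hypothetical placeholders, with `${TASK_ID}` standing in for the parameter to be filled before applying:

```yaml
# Hypothetical template: provision a single task from inside the cluster.
apiVersion: batch/v1
kind: Job
metadata:
  name: provision-task-${TASK_ID}
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: provision-task
          # Assumed image name; would be the standalone provisioning binary.
          image: janus-provision-task:latest
          args:
            - --task-id=${TASK_ID}
          # Database credentials mounted from an existing Kubernetes secret,
          # so nothing sensitive ever leaves the cluster.
          envFrom:
            - secretRef:
                name: postgres-credentials
```

The operator would fill in `${TASK_ID}` (e.g. with `envsubst`) and apply the result by hand with `kubectl apply -f`; secrets like the HPKE private key would be generated by the binary inside the cluster, never on a workstation.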

branlwyd commented 2 years ago

I think anything we do that touches real user data should use such a strategy (or, in general, a strategy where we generate secrets in the cluster).

Initial interop testing & other deployments that only process fake or otherwise-non-sensitive data can generate secrets anywhere, including on an operator workstation, IMO.

tgeoghegan commented 1 year ago

Another responsibility of this control plane service would be coordinating the other aggregator for a task. Our API for provisioning a task with Janus as the leader would also allow subscribers to configure a helper aggregator, perhaps by specifying an endpoint URL. The helper would then need to support an API for task provisioning so that we could send it the task parameters as well as secret values like the VDAF verify key and an authentication token.
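As a sketch of the kind of payload the leader might send to the helper's hypothetical provisioning endpoint, assuming a JSON body (all field names and values here are illustrative, not a defined API):

```json
{
  "task_id": "<base64url task ID>",
  "leader_endpoint": "https://leader.example.com/",
  "helper_endpoint": "https://helper.example.com/",
  "vdaf": "Prio3Count",
  "vdaf_verify_key": "<base64url secret shared between aggregators>",
  "aggregator_auth_token": "<base64url bearer token for leader requests>",
  "collector_hpke_config": "<base64url HPKE public key config>"
}
```

The secret values (the VDAF verify key and the authentication token) would need to be transmitted over an authenticated, encrypted channel and stored by the helper alongside the task parameters.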

tgeoghegan commented 1 year ago

The ideas here have been refined into the proposal in #1486. I've filed issues in the production readiness milestone tracking specific implementation details. This issue doesn't track anything actionable and contains outdated discussion, so I'm closing it.