Open 46bit opened 7 years ago
Our understanding of this issue has changed a little since we posted it. I'll update later today.
(In case it matters, yes, we're open to PRs that help resolve edge-cases or scaling issues like this! Our aesthetic is to try to keep the number of moving parts to a minimum.)
Hi 18F,
We use a fork of your CDN broker on the GOV.UK PaaS. The broker does a lot of work in its `LastOperation` endpoint, and this caused an issue for us.
At present, `LastOperation` uses the lego/acme library to confirm domains have valid challenges and generate Let's Encrypt certificates. This can take a non-trivial length of time.
We had an incident last week. Asynchronous commands that use the `cf` CLI (e.g., deleting an organisation) were taking more than 30 seconds. We found that Cloud Foundry's `cf.cc` job queue had built up a backlog of jobs, because each `LastOperation` call on the CDN broker was taking >10 seconds. There were a few CDN brokers being provisioned, each with a few domains. This is a typical use case for our tenants.
We think the best resolution is for the broker to perform these operations in the background. The existing database would be used to track the state of these operations, and that data would be used to reply quickly to `LastOperation`.
We're wondering if you've encountered these issues and whether you've considered how to resolve them. If we implement our proposed fix, would the patch be welcomed?
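
To make the proposal concrete, here is a rough sketch in Go of the shape we have in mind. It is not the broker's current code. Every name in it (`StateStore`, `Broker`, `checkAndIssue`, `ProcessPending`, `Run`) is hypothetical, the in-memory map stands in for the existing database, and `checkAndIssue` stands in for the slow lego/acme work. Marking an instance as failed on any error is also a simplifying assumption.

```go
package main

import (
	"sync"
	"time"
)

// OperationState is the status reported back for an asynchronous operation.
type OperationState string

const (
	InProgress OperationState = "in progress"
	Succeeded  OperationState = "succeeded"
	Failed     OperationState = "failed"
)

// StateStore is a hypothetical stand-in for the broker's existing database.
type StateStore struct {
	mu     sync.RWMutex
	states map[string]OperationState
}

func NewStateStore() *StateStore {
	return &StateStore{states: make(map[string]OperationState)}
}

func (s *StateStore) Set(instanceID string, st OperationState) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.states[instanceID] = st
}

func (s *StateStore) Get(instanceID string) (OperationState, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	st, ok := s.states[instanceID]
	return st, ok
}

// Pending returns a snapshot of the instance IDs that are still in progress.
func (s *StateStore) Pending() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var ids []string
	for id, st := range s.states {
		if st == InProgress {
			ids = append(ids, id)
		}
	}
	return ids
}

// Broker separates the slow work from the status endpoint.
type Broker struct {
	store *StateStore
	// checkAndIssue is a hypothetical hook for the slow ACME work
	// (challenge validation and certificate issuance).
	checkAndIssue func(instanceID string) (done bool, err error)
}

// Provision only records that work is outstanding.
func (b *Broker) Provision(instanceID string) {
	b.store.Set(instanceID, InProgress)
}

// LastOperation only reads stored state, so it never waits on ACME calls.
func (b *Broker) LastOperation(instanceID string) (OperationState, bool) {
	return b.store.Get(instanceID)
}

// ProcessPending makes one pass over in-progress instances and records
// the outcome of the slow work for each.
func (b *Broker) ProcessPending() {
	for _, id := range b.store.Pending() {
		done, err := b.checkAndIssue(id)
		switch {
		case err != nil:
			b.store.Set(id, Failed)
		case done:
			b.store.Set(id, Succeeded)
		}
	}
}

// Run calls ProcessPending on every tick until stop is closed.
func (b *Broker) Run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			b.ProcessPending()
		case <-stop:
			return
		}
	}
}
```

The broker process would start `go b.Run(interval, stop)` once at boot. With this split, the time taken by `LastOperation` no longer depends on how many domains are waiting on challenges, which is the property that would have kept the job queue from backing up.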
Cheers, Michael