cloud-gov / cf-cdn-service-broker

A Cloud Foundry service broker for CloudFront and Let's Encrypt

Slow LastOperation calls blocking the cf.cc job queue #92

Open 46bit opened 7 years ago

46bit commented 7 years ago

Hi 18F,

We use a fork of your CDN broker on the GOV.UK PaaS. The broker does a lot of work in its LastOperation endpoint and caused an issue for us.

At present, LastOperation uses the lego/acme library to confirm that domains have valid challenges and to generate Let's Encrypt certificates. This can take a non-trivial length of time.

We had an incident last week. Asynchronous commands that use the cf CLI (e.g., deleting an organisation) were taking more than 30 seconds. We found that Cloud Foundry's cf.cc job queue had built up a backlog of jobs, because each LastOperation call on the CDN broker was taking more than 10 seconds. There were a few CDN brokers being provisioned, each with a few domains. This is a typical use case for our tenants.

We think the best resolution is for the broker to perform these operations in the background. The existing database would be used to track the state of these operations and that data would be used to reply quickly to LastOperation.
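
A minimal sketch of this idea in Go, to make the proposal concrete. It is not the broker's actual code: `Store`, `Broker`, `Provision`, the `work` callback, and the state names are all hypothetical, and an in-memory map stands in for the existing database. The slow ACME work runs in a goroutine and writes its outcome to the store, so LastOperation only performs a lookup.

```go
package main

import "sync"

// State is a hypothetical status value recorded for each operation.
type State string

const (
	InProgress State = "in progress"
	Succeeded  State = "succeeded"
	Failed     State = "failed"
)

// Store stands in for the broker's existing database (assumption:
// a real implementation would persist these rows rather than use a map).
type Store struct {
	mu     sync.RWMutex
	states map[string]State
}

func NewStore() *Store {
	return &Store{states: make(map[string]State)}
}

func (s *Store) Set(id string, st State) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.states[id] = st
}

func (s *Store) Get(id string) (State, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	st, ok := s.states[id]
	return st, ok
}

// Broker is a hypothetical broker. work represents the slow part
// (challenge checks and certificate issuance) for one instance.
type Broker struct {
	store *Store
	work  func(instanceID string) error
}

// Provision records the operation as in progress, starts the slow work
// in the background, and returns immediately. The returned channel is
// closed once the background work has stored its final state.
func (b *Broker) Provision(instanceID string) <-chan struct{} {
	b.store.Set(instanceID, InProgress)
	done := make(chan struct{})
	go func() {
		defer close(done)
		if err := b.work(instanceID); err != nil {
			b.store.Set(instanceID, Failed)
			return
		}
		b.store.Set(instanceID, Succeeded)
	}()
	return done
}

// LastOperation only reads the stored state, so it does no ACME work
// and its cost is a single lookup. The bool is false for unknown IDs.
func (b *Broker) LastOperation(instanceID string) (State, bool) {
	return b.store.Get(instanceID)
}
```

One thing this sketch leaves out: goroutines do not survive a process restart, so a real implementation would presumably need to pick up unfinished operations from the database on startup.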

We're wondering if you've encountered these issues and whether you've considered how to resolve them. If we implement our proposed fix, would the patch be welcomed?

Cheers, Michael

46bit commented 7 years ago

Our understanding of this issue has changed a little since we posted it. I'll update later today.

mogul commented 7 years ago

(In case it matters, yes, we're open to PRs that help resolve edge cases or scaling issues like this! Our aesthetic is to try to keep the number of moving parts to a minimum.)