cisagov / admiral

Distributed certificate transparency log harvester
Creative Commons Zero v1.0 Universal
14 stars 3 forks source link

Reconfigure task rate limits #37

Closed king-alexander closed 1 year ago

king-alexander commented 1 year ago

πŸ—£ Description

This pull request turns the summary_by_domain task rate limit into a configuration option.

πŸ’­ Motivation and context

πŸ§ͺ Testing

I tested these changes by configuring the admiral cert-workers to run with 2 different rate limits in place. The requests made to the Certificate Transparency Search API were expected to succeed in one case and fail in the other.

By default, the API permits a maximum of 10 requests per hour for full-domain certificate queries. In order to meet this limit, I had to account for the way in which Celery applies rate limits for tasks. Rate limits are applied on a per-worker basis, so with 5 cert-worker replicas, I adjusted the rate limit to 2 requests per hour. This produced a global rate limit of 10 requests per hour. From there, I verified that requests completed successfully.

For the other case, I changed the rate limit to 1 request per second. As expected, this failed. The rate limit triggered HTTP 429 errors for generating too many requests.

See the screenshots below for the result of each case respectively.

πŸ“· Screenshots

Screen Shot 2023-01-31 at 1 44 12 PM Screen Shot 2023-01-31 at 1 50 24 PM

βœ… Pre-approval checklist

king-alexander commented 1 year ago

I would like to get more experienced eyes on these changes. Is there a better way to do this?

dav3r commented 1 year ago

Don't forget to update the testing section of this PR after you've tested the changes in https://github.com/cisagov/admiral/pull/37/commits/d59c340d0fac65a6c2bd6b0a114d7c3b2eb2e5c3.

king-alexander commented 1 year ago

@king-alexander Did you confirm that varying the rate limit worked as expected? i.e. did you test 1/h vs. 10/h and verify that Celery enforced the correct rate?

I can confirm. As expected, when I changed the rate limit to 1/s I received multiple HTTP 429 Errors. I forgot, however, that the rate limit is applied per worker, which means to produce a global rate limit we must multiply the rate limit put in place by the number of cert-workers we provision. I made 02cf0ce to address this.

dav3r commented 1 year ago

@king-alexander I see you updated the "Testing" section with more detail, but it's not clear to me how the first and third tests are different. They both appear to be 10 requests per hour.

king-alexander commented 1 year ago

@king-alexander I see you updated the "Testing" section with more detail, but it's not clear to me how the first and third tests are different. They both appear to be 10 requests per hour.

Thanks for pointing this out! I'm updating the section to clarify.

I described my testing in terms of how it unfolded rather than the significant results. The first run wasn't important to testing because I had the rate limit wrong. That was 10 requests per hour for each worker, not 10 requests per hour total like it should've been. Anyway, I hope Testing is easier to reason about now.

dav3r commented 1 year ago

@king-alexander I see you updated the "Testing" section with more detail, but it's not clear to me how the first and third tests are different. They both appear to be 10 requests per hour.

Thanks for pointing this out! I'm updating the section to clarify.

I described my testing in terms of how it unfolded rather than the significant results. The first run wasn't important to testing because I had the rate limit wrong. That was 10 requests per hour for each worker, not 10 requests per hour total like it should've been. Anyway, I hope Testing is easier to reason about now.

Thanks for updating your testing narrative- it's much clearer now. πŸ‘