cisagov / admiral

Distributed certificate transparency log harvester

Creative Commons Zero v1.0 Universal


HTTP 429: Too Many Requests #16

Closed by king-alexander 2 years ago

king-alexander commented 2 years ago

🐛 Summary

The load_certs.py script crashes after receiving multiple HTTP 429 errors.

To reproduce

Steps to reproduce the behavior:

Execute ./load_certs.py. The script will crash after processing approximately 70 domains.

Any helpful log output or screenshots

king-alexander commented 2 years ago

Editing to add what I think is causing the issue: req.raise_for_status() raises an exception for the HTTP 429 error. The cert_by_id task is configured to auto-retry on any HTTP error until the maximum retry limit is reached. After this maximum is exceeded, any new call to req.raise_for_status() crashes the script.

I will refactor the cert_by_id task in tasks.py to use a custom retry delay with the Retry-After header value returned from the HTTP 429 error.
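A minimal sketch of how such a delay could be derived from a response, assuming the server does send Retry-After. The helper name `retry_delay` and the `default` fallback of 60 seconds are hypothetical and not taken from tasks.py. The header may carry either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(headers, default=60.0, now=None):
    """Return the number of seconds to wait before retrying.

    `headers` is any mapping of header names to values. A plain dict is
    case-sensitive, so the key must be spelled "Retry-After" here.
    `default` is a hypothetical fallback for a missing or unparseable header.
    """
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    # Form 1: delay in whole seconds, e.g. "120".
    if value.isdigit():
        return float(value)
    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Never return a negative delay if the date is already in the past.
    return max(0.0, (when - now).total_seconds())
```

The returned value could then be passed as the countdown when the task is retried. When the header is absent, the function simply falls back to `default`.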

king-alexander commented 2 years ago

crt.sh does not send a Retry-After header with the response. Back to square one.

Since crt.sh doesn't like that we send multiple requests at once, I switched the group signature in group_update_domain() to a chain. That seems to be working, but it's probably too slow a solution to be useful.

Next, I want to try using chunks. My thinking is we can break the tasks into pieces, gradually increasing the number of certificates done in a batch, until we reach the maximum crt.sh can manage.
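A rough sketch of the batching idea in plain Python, independent of Celery. The function name `growing_batches` and its parameters (`start`, `factor`, `max_size`) are made up for illustration. It only grows the batch size up to a fixed cap; detecting the point at which crt.sh starts returning HTTP 429 and backing off from there is not shown.

```python
def growing_batches(items, start=1, factor=2, max_size=None):
    """Yield successive batches of `items`, growing the batch size each time.

    The size starts at `start`, is multiplied by `factor` after every
    batch, and is capped at `max_size` when one is given.
    """
    items = list(items)
    size = start
    index = 0
    while index < len(items):
        yield items[index:index + size]
        index += size
        size *= factor
        if max_size is not None:
            size = min(size, max_size)


# Ten certificate IDs split into batches of 1, 2, 4 and then the remainder.
for batch in growing_batches(range(10), start=1, factor=2, max_size=4):
    print(batch)
```

Each batch could then be dispatched as its own unit of work, with the next batch sent only after the previous one has finished without errors.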

king-alexander commented 2 years ago

I am now testing the script with a default rate limit of at most 60 certificate requests per minute, based on https://groups.google.com/g/crtsh/c/NZJntKrBdmg.
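For illustration, a limit of 60 requests per minute amounts to spacing requests at least one second apart. The sketch below shows that spacing in plain Python. The `Throttle` class is hypothetical and is not how the script applies its limit. It also only throttles within a single process, so several workers would each get their own allowance.

```python
import time


class Throttle:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        # clock and sleep are injectable so the class can be tested
        # without actually waiting.
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


throttle = Throttle(per_minute=60)
# Call throttle.wait() immediately before each certificate request.
```

The first call returns immediately; each later call sleeps only for whatever remains of the one-second interval.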