I see that when an HTTP-level failure occurs for a specific CT server, sleep/retry behavior is already implemented in lib/certstream/ct_watcher.ex. That makes sense: you clearly anticipated spurious failures and are handling them.
I'm opening this issue because I recently started seeing 429s fairly often from DigiCert CT servers.
What are your thoughts on extending the retry logic to look for rate-limiting headers and, if one is present, use its value instead of the hard-coded sleep value? I haven't yet confirmed whether the (soft) failures I see actually include a Retry-After header, but I'm guessing there's a half-decent chance they do.
I know the current solution keeps things simple while not losing any data, so handling 429 fully would really just be a courtesy to the CT servers that are advising the client to slow down. One could also say it would make certstream-server more "correct" too, if you're into that sort of thing ;)
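To make the idea concrete, here is a rough sketch of the delay calculation I have in mind. It is written in Python purely for illustration (certstream-server itself is Elixir), and everything in it is an assumption on my part: the function name `retry_delay`, the 10-second default standing in for the current hard-coded sleep, and the 300-second cap are all hypothetical. It handles both forms of Retry-After allowed by the HTTP spec (a number of seconds or an HTTP date) and falls back to the default when the header is missing or unparseable.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical values: stand-ins for the existing hard-coded sleep and a sanity cap.
DEFAULT_SLEEP_SECONDS = 10.0
MAX_SLEEP_SECONDS = 300.0


def retry_delay(headers, default=DEFAULT_SLEEP_SECONDS, cap=MAX_SLEEP_SECONDS, now=None):
    """Return how many seconds to sleep before retrying a failed request."""
    # HTTP header names are case-insensitive, so match on the lowercased name.
    value = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if value is None:
        return default  # no hint from the server: keep the current behavior

    value = value.strip()
    if value.isascii() and value.isdigit():
        # Form 1: delay in seconds, e.g. "Retry-After: 30"
        delay = float(value)
    else:
        # Form 2: HTTP date, e.g. "Retry-After: Sun, 01 Jan 2023 00:01:00 GMT"
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return default  # unparseable header: fall back to the default
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        delay = (when - now).total_seconds()

    # Never sleep a negative amount, and never trust an arbitrarily large value.
    return min(max(delay, 0.0), cap)
```

With something like this, the retry path would call `retry_delay(response_headers)` in place of the fixed sleep. The cap is there so a misbehaving or misconfigured log server cannot stall a watcher indefinitely; whether a cap is wanted at all is a design choice I'd leave to you.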
Output from certstream-server with a short flurry of 429s from DigiCert:
16:47:12.204 [error] Unexpected status code 429 fetching url https://nessie2023.ct.digicert.com/log/ct/v1/get-entries?start=129276528&end=129276783! Sleeping for a bit and trying again...
16:47:12.207 [error] Unexpected status code 429 fetching url https://nessie2023.ct.digicert.com/log/ct/v1/get-entries?start=129277552&end=129277807! Sleeping for a bit and trying again...
16:47:12.210 [error] Unexpected status code 429 fetching url https://nessie2023.ct.digicert.com/log/ct/v1/get-entries?start=129277040&end=129277295! Sleeping for a bit and trying again...
16:47:15.670 [error] Unexpected status code 429 fetching url https://nessie2023.ct.digicert.com/log/ct/v1/get-entries?start=129279600&end=129279855! Sleeping for a bit and trying again...
16:47:22.315 [error] Unexpected status code 429 fetching url https://nessie2023.ct.digicert.com/log/ct/v1/get-entries?start=129277552&end=129277807! Sleeping for a bit and trying again...
References
Retry logic in ct_watcher.ex: https://github.com/CaliDog/certstream-server/blob/41c054704316f9ade21a0cc89db19d51e10469e6/lib/certstream/ct_watcher.ex#L63-L84