to: @airbnb/binaryalert-maintainers
cc:
size: small
Background
A recent change in our binary exfil service introduced a race condition: the triggering event fires before the resource is actually available through the API, so the download URL returns a 404. BinaryAlert currently will NOT retry the download, so analysis of that file is silently dropped.
The thing is, the binary will eventually show up, typically within a few minutes. Changing the 404 handling to retry means the download will eventually succeed once the binary becomes available.
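To make the failure mode concrete, here is a minimal sketch of the old behavior, under the assumption that the downloader talks to CarbonBlack via cbapi (whose `ObjectNotFoundError` corresponds to an HTTP 404); `download_from_carbon_black` and `upload_for_analysis` are hypothetical stand-ins, not the actual downloader code:

```python
from cbapi.errors import ObjectNotFoundError


def download_lambda_handler(event, context):
    """Old behavior (sketch): a 404 permanently drops the file's analysis."""
    try:
        # Hypothetical helper representing the CarbonBlack download call.
        binary_contents = download_from_carbon_black(event['md5'])
    except ObjectNotFoundError:
        # Treated as terminal: nothing is re-queued, so the file is never analyzed.
        return
    upload_for_analysis(binary_contents)  # hypothetical downstream step
```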
Changes
ObjectNotFoundError (HTTP 404) now triggers a retry, with logging that explicitly states the download will be retried (see the sketch after this list).
Internal server errors now also trigger a retry, with improved log wording that likewise explicitly states the download will be retried.
Raises the downloader's error alarm threshold to 250. We already have the aws_cloudwatch_metric_alarm.downloader_sqs_age alarm, which detects when messages sit in the queue for too long, so I think that will handle the case where the downloader gets "stuck".
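Here is a minimal sketch of the new handling, again assuming a cbapi-based downloader (`ObjectNotFoundError` maps to a 404, `ServerError` to a 5xx response); the handler shape, event format, and helper are illustrative rather than the real code. Logging and then re-raising leaves the message in the SQS queue, so the download is retried once the message becomes visible again:

```python
import logging

from cbapi.errors import ObjectNotFoundError, ServerError
from cbapi.response import Binary, CbResponseAPI

LOGGER = logging.getLogger(__name__)
CARBON_BLACK = CbResponseAPI()  # credentials come from the usual cbapi config


def _download(md5: str) -> bytes:
    """Hypothetical helper: fetch the binary's contents from CarbonBlack."""
    return CARBON_BLACK.select(Binary, md5).file.read()


def download_lambda_handler(event, context):
    """Sketch of the new handling: log that a retry will happen, then re-raise."""
    md5 = event['md5']  # illustrative event shape
    try:
        binary_contents = _download(md5)
    except ObjectNotFoundError:
        # The binary usually appears within a few minutes. Re-raising leaves the
        # message in the SQS queue, so the download will be retried.
        LOGGER.exception('Binary %s not yet available (404): the download will be retried', md5)
        raise
    except ServerError:
        LOGGER.exception('Server error while downloading %s: the download will be retried', md5)
        raise
    return binary_contents  # the real downloader would ship this off for analysis
```

Because 404s are now expected during the race window, the raised error threshold keeps the downloader error alarm quiet during routine retries, while downloader_sqs_age still fires if a message never gets processed at all.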
Testing
CI?

Coverage decreased (-0.9%) to 91.254% when pulling 6663d8645e0a15ee048da6082f19d1ad7e87ba0b on dw--errors into 3ad4d89b7bd7978d74e25a28e6c4c833f324d752 on master.