brexhq / substation

Substation is a toolkit for routing, normalizing, and enriching security event and audit logs.
https://substation.readme.io
MIT License

Automated Failure Recovery for Asynchronous Consumers #57

Closed (jshlbrd closed 7 months ago)

jshlbrd commented 1 year ago

Is your feature request related to a problem? Please describe.

As noted in the 2023 roadmap, AWS deployments that rely on asynchronous Lambda triggers (e.g., AWS S3) are at risk of data loss due to retry limitations: two retry attempts are made, and if both fail the data is lost. We can fix this by adding support for failure destinations and by automating recovery with additional retry attempts.
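To illustrate the idea, here is a minimal, self-contained simulation of the proposed flow. It is only a sketch and not Substation's implementation: `FailureDestination`, `invoke_async`, `recover`, and `make_flaky_handler` are hypothetical names, and the in-memory list stands in for whatever failure destination a real deployment would use (for example, a queue). The only behavior taken from the description above is that an asynchronous invocation gets two retries before the event would otherwise be lost.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption taken from the issue text: two retries follow the first attempt.
ASYNC_RETRY_ATTEMPTS = 2


@dataclass
class FailureDestination:
    """Hypothetical stand-in for a destination that receives failed events."""

    events: List[dict] = field(default_factory=list)

    def send(self, event: dict) -> None:
        self.events.append(event)


def invoke_async(
    handler: Callable[[dict], None],
    event: dict,
    on_failure: FailureDestination,
    retries: int = ASYNC_RETRY_ATTEMPTS,
) -> bool:
    """Run the handler once plus `retries` more times.

    If every attempt raises, the event is sent to the failure
    destination instead of being dropped. Returns True on success.
    """
    for _ in range(1 + retries):
        try:
            handler(event)
            return True
        except Exception:
            continue
    on_failure.send(event)
    return False


def recover(
    handler: Callable[[dict], None],
    on_failure: FailureDestination,
    max_rounds: int = 3,
) -> int:
    """Re-invoke parked events for up to `max_rounds` passes.

    Events that fail again are parked again by invoke_async.
    Returns the number of events that were recovered.
    """
    recovered = 0
    for _ in range(max_rounds):
        pending, on_failure.events = on_failure.events, []
        if not pending:
            break
        for event in pending:
            if invoke_async(handler, event, on_failure):
                recovered += 1
    return recovered


def make_flaky_handler(failures: int) -> Callable[[dict], None]:
    """Build a handler that raises on its first `failures` calls."""
    state = {"calls": 0}

    def handler(event: dict) -> None:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("transient failure")

    return handler


# The handler fails four times, so the initial three attempts are exhausted
# and the event is parked; the recovery pass then delivers it.
handler = make_flaky_handler(failures=4)
destination = FailureDestination()
delivered = invoke_async(handler, {"object": "example.log"}, destination)
recovered = recover(handler, destination)
print(delivered, recovered, len(destination.events))
```

Bounding the recovery passes with `max_rounds` keeps a permanently failing event from looping forever; such an event stays in the destination for inspection rather than being discarded.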

Describe the solution you'd like

Add support for the following:

This may require some code changes, but most changes should be made via infrastructure as code.

Describe alternatives you've considered

We recommend that users use Kinesis as intermediary storage for all data pipelines. With a Kinesis trigger, failed records are retried until the data expires in the stream (24 hours by default). An example of this design is here.

Additional context

N/A

jshlbrd commented 11 months ago

This will be closed by https://github.com/brexhq/substation/tree/jshlbrd/v1/send-lambda. A working example of failure recovery is available here.