brexhq / substation

Substation is a toolkit for routing, normalizing, and enriching security event and audit logs.
https://substation.readme.io
MIT License

Idempotent Producers #59

Closed by jshlbrd 8 months ago

jshlbrd commented 1 year ago

Is your feature request related to a problem? Please describe.

Per the 2023 roadmap, data delivery from our ITL applications has an at-least-once guarantee, but some users may need an exactly-once guarantee. Exactly-once guarantees are difficult to achieve but may be possible by using caching in our sinks.

It's worth mentioning that exactly-once guarantees require both consumers and producers to support idempotency, but this issue is focused on producers. None of the serverless cloud services that Substation uses today supports idempotency; for example, the Kinesis service describes idempotency as a task that must be handled by consumers and producers.
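To make the caching idea concrete, here is a minimal sketch of a producer that skips records it has already delivered. Everything in it is an assumption for illustration, not Substation's API: the `Sender` type, the `IdempotentProducer` wrapper, and the choice of a SHA-256 content hash as the idempotency key are all hypothetical. The cache is also in-memory only, so it would not survive across serverless invocations; a real implementation would likely need a shared store.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// Sender is a hypothetical stand-in for a sink's delivery call.
type Sender func(data []byte) error

// IdempotentProducer wraps a Sender with an in-memory set of the
// content hashes of records that were already delivered.
type IdempotentProducer struct {
	mu   sync.Mutex
	seen map[string]struct{}
	send Sender
}

func NewIdempotentProducer(send Sender) *IdempotentProducer {
	return &IdempotentProducer{seen: make(map[string]struct{}), send: send}
}

// idempotencyKey derives a key from the record content (an assumption;
// a record ID or sequence number could be used instead).
func idempotencyKey(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Send delivers data unless an identical record was already delivered.
// It reports whether the underlying Sender was actually called successfully.
func (p *IdempotentProducer) Send(data []byte) (bool, error) {
	k := idempotencyKey(data)

	p.mu.Lock()
	defer p.mu.Unlock()

	if _, ok := p.seen[k]; ok {
		return false, nil // duplicate: skip delivery
	}
	if err := p.send(data); err != nil {
		return false, err // key is not recorded, so a retry can deliver
	}
	p.seen[k] = struct{}{}
	return true, nil
}

func main() {
	delivered := 0
	p := NewIdempotentProducer(func(data []byte) error {
		delivered++
		return nil
	})

	for _, rec := range []string{`{"id":1}`, `{"id":1}`, `{"id":2}`} {
		sent, err := p.Send([]byte(rec))
		fmt.Println(rec, sent, err)
	}
	fmt.Println("delivered:", delivered)
}
```

Note that this only narrows the duplicate window: a crash after the send succeeds but before the key is recorded would still produce a duplicate on retry, which is part of why consumers also need to be idempotent.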

Describe the solution you'd like

Add support for the following:

Describe alternatives you've considered

Idempotency is not usually required for metrics collection or log ingestion use cases since duplicate records are unlikely to cause adverse outcomes. In fact, the vast majority of systems like Substation use at-least-once guarantees because they are affordable (from a CPU / cost perspective) and low impact when errors occur. Users should make themselves aware of the risks posed by systems that use at-least-once guarantees and plan accordingly. Useful information on guarantees can be found in this blog post.

Additional context

N/A

jshlbrd commented 1 year ago

The AWS Lambda Powertools for Python library has some nice patterns for idempotency that we can borrow from.
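As a rough illustration of the kind of pattern that could be borrowed, the sketch below tracks each idempotency key with an in-progress or completed state and an expiry time. It is loosely modeled on my understanding of that general approach, not on the library's actual API. All names here (`Store`, `Do`, `ErrInProgress`) are hypothetical, and the in-memory map stands in for what would need to be a shared persistence layer with conditional writes.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// state tracks where a keyed operation is in its lifecycle.
type state int

const (
	inProgress state = iota
	completed
)

type record struct {
	state   state
	expires time.Time
}

// ErrInProgress is returned when another caller holds the key.
var ErrInProgress = errors.New("delivery already in progress")

// Store is a hypothetical in-memory persistence layer.
type Store struct {
	mu      sync.Mutex
	records map[string]record
	ttl     time.Duration
}

func NewStore(ttl time.Duration) *Store {
	return &Store{records: make(map[string]record), ttl: ttl}
}

// Do runs fn at most once per key while the key's record is unexpired.
// It reports whether fn ran successfully in this call.
func (s *Store) Do(key string, fn func() error) (bool, error) {
	s.mu.Lock()
	now := time.Now()
	if r, ok := s.records[key]; ok && now.Before(r.expires) {
		s.mu.Unlock()
		if r.state == inProgress {
			return false, ErrInProgress
		}
		return false, nil // already completed: skip
	}
	// Claim the key before doing the work.
	s.records[key] = record{state: inProgress, expires: now.Add(s.ttl)}
	s.mu.Unlock()

	if err := fn(); err != nil {
		// Release the key on failure so a retry can run.
		s.mu.Lock()
		delete(s.records, key)
		s.mu.Unlock()
		return false, err
	}

	s.mu.Lock()
	s.records[key] = record{state: completed, expires: time.Now().Add(s.ttl)}
	s.mu.Unlock()
	return true, nil
}

func main() {
	s := NewStore(time.Minute)
	deliver := func() error {
		fmt.Println("delivering")
		return nil
	}
	for i := 0; i < 2; i++ {
		ran, err := s.Do("record-1", deliver)
		fmt.Println(ran, err)
	}
}
```

The expiry bounds how long keys are retained, which keeps the store from growing without limit at the cost of allowing a duplicate once a key has aged out.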