Closed walmsles closed 1 year ago
Hi @walmsles thank you for the request. Feature parity with the Python version is definitely something that we have on our radars. Middy-compliant middlewares are only one of the usages that we want to cover at this stage (with the others being Class method decorators & manual instrumentation) so the feature deserves a closer look. Please give us some time to discuss this internally.
Hi everyone, here is a design proposal. Would appreciate your comments on this (especially around the Utility interface as it's different from Python and Java which have no constraint on decorator usage.)
The goal of this document is to propose the scope and design of the Idempotency utility for Powertools for TypeScript. The utility has already been implemented in the Python and Java versions. We will use the current Python implementation (https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/idempotency/) as a baseline and describe only the differences we will make in TypeScript. Anything not discussed here will be the same as in the Python version.
This RFC assumes that you are familiar with Python’s implementation. If you aren’t, please check the documentation of the Python Idempotency utility (https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/idempotency/) first.
Idempotency is a common pattern used by many customers. It guarantees that a retry with the same “idempotency key” is not executed again. This utility aims to provide “out-of-the-box” idempotency on top of Lambda with minimal code from the library user.
With this utility, customers only need to add a decorator, a Middy middleware, or a function wrapper on top of their business logic functions. Customers may customize the utility's behavior by providing a different config, or provide a custom persistence layer to store idempotency keys in a different persistent storage.
There are two ways to use the Idempotency utility.
The second option is useful for batch or multi-record processing. Imagine that we receive 10 records in a single request. We want the idempotency at the record level, not at the handler level. We can loop through the records and call the decorated function. We need to specify which
Unlike existing utilities, the "manual" options are complex and expose a lot of implementation details. Thus, the first release will have fewer usage options than those utilities.
import {
makeHandlerIdempotent,
DynamoDBPersistenceLayer,
IdempotencyConfig
} from '@aws-lambda-powertools/idempotency';
import middy from '@middy/core';
const config = new IdempotencyConfig({...});
const ddbPersistenceLayer = new DynamoDBPersistenceLayer({...});
const lambdaHandler = async (_event: any, _context: any): Promise<void> => {
/* ...Function logic here... */
}
export const handler = middy(lambdaHandler)
.use(makeHandlerIdempotent({
config,
persistenceLayer: ddbPersistenceLayer,
}));
import {
idempotentHandler,
DynamoDBPersistenceLayer,
IdempotencyConfig
} from '@aws-lambda-powertools/idempotency';
const config = new IdempotencyConfig({...});
const ddbPersistenceLayer = new DynamoDBPersistenceLayer({...});
class Lambda implements LambdaInterface {
// Decorate your handler class method
@idempotentHandler(config, ddbPersistenceLayer)
public async handler(_event: any, _context: any): Promise<void> {
/* ...Function logic here... */
}
}
export const handlerClass = new Lambda();
export const handler = handlerClass.handler.bind(handlerClass);
Note that this feature is typically used with Batch Utility (https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/idempotency/#idempotent_function-decorator). Given that we don’t have Batch Utility yet, we will give an example with a simple for loop. But be mindful that the more resilient way is to handle failed records, which is out of scope for this utility.
import {
idempotentFunction,
DynamoDBPersistenceLayer,
IdempotencyConfig
} from '@aws-lambda-powertools/idempotency';
const config = new IdempotencyConfig({...});
const ddbPersistenceLayer = new DynamoDBPersistenceLayer({...});
class Lambda implements LambdaInterface {
public async handler(_event: any, _context: any): Promise<void> {
const records = /*..Extract SQS/DDB stream record, etc..*/
const results = []
for (const record of records) {
results.push(this.process(record));
}
/* ...Format and return result... */
}
@idempotentFunction({
dataKeywordArgument: 'record', // Match with param name of decorated function below
config,
persistenceLayer: ddbPersistenceLayer
})
private process(record: any) {
/* ...Function logic here... */
return result;
}
}
export const handlerClass = new Lambda();
export const handler = handlerClass.handler.bind(handlerClass);
The majority of TypeScript/JS Lambda code does not use classes. Middy middleware is not an option either, as we are dealing with a non-handler function. To support this major use case, we provide a wrapper function:
import {
makeFunctionIdempotent,
DynamoDBPersistenceLayer,
IdempotencyConfig
} from '@aws-lambda-powertools/idempotency';
const config = new IdempotencyConfig({...});
const ddbPersistenceLayer = new DynamoDBPersistenceLayer({...});
/**
* Function to process a single record
*/
function processRecord(record: any) {
/* ...Function logic here... */
return result;
}
/**
* Higher-order function to process a single record
*/
const processIdempotently = makeFunctionIdempotent(
processRecord,
{
dataKeywordArgument: 'record',
/*... other options...*/
}
);
const lambdaHandler = async (_event: any, _context: any): Promise<void> => {
const records = /*..Extract SQS/DDB stream record, etc..*/
const results = [];
for (const record of records) {
results.push(processIdempotently(record));
}
/* ...Format and return result... */
}
//...
The first release will contain only high/medium priority features. "Out of scope" features won’t be implemented in future releases unless there is a clear signal from customers.
Feature | Priority | Description | Note |
---|---|---|---|
FR1 | High | Provide idempotency at handler level (via Middy middleware and decorator), including edge cases | |
FR2 | High | Provide idempotency at function level (via wrapper function and decorator), including edge cases | |
FR3 | High | Return the same result when called with the same payload | |
FR4 | High | Can customize time window | expires_after_seconds in Python |
FR5 | High | Can parse data via JMESPath. We will use an external library for JMESPath parsing for the MVP | event_key_jmespath="powertools_json(body)" in Python. The alternative solution is to have a config option that takes a function to extract the data, but this would be inconsistent with other Powertools. We can reuse jmespath.js |
FR6 | Low | Payload validation for the case where two requests with the same idempotency key contain different payloads. See doc for details. | payload_validation_jmespath in Python |
FR7 | Low | Local (inside Lambda) caching with LRU cache | use_local_cache and local_cache_max_items in Python |
FR8 | Medium | Passing configuration to AWS SDK V3 | |
FR9 | Medium | Support a composite primary key (for reusing the same table by multiple functions) | sort_key_attr and static_pk_value in Python (for DDB persistence layer) |
FR10 | Medium | Throw an exception if the idempotency key isn't found | raise_on_no_idempotency_key in Python |
FR11 | Out of scope | Filter sensitive fields in the result from being cached | Customers can do this by customizing the persistence layer |
NFR1 | High | Allow customization of persistence layer through extending a base class or an interface | |
NFR2 | Low | Switching hash function | hash_function in Python |
NFR3 | High | Handle errors thrown by the wrapped function. The idempotency record in the persistence layer should be cleared. | See more discussion on this in the "Edge cases" section below. |
NFR4 | High | Handle Lambda timeout | https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/idempotency/#lambda-timeouts |
NFR5 | Low | Idempotency metrics (e.g. error by type, cache hit, etc.) | |
NFR6 | Out of scope | Support recovery points for Lambda functions that perform more than one step and fail midway. The next request continues from the recovery point instead of starting from the beginning (and causing the same side effect twice) | We will assume that the handler has a single side effect. There is no leftover state that needs to be rolled back when it fails. |
Note: features with low priority won't be included in the first release (MVP).
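To make NFR1 and FR4 a bit more concrete, here is a minimal sketch of what an extensible persistence layer could look like. The names `BasePersistenceLayer`, `IdempotencyRecord`, and `InMemoryPersistenceLayer` are hypothetical and exist only for illustration; they are not the proposed API. The sketch is kept synchronous for brevity, whereas a real backend such as DynamoDB would be asynchronous.

```typescript
// Hypothetical shapes for illustration only; not the final Powertools API.
type RecordStatus = 'INPROGRESS' | 'COMPLETED';

interface IdempotencyRecord {
  idempotencyKey: string;
  status: RecordStatus;
  expiryTimestamp: number; // epoch seconds
  responseData?: unknown;
}

// Base class that a custom storage backend would extend (NFR1).
abstract class BasePersistenceLayer {
  private readonly expiresAfterSeconds: number;

  public constructor(expiresAfterSeconds = 3600) {
    this.expiresAfterSeconds = expiresAfterSeconds; // time window (FR4)
  }

  // Storage-specific primitives implemented by each backend.
  protected abstract readRecord(key: string): IdempotencyRecord | undefined;
  protected abstract writeRecord(record: IdempotencyRecord): void;
  protected abstract removeRecord(key: string): void;

  // Returns the record only if it exists and has not expired yet.
  public getRecord(key: string, now: number): IdempotencyRecord | undefined {
    const record = this.readRecord(key);
    return record !== undefined && record.expiryTimestamp > now ? record : undefined;
  }

  // Fails when an unexpired record already exists for the same key.
  public saveInProgress(key: string, now: number): void {
    if (this.getRecord(key, now) !== undefined) {
      throw new Error(`Idempotency record already exists for key ${key}`);
    }
    this.writeRecord({
      idempotencyKey: key,
      status: 'INPROGRESS',
      expiryTimestamp: now + this.expiresAfterSeconds,
    });
  }

  public saveSuccess(key: string, responseData: unknown, now: number): void {
    this.writeRecord({
      idempotencyKey: key,
      status: 'COMPLETED',
      expiryTimestamp: now + this.expiresAfterSeconds,
      responseData,
    });
  }

  public deleteRecord(key: string): void {
    this.removeRecord(key);
  }
}

// Example backend: a Map held in memory (useful for tests only).
class InMemoryPersistenceLayer extends BasePersistenceLayer {
  private readonly store = new Map<string, IdempotencyRecord>();

  protected readRecord(key: string): IdempotencyRecord | undefined {
    return this.store.get(key);
  }

  protected writeRecord(record: IdempotencyRecord): void {
    this.store.set(record.idempotencyKey, record);
  }

  protected removeRecord(key: string): void {
    this.store.delete(key);
  }
}
```

With this split, a custom backend only has to implement the three storage primitives, while expiry and status handling stay in the base class.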
Idempotent functions have many edge cases and limitations. We will use the same behavior as the Python and Java versions. This section clarifies the behaviours of non-happy flows for reference during implementation.
What if the function has multiple side effects? The first request may be partially executed and have already caused a side effect, but then fail to complete and throw an exception. We won’t support multiple side effects. This is the same as in the Python version (check the first note in Python Idempotency Request Flows (https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/idempotency/#idempotency-request-flow)).
If customers choose to do this (not recommended), the responsibility belongs to the Lambda handler. It needs to properly catch the exception and clean up the side effects previously done. Then it can rethrow the exception to the Idempotency utility so that the idempotency record is deleted.
An alternative option is to make our utility support recovery points (NFR6). Let’s say that the handler function has two steps and both of them have side effects. We can put a recovery point between the two steps. When the first step completes, we update the idempotency record. If the second step fails and gets retried, the function can skip the first step. However, this complicates the design and interface of our utility. In addition, the system may still be left in an inconsistent state if the retry request never reaches the Lambda function. A better design is probably to split the two side effects into two Lambda functions and coordinate them with a Step Function or put an SQS queue in between.
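As a rough illustration of the "clean up, then rethrow" responsibility described above, the sketch below shows business logic with two side effects that compensates for the first one when the second fails. The functions `chargeCustomer`, `refundCustomer`, `shipOrder`, and `processOrder` are hypothetical placeholders, and the side effects are only written to an in-memory log so the flow can be observed.

```typescript
// Hypothetical side effects, recorded in a log for illustration.
const sideEffects: string[] = [];

function chargeCustomer(orderId: string): void {
  sideEffects.push(`charged:${orderId}`);
}

function refundCustomer(orderId: string): void {
  sideEffects.push(`refunded:${orderId}`);
}

function shipOrder(orderId: string, failShipping: boolean): void {
  if (failShipping) {
    throw new Error(`shipping failed for ${orderId}`);
  }
  sideEffects.push(`shipped:${orderId}`);
}

// Business logic with two side effects (not recommended with this utility).
function processOrder(orderId: string, failShipping = false): string {
  chargeCustomer(orderId);
  try {
    shipOrder(orderId, failShipping);
  } catch (error) {
    // Undo the first side effect before giving up.
    refundCustomer(orderId);
    // Rethrow so that a surrounding idempotency wrapper can delete the record
    // and a later retry starts from a clean state.
    throw error;
  }

  return `order ${orderId} done`;
}
```

If the compensation itself can fail, this pattern is no longer sufficient, which is one reason the text above suggests splitting the side effects across separate functions.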
IdempotencyItemAlreadyExists
Thrown when there is already an in-progress and unexpired idempotency record. This can happen if the second request arrives before the first one has completed and updated the record with a result.
IdempotencyItemNotFound
Thrown when the record cannot be retrieved from the persistence layer. This shouldn’t happen under normal circumstances.
IdempotencyInvalidStatus
Thrown when the record’s status value is wrong. This indicates a bug in the persistence layer implementation.
IdempotencyInconsistentState
Thrown when the record status is incorrect (e.g. the in-progress record has somehow expired after its initial save). The utility catches this exception at the top level and retries once before rethrowing the exception.
IdempotencyValidationError
Thrown when payload validation (FR6) is enabled and validation fails.
IdempotencyKeyNotFound
Thrown when an idempotency key is not found in the payload and the utility is configured to throw an exception in this case (FR10).
This is a list of important decision points from the maintainer discussion. It is subject to change based on comments from the community.
80-90% of the implementation will be based on the Python version. However, we will deviate where it’s appropriate as we don’t have a constraint of making breaking changes yet.
That sounds very nice and is indeed much needed 👍
We're also currently simply using a middy middleware to check if a request was already executed and add a flag to the context object but things like
Would definitely help a lot. Thanks!
This looks great! We are very excited that this utility has been prioritized!
After reading the design proposal, we do have a question and a couple of comments.
First, the question: We noticed that FR3 is "Return the same result when called with the same payload", which made us wonder about the scope of FR1. Is it just "don't re-process given the same payload"?
The comments:
> First, the question: We noticed that FR3 is "Return the same result when called with the same payload", which made us wonder about the scope of FR1. Is it just "don't re-process given the same payload"?
FR1 stops Lambda from processing the same workload twice. FR3 stores the result in the persistence layer and returns the same result when a new request comes in.
This is just to break the implementation into smaller parts. In the FR1 implementation, it may simply return a hardcoded response saying that the given idempotency key has been processed. Then, we implement FR3 to save the result from the first request and return the stored result for subsequent requests.
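As a rough illustration of how FR1, FR3, and NFR3 could fit together, the sketch below wraps an async function and uses a plain in-memory `Map` in place of a real persistence layer. `makeIdempotentSketch` and the `StoredRecord` shape are hypothetical names for this example only, and the key is simply the serialized payload (a real implementation would hash it and handle expiry).

```typescript
// Hypothetical record shape for this sketch only.
type StoredRecord<R> =
  | { status: 'INPROGRESS' }
  | { status: 'COMPLETED'; result: R };

function makeIdempotentSketch<P, R>(
  fn: (payload: P) => Promise<R>,
  store: Map<string, StoredRecord<R>> = new Map()
): (payload: P) => Promise<R> {
  return async (payload: P): Promise<R> => {
    // Simplification: use the serialized payload as the idempotency key.
    const key = JSON.stringify(payload);
    const existing = store.get(key);

    if (existing !== undefined) {
      if (existing.status === 'COMPLETED') {
        // FR3: return the stored result instead of running the function again.
        return existing.result;
      }
      // FR1: another execution with the same key has not finished yet.
      throw new Error('Execution already in progress for this idempotency key');
    }

    store.set(key, { status: 'INPROGRESS' });
    try {
      const result = await fn(payload);
      store.set(key, { status: 'COMPLETED', result });

      return result;
    } catch (error) {
      // NFR3: clear the record so that a retry can run the function again.
      store.delete(key);
      throw error;
    }
  };
}
```

Implementing only FR1 would correspond to dropping the stored `result` and returning a fixed response for completed keys.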
> We suggest switching the order of FR1 and FR2. The function wrapper seems like the most extensible part of this library, and there is a lot of value in that alone. This would immediately provide value for event-driven systems, and we think it would be the easiest thing to implement first.
Totally agree. I actually didn't have an implementation order in mind. Feel free to work in the order you prefer. But please focus on only high/medium priority first.
It may also be good to share your plan with us here.
> Related to the comment in 6.3 about deviation from Python, we recommend that you not make the lambda context a required configuration. While we understand that this is a library ultimately intended for lambdas, the function-level idempotency feature is very extensible beyond that as long as lambda context remains optional. If lambda context remains optional in the function-level idempotency feature, then this requirement can potentially be broken into even smaller MVPs by limiting the number of edge cases that need to be considered.
Please let me discuss with other maintainers. I will get back to you on this.
@jeffrey-baker-vg This issue is all yours and the team :)
I don't see any comments that require a major change. I think we can start implementation. FR2 is a good candidate to start.
Suggestion (optional): Given that this feature is quite big and we have many contributors, could we start with the classes and their interfaces? It doesn't have to be detailed; class names and public methods are sufficient.
The aim is to align every contributor on the boundary and responsibility of each class.
@ijemmy @jeffrey-baker-vg if that helps on 6.3, in Python, we chose not to make lambda context required for two reasons (idempotent_function):
We have customers using it on Fargate and in AWS Glue jobs, where context isn't available. Hyrum's Law caught us by surprise.
We were waiting for a native improvement in Lambda Runtime to handle timeout without customer explicit intervention but that wasn't prioritised in the roadmap. We found a transparent solution using a no-op Lambda extension without Layers but it became "too much magic" to hide from customers as a side effect.
I agree with @ijemmy that it is error prone for Lambda customers decorating sync functions but we couldn't find a better trade-off yet --- please let us know if you do in the future.
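To illustrate the trade-off with an optional context, here is a minimal sketch of how timeout handling could degrade gracefully when no Lambda context is passed. `getRemainingTimeInMillis()` is the method exposed by the Lambda context object; the helpers `inProgressExpiry` and `canRetryInProgress` and the `inProgressExpiryMs` field are hypothetical names used only for this example.

```typescript
// Only the part of the Lambda context that this sketch needs.
interface ContextLike {
  getRemainingTimeInMillis(): number;
}

// Returns the epoch-millisecond time after which an in-progress record can be
// treated as abandoned (the invocation must have timed out by then), or
// undefined when no Lambda context is available (e.g. Fargate or Glue).
function inProgressExpiry(nowMs: number, context?: ContextLike): number | undefined {
  if (context === undefined) {
    return undefined;
  }

  return nowMs + context.getRemainingTimeInMillis();
}

// An in-progress record may be re-executed only when a timeout marker was
// saved and that time has already passed.
function canRetryInProgress(
  record: { inProgressExpiryMs?: number },
  nowMs: number
): boolean {
  return record.inProgressExpiryMs !== undefined && record.inProgressExpiryMs < nowMs;
}
```

Without a context, an in-progress record in this sketch simply stays blocked until its normal expiry, which is the error-prone case mentioned above.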
I wanted to see what others' thoughts were on using the dataKeywordArgument variable in the idempotency wrapper:
/**
* Higher-order function to process a single record
*/
const processIdempotently = makeFunctionIdempotent(
processRecord,
{
dataKeywordArgument: 'record',
/*... other options...*/
}
);
const lambdaHandler = async (_event: any, _context: any): Promise<void> => {
const records = /*..Extract SQS/DDB stream record, etc..*/
const results = [];
for (const record of records) {
results.push(processIdempotently(record));
}
/* ...Format and return result... */
}
This seems to be an adaptation of what the Python version provides, since Python can use kwargs and therefore find specific arguments by name. In JavaScript, keyword arguments don't really exist. There are mechanisms that allow for pseudo keyword arguments, and I wanted to suggest we go the route below so that we keep the concept of defining the payload we want to use for idempotency while fitting into the mold of JavaScript/TypeScript.
/**
* Higher-order function to process a single record
*/
const processIdempotently = makeFunctionIdempotent(
processRecord,
{
dataKeywordArgument: 'field',
/*... other options...*/
}
);
const lambdaHandler = async (_event: any, _context: any): Promise<void> => {
const records = [{ field: 'value' }];
const results = [];
for (const record of records) {
results.push(processIdempotently(record));
}
/* ...Format and return result... */
}
The difference is subtle, but the key is that we have to ensure that the parameter passed into the method is in fact an object and that dataKeywordArgument is a field in that object; this way we can look up specific keys within the object. So processRecord (the function being wrapped) could be written as:
function processRecord({ arg1 = 1, arg2 = 2, arg3 = 3 } = {}) { return { arg1, arg2, arg3 }; }
And then the user can set dataKeywordArgument to arg1/arg2/arg3 when wrapping the function.
We incur the need for standardization a bit here but it does allow for the flexibility of using those specific payloads we want to use in the idempotency logic and keying.
Or we can assume that all arguments are the key to idempotency and not allow for this granular of a specification.
Edit (by @ijemmy): Add code syntax highlight for TypeScript for ease of reading. No content changes.
@KevenFuentes9
Just talked with other maintainers (@dreamorosi, @flochaz). Named parameters are definitely not what we want. (We also got confused when looking at the Python style.) The trade-off is that we will enforce the signature of the processRecord() method to have only one parameter, which is an object. Clients need to construct a JSON object when using this feature.
@flochaz proposed another option below. Let's discuss in this issue so we all can voice our opinions and find the most appropriate trade-off.
@dreamorosi @flochaz
It was difficult to wrap up the discussion without examples. So let me sum up the 2 options we were discussing.
Firstly, we assume that a client wants to make this method idempotent:
interface Record {
[key: string]: any; // Note that we ONLY accept an object here. Clients cannot pass a string or multiple arguments
}
function processRecord(record: Record) {
// do something
}
Here are the two options:
Option 1: Based on @KevenFuentes9's proposal, expanded to cover JMESPath.
/**
* Higher-order function to process a single record
*/
const processIdempotently = makeFunctionIdempotent(
processRecord,
{
// This field must exist in the object passed into the `processIdempotently` function
dataKeywordArgument: 'fieldToExtractIdempotency',
// (Optional) Used when the `dataKeywordArgument` contains an object and we want to extract key from a subset of fields
eventKeyJMESPath: "[userDetails, productId]",
/*... other options...*/
}
);
const lambdaHandler = async (_event: any, _context: any): Promise<void> => {
const records = [
{
id: '1',
fieldToExtractIdempotency: {
userDetails: 'foo1',
productId: 'bar1',
otherFields: 'fizz1'
}
},
{
id: '2',
fieldToExtractIdempotency: {
userDetails: 'foo2',
productId: 'bar2',
otherFields: 'fizz2',
}
},
]
const results = [];
for (const record of records) {
// Note: the function will throw an error at run time if the record does not contain `fieldToExtractIdempotency`
const result = processIdempotently(record);
results.push(result);
}
/* ...Format and return result... */
}
Option 2: Let client specify how to extract idempotency key
/**
* Higher-order function to process a single record
*/
const processIdempotently = makeFunctionIdempotent(
processRecord,
{
// Note: Client can specify how to create idempotency key from the passed record
extractKeyFunction: (record) => {
const { userDetails, productId } = record.fieldToExtractIdempotency;
return hash(userDetails + '#' + productId)
}
}
);
// Note: the rest here is the same as the option above...
@flochaz Could you confirm if I understand your option 2 correctly?
@KevenFuentes9
I've discussed with @dreamorosi and @flochaz. Let's go with your proposal. So far, it's the most appropriate one we can find for JavaScript/TypeScript. And it's more compatible with the JMESPath option that we'll implement later.
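To make the agreed approach a bit more concrete, here is a minimal sketch of how an idempotency key could be derived from the field named by `dataKeywordArgument`. The helper `getIdempotencyKey` is hypothetical, and the use of SHA-256 over `JSON.stringify` output is an assumption for illustration (note that `JSON.stringify` is sensitive to property order, and JMESPath extraction is left out).

```typescript
import { createHash } from 'node:crypto';

type AnyRecord = Record<string, unknown>;

// Hypothetical helper: picks the configured field from the single object
// argument and hashes its serialized value.
function getIdempotencyKey(record: AnyRecord, dataKeywordArgument: string): string {
  const data = record[dataKeywordArgument];
  if (data === undefined) {
    // Mirrors the run-time error mentioned in Option 1.
    throw new Error(`Field "${dataKeywordArgument}" not found in the record`);
  }

  return createHash('sha256').update(JSON.stringify(data)).digest('hex');
}

// Two records with different ids but the same idempotency payload.
const keyA = getIdempotencyKey(
  { id: '1', fieldToExtractIdempotency: { userDetails: 'foo1', productId: 'bar1' } },
  'fieldToExtractIdempotency'
);
const keyB = getIdempotencyKey(
  { id: '2', fieldToExtractIdempotency: { userDetails: 'foo1', productId: 'bar1' } },
  'fieldToExtractIdempotency'
);
console.log(keyA === keyB); // true: only the selected field contributes to the key
```

Because only the selected field is hashed, other fields on the record (such as `id` here) can differ between retries without changing the key.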
Very interested in this as well.
If that is of any help, we implemented a custom very basic version of this a while ago (inspired by the python implementation)
Here is a gist https://gist.github.com/bboure/ffaa1d528c49b7dd5eb529c148f89c0f
@bboure Thank you!
Is the IdempotencyEntity class open-sourced somewhere?
Closing this issue since the Idempotency utility was released as beta preview in v1.11.1.
We look forward to hearing what you think of it. If you have any comments, questions, or bug reports, please don't hesitate to open a new issue, start a discussion, or join us on Discord!
Description of the feature request
Problem statement: Idempotency is a core Cloud issue that needs a solution to enable stable, fault-tolerant systems that can be affected by repeated transactions. A true idempotency solution as a utility for Powertools would be really useful for this project, based on the functions/features of the existing AWS Lambda Powertools for Python.
Summary of the feature: Link to the Python Lambda Powertools documentation as a good example to follow: https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/idempotency
Benefits for you and the wider AWS community: Supply a good idempotent solution "Out of the Box" with Powertools for TypeScript developers.
Describe alternatives you've considered: Looked at this: https://www.npmjs.com/package/middy-idempotent but it relies on Redis for storage, and the idempotency key generation is not ideal and looks like it will be problematic.
Additional context: https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/idempotency/
Related issues, RFCs
None.