aws-powertools / powertools-lambda

MIT No Attribution
73 stars 4 forks source link

Lambda Powertools Idempotency Support for SQS Batch Processor #32

Closed kimberlyamandalu closed 3 years ago

kimberlyamandalu commented 3 years ago

We use the Powertools SQS Batch Processor utility with our Lambda function. The documentation suggests implementing idempotency into the function logic. We were trying to leverage the Powertools Idempotent feature. There seems to be some limitations with this approach as the Idempotency feature only checks against the entire SQS batch of records instead of each individual record in the batch. So kind of defeats the purpose of the SQS Batch Processor.

Is there any way we can utilize powertools idempotency utility in our lambda with the SQS batch processor utility having the idempotency utility check at an individual record rather than the full batch.

Reading through the source code, it seems like it assumes that it will wrap the lambda handler only so it seems impossible?

Thanks.

heitorlessa commented 3 years ago

Hey @kimberlyamandalu, thank you so much for flagging this.

We've been meaning to create or update Idempotency utility to become a decorator for any standalone function, not just Lambda handler.

Adding this to backlog as we totally should do this and appreciate any help on implementing it

cc @cakepietoast who did the original implementation for ideas.

heitorlessa commented 3 years ago

Just had a look at the source code and it'd indeed need some major internal changes to support any function with any signature. It'd be relatively quick to support any function with a single parameter (like Batch Processor Handler), but if we want to do it right, we should support any function not just the processor handler.

@cakepietoast when you get some bandwidth, could you look into it as you're way more familiar with the implementation?

kimberlyamandalu commented 3 years ago

@heitorlessa Thank you very much for adding this to your backlog. We are looking into the code as well. Will definitely share any findings or suggestions for how to implement. At the moment, what we've noticed is that, the idempotency utility is quite dependent on the Lambda Context object, but it mainly uses it to add the Lambda function name to the primary key value in the DynamoDB table, which makes sense.

@cakepietoast thanks for looking into it!

heitorlessa commented 3 years ago

Yes! That can be easily replaced with fn.name since is a sync function

On Mon, 16 Aug 2021 at 14:32, kimberlyamandalu @.***> wrote:

@heitorlessa https://github.com/heitorlessa Thank you very much for adding this to your backlog. We are looking into the code as well. Will definitely share any findings or suggestions for how to implement. At the moment, what we've noticed is that, the idempotency utility is quite dependent on the Lambda Context object, but it mainly uses it to add the Lambda function name to the primary key value in the DynamoDB table, which makes sense.

@cakepietoast https://github.com/cakepietoast thanks for looking into it!

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/awslabs/aws-lambda-powertools-roadmap/issues/32#issuecomment-899473020, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZPQBATI7S6SS3UI6LUGOTT5EAO3ANCNFSM5CCEWOCQ .

cooper3330 commented 3 years ago

Thanks for moving this so quickly to Working on it! One thought on removing the need for Lambda Context object just to access the Lambda Function Name for building the Dynamo Key is to leverage the default Lambda Environment Variable for the Lambda Function Name - AWS_LAMBDA_FUNCTION_NAME

Source: https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html#configuration-envvars-retrieve

heitorlessa commented 3 years ago

As I was getting to know better the idempotency logic I realized if I remove it it'll break people's unit tests. Relying on AWS_LAMBDA_FUNCTION_NAME also makes it inconsistent with how people expect to test things like logger.

I'll create a new decorator instead, though I'm pondering on two solutions and would appreciate your help deciding it:

A. Single argument only: Accept any function with a single argument where the argument is the actual payload we'll hash and optionally extract idempotency tokens (event_jmespath_key config)

B. Function parameter signature: Accept any function with an arbitrary number of parameters, but must have a keyword named argument named payload.

C. <Something else> that might also work on other upcoming languages like Typescript

For example:

Option A - Simple but rigid as you might have multiple parameters

# WORKS
@another_idempotent_name_i_need_to_figure_out
def record_handler(record: Dict): ...

# WILL NOT WORK
@another_idempotent_name_i_need_to_figure_out
def record_handler(record: Dict, my_other_arg: str = ""): ...

# WILL NOT WORK
@another_idempotent_name_i_need_to_figure_out
def record_handler(record: Dict, my_other_arg: str): ...

Option B - Flexible but you must ensure you have kwarg payload

# WORKS
@another_idempotent_name_i_need_to_figure_out
def record_handler(*, payload: dict): ...

# WORKS
@another_idempotent_name_i_need_to_figure_out
def record_handler(*, payload: dict, **kwargs): ...

# MIGHT WORK, DEPENDS ON HOW IT'S CALLED
@another_idempotent_name_i_need_to_figure_out
def record_handler(payload: dict, **kwargs): ...

    # WORKS
    # record_handler(payload={}, foo="bar")

    # WILL NOT WORK
    # record_handler({}, foo="bar")
kimberlyamandalu commented 3 years ago

I personally prefer Option B

# WORKS
@another_idempotent_name_i_need_to_figure_out
def record_handler(*, payload: dict, **kwargs): ...

I think the flexibility this will offer allows the idempotent utility for non lambda handlers to be easily extensible, if needed. This way, we can also have more specific arguments that can be passed to the decorator function (vs having all the input arguments in a single dictionary object as it might get confusing).

Let me know if this makes sense or if I have terribly misunderstood your point.

Thank you!!

heitorlessa commented 3 years ago

Oh I see, you want a third option in this case then @kimberlyamandalu like:

We could do that too, I didn't factor that in because of the JMESPath feature.

This feature allows you to select a portion of the incoming payload and extract what you want as Idempotency tokens.

I'll get back to the whiteboard, write a RFC and then ping it back here - meanwhile, I'll have another look at creative solutions to simply integrate the Batch utility as-is too

Thanks a lot for the help

kimberlyamandalu commented 3 years ago

hey @heitorlessa, Thanks a lot for the feedback.

To clarify, I picked one of the choices you had listed for Option B (the second one that works). Wanted to make sure we were on the same page =)

Currently, the JMESPath feature is included in the Idempotency Config object that is passed to the idempotent function. We do want to have this feature retained. I think it is extremely useful.

# WORKS
@another_idempotent_name_i_need_to_figure_out
def record_handler(*, payload: dict): ...

##### SELECTED THIS OPTION #####
# WORKS
@another_idempotent_name_i_need_to_figure_out
def record_handler(*, payload: dict, **kwargs): ...

# MIGHT WORK, DEPENDS ON HOW IT'S CALLED
@another_idempotent_name_i_need_to_figure_out
def record_handler(payload: dict, **kwargs): ...

    # WORKS
    # record_handler(payload={}, foo="bar")

    # WILL NOT WORK
    # record_handler({}, foo="bar")
heitorlessa commented 3 years ago

Hey @kimberlyamandalu I've updated Slack but updating it here too.

We've finally come to an acceptable design to keep all features. We had to make three compromises for this to work:

  1. A new idempotent_function decorator for any synchronous function with any number of arguments and keyword arguments (a.k.a named arguments)
  2. idempotent_function decorator requires a new parameter: data_keyword_argument.
    • You'll have to tell us which keyword parameter in your function signature has the data we should be performing all operations - hashing for idempotency key, extract data and/or validating parameters using JMESPath, etc.
  3. When calling your function decorated with idempotent_function, you must ensure it uses a keyword argument or else we will raise a RuntimeError for not knowing which data to hash:
    • Works: record_handler(record=data)
    • Does not work: record_handler(data)

For the Batch utility, you don't have to worry about 3 as we are doing this on your behalf. Assuming you want to make sure you only process a SQS record that has a unique message ID once, this is how it'll look like in the next release:

from aws_lambda_powertools.utilities.batch import sqs_batch_processor
from aws_lambda_powertools.utilities.idempotency import idempotent_function, DynamoDBPersistenceLayer, IdempotencyConfig

dynamodb = DynamoDBPersistenceLayer(table_name="idem")
config =  IdempotencyConfig(
    event_key_jmespath="messageId",
    use_local_cache=True,
)

@idempotent_function(data_keyword_argument="record", persistence_layer=dynamodb, config=config)
def record_handler(record):
    return do_something_with(record["body"])

@sqs_batch_processor(record_handler=record_handler)
def lambda_handler(event, context):
    return {"statusCode": 200}

Here's a dummy example to demonstrate the constraint 3.

from aws_lambda_powertools.utilities.idempotency import idempotent_function, DynamoDBPersistenceLayer, IdempotencyConfig
import uuid

records = {
      "messageId": "059f36b4-87a3-44ab-83d2-661975830a7d",
      "receiptHandle": "AQEBwJnKyrHigUMZj6rYigCgxlaS3SLy0a...",
      "body": "Test message.",
      "attributes": {
        "ApproximateReceiveCount": "1",
        "SentTimestamp": "1545082649183",
        "SenderId": "AIDAIENQZJOLO23YVJ4VO",
        "ApproximateFirstReceiveTimestamp": "1545082649185"
      },
      "messageAttributes": {
        "testAttr": {
          "stringValue": "100",
          "binaryValue": "base64Str",
          "dataType": "Number"
        }
      },
      "md5OfBody": "e4e68fb7bd0e697a0ae8f1bb342846b3",
      "eventSource": "aws:sqs",
      "eventSourceARN": "arn:aws:sqs:us-east-2:123456789012:my-queue",
      "awsRegion": "us-east-2"
}

persistence_layer = DynamoDBPersistenceLayer(table_name="idem")
config =  IdempotencyConfig(
    event_key_jmespath="messageId",
    use_local_cache=True,
)

@idempotent_function(persistence_store=persistence_layer, data_keyword_argument="data", config=config)
def handler(any_arg, data, and_kwarg_too=False):
    """Dummy function with an arbitrary number of args and named args"""
    return {"some_id": f"{str(uuid.uuid4())}"}

ret = handler("blah", data=records, and_kwarg_too=True)
print(ret)
cooper3330 commented 3 years ago

Looks perfect - thanks again for picking it up and moving on it so fast! Should be seamless for us to move to this after the next release!

heitorlessa commented 3 years ago

Merged! Coming in today's release