aws-powertools / powertools-lambda-python

A developer toolkit to implement Serverless best practices and increase developer velocity.
https://docs.powertools.aws.dev/lambda/python/latest/
MIT No Attribution
2.82k stars 391 forks source link

Idempotent utility should consider json attribute ordering when creating HASH keys #638

Closed walmsles closed 3 years ago

walmsles commented 3 years ago

I was curious about the idempotent implementation and the general property of REST API bodies being un-ordered lists of JSON attributes. A REST API should be resilient to consumers and pragmatic in its approach to reading data supplied by clients calling them.

I created a simple JSON REST API using serverless and Powertools and implemented the Idempotency utility as follows:

import json
import os
from datetime import datetime
from aws_lambda_powertools import Logger, Tracer
from aws_lambda_powertools.utilities.idempotency import DynamoDBPersistenceLayer, idempotent, IdempotencyConfig

logger = Logger()
tracer = Tracer()

table_name = os.environ['TABLE']
persistence_layer = DynamoDBPersistenceLayer(table_name=table_name)
config = IdempotencyConfig(event_key_jmespath="body")

@logger.inject_lambda_context
@tracer.capture_lambda_handler
@idempotent(persistence_store=persistence_layer, config=config)
def handler(event, context):
    body = {
        "message": f"Data Created: {datetime.now()}",
        "input": event['body']
    }

    logger.info('Processing API call - creating the data now!')
    logger.info(event['body'])
    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }

    return response

I deployed this to AWS and called the API with input as follows:

Test 1

{
    "data": "test message 1",
    "more_data": "more data 1"
}

Test 2 Note JSON attributes are submitted in different order which is OK for a REST API.

{
    "more_data": "more data 1",
    "data": "test message 1"
}

My expectation is that both these payloads will result in the exact same idempotent response since the payloads are essentially the same so far as REST APi conventions go.

The result I actually get is as follows:

Test 1 Result from first call to the API

{
    "message": "Data Created: 2021-08-22 05:14:01.524104",
    "input": "{\n    \"data\": \"test message 1\",\n    \"more_data\": \"more data 1\"\n}"
}

Test 2 Note: This is a different response than the first call to the API - I was expecting the same idempotent response.

{
    "message": "Data Created: 2021-08-22 05:14:42.554149",
    "input": "{\n    \"more_data\": \"more data 1\",\n    \"data\": \"test message 1\"\n}"
}

What were you trying to accomplish? Implement Idempotency over a simple API using the entire API Body as the key for recognising Idempotency.

Expected Behavior

When calling my API with Test 1 payload and test 2 payload I am expecting the exact same idempotent response.

Current Behavior

Instead I am getting 2 different responses since the order of attributes plays a role in the idempotency key Hashing algorithm used by this utility.

Possible Solution

Suggest looking at using the sort_keys=True option for json.dumps when generating the Hash

Steps to Reproduce (for bugs)

  1. Create lambda API using code similar to code in this report.
  2. send payload: {"data":"test message 1","more_data":"more data 1"}
  3. call API again with same inputs but attribute order changed: {"more_data":"more data 1","data":"test message 1"}
  4. Expect same idempotent response from step 4 as in step 3

Environment

heitorlessa commented 3 years ago

Great catch, indeed an oversight. Would you mind sending this fix ;)?

I was planning to include - bring your own serializer - as part of this release but ran out of time

On Sun, 22 Aug 2021 at 07:42, walmsles @.***> wrote:

I was curious about the idempotent implementation and the general property of REST API bodies being un-ordered lists of JSON attributes. A REST API should be resilient to consumers and pragmatic in its approach to reading data supplied by clients calling them.

I created a simple JSON REST API using serverless and Powertools and implemented the Idempotency utility as follows:

import jsonimport osfrom datetime import datetimefrom aws_lambda_powertools import Logger, Tracerfrom aws_lambda_powertools.utilities.idempotency import DynamoDBPersistenceLayer, idempotent, IdempotencyConfig logger = Logger()tracer = Tracer() table_name = os.environ['TABLE']persistence_layer = DynamoDBPersistenceLayer(table_name=table_name)config = IdempotencyConfig(event_key_jmespath="body") @@.**@.(persistence_store=persistence_layer, config=config)def handler(event, context): body = { "message": f"Data Created: {datetime.now()}", "input": event['body'] }

logger.info('Processing API call - creating the data now!')
logger.info(event['body'])
response = {
    "statusCode": 200,
    "body": json.dumps(body)
}

return response

I deployed this to AWS and called the API with input as follows:

Test 1

{ "data": "test message 1", "more_data": "more data 1" }

Test 2 Note JSON attributes are submitted in different order which is OK for a REST API.

{ "more_data": "more data 1", "data": "test message 1" }

My expectation is that both these payloads will result in the exact same idempotent response since the payloads are essentially the same so far as REST APi conventions go.

The result I actually get is as follows:

Test 1 Result from first call to the API

{ "message": "Data Created: 2021-08-22 05:14:01.524104", "input": "{\n \"data\": \"test message 1\",\n \"more_data\": \"more data 1\"\n}" }

Test 2 Note: This is a different response than the first call to the API - I was expecting the same idempotent response.

{ "message": "Data Created: 2021-08-22 05:14:42.554149", "input": "{\n \"more_data\": \"more data 1\",\n \"data\": \"test message 1\"\n}" }

What were you trying to accomplish? Implement Idempotency over a simple API using the entire API Body as the key for recognising Idempotency. Expected Behavior

When calling my API with Test 1 payload and test 2 payload I am expecting the exact same idempotent response. Current Behavior

Instead I am getting 2 different responses since the order of attributes plays a role in the idempotency key Hashing algorithm used by this utility. Possible Solution

Suggest looking at using the sort_keys=True option for json.dumps when generating the Hash Steps to Reproduce (for bugs)

  1. Create lambda API using code similar to code in this report.
  2. send payload: {"data":"test message 1","more_data":"more data 1"}
  3. call API again with same inputs but attribute order changed: {"more_data":"more data 1","data":"test message 1"}
  4. Expect same idempotent response from step 4 as in step 3

Environment

  • Powertools version used: 1.20.0
  • Packaging format (Layers, PyPi):
  • AWS Lambda function runtime: Python 3.8
  • Debugging logs

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/awslabs/aws-lambda-powertools-python/issues/638, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZPQBCP5QKEPKD6SKJ7GALT6CE3PANCNFSM5CSS5JKQ .

walmsles commented 3 years ago

@heitorlessa Let me have a go - The test for this one are slightly scary but let me take a look :-)

heitorlessa commented 3 years ago

Not to worry, I know it can be intimidating as it's the most complex part of Powertools - Just got a test for that while I wait for my flight ;) Fix coming soon

heitorlessa commented 3 years ago

Waiting for CI checks then releasing it ;)

That's how a test for that looks like

def test_idempotent_data_sorting():
    # Scenario to validate same data in different order hashes to the same idempotency key
    data_one = {"data": "test message 1", "more_data": "more data 1"}
    data_two = {"more_data": "more data 1", "data": "test message 1"}

    # Assertion will happen in MockPersistenceLayer
    persistence_layer = MockPersistenceLayer("test-func#" + hashlib.md5(json.dumps(data_one).encode()).hexdigest())

    # GIVEN
    @idempotent_function(data_keyword_argument="payload", persistence_store=persistence_layer)
    def dummy(payload):
        return {"message": "hello"}

    # WHEN
    dummy(payload=data_two)
walmsles commented 3 years ago

@heitorlessa all the MockPersistenceLayer setups need to include 'sort_keys=True' for all the existing tests as well. I had progressed to making the existing tests fail Will wait for your fix to come through

heitorlessa commented 3 years ago

Fixed it - it was actually needed in conftest too as there were plenty of json.dumps for dummy hash idempotency keys. Turns out I forgot how long I had to walk to find my gate ;D

For the MockPersistenceLayer, I didn't include the sort_keys up there deliberately, so I can pass payload=data_two down there to ensure they're sorted in the implementation

heitorlessa commented 3 years ago

Available in PyPi now as 1.20.1 - Lambda Layers might take up to 5 minutes to complete ;)

Thank you so so much again @walmsles !