aws / aws-lambda-dotnet

Libraries, samples and tools to help .NET Core developers develop AWS Lambda functions.
Apache License 2.0
1.58k stars 477 forks source link

DynamoDb record deserialization does not work with empty Map values #1888

Open danny-zegel-zocdoc opened 4 days ago

danny-zegel-zocdoc commented 4 days ago

Describe the bug

When using DefaultLambdaJsonSerializer to deserialize DynamoDb events that have nested Map type attributes that are empty it parses them AttributeValue objects that don't have any value set.

var serializer = new Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer();
var dynamoEvent = serializer.Deserialize<Amazon.DynamoDBv2.Model.Record>(memoryStream);

for example the following payload

{
    "awsRegion": "us-east-1",
    "eventID": "9b997d0a-9512-46b7-a239-093431b36204",
    "eventName": "MODIFY",
    "userIdentity": null,
    "recordFormat": "application/json",
    "dynamodb": {
        "ApproximateCreationDateTime": 1732710435144,
        "Keys": {
            "PK": {
                "S": "my pk value"
            },
            "SK": {
                "S": "my sk value"
            }
        },
        "NewImage": {
            "MyData": {
                "M": {
                    "Selections": {
                        "M": {}
                    }
                }
            },
            "SK": {
                "S": "my sk value"
            },
            "PK": {
                "S": "my pk value"
            }
        },
        "OldImage": { },
        "SizeBytes": 30
    },
    "eventSource": "aws:dynamodb"
}

deserialized Selections to an AttributeValue object with nothing set

Screenshot 2024-11-27 at 4 24 43 PM

Regression Issue

Expected Behavior

I would expect DefaultLambdaJsonSerializer to Deserialize empty Maps to AttributeValue objects with IsMSet set to true.

Current Behavior

DefaultLambdaJsonSerializer Deserializes empty Maps to AttributeValue objects with IsMSet set to false.

Reproduction Steps

var json = """
           {
               "awsRegion": "us-east-1",
               "eventID": "9b997d0a-9512-46b7-a239-093431b36204",
               "eventName": "MODIFY",
               "userIdentity": null,
               "recordFormat": "application/json",
               "dynamodb": {
                   "ApproximateCreationDateTime": 1732710435144,
                   "Keys": {
                       "PK": {
                           "S": "my pk value"
                       },
                       "SK": {
                           "S": "my sk value"
                       }
                   },
                   "NewImage": {
                       "MyData": {
                           "M": {
                               "Selections": {
                                   "M": {}
                               }
                           }
                       },
                       "SK": {
                           "S": "my sk value"
                       },
                       "PK": {
                           "S": "my pk value"
                       }
                   },
                   "OldImage": { },
                   "SizeBytes": 30
               },
               "eventSource": "aws:dynamodb"
           }
           """;
var serializer = new Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer();
var dynamoEvent = serializer.Deserialize<Amazon.DynamoDBv2.Model.Record>(new MemoryStream(Encoding.UTF8.GetBytes(json)));
dynamoEvent.Dynamodb.NewImage["MyData"].M["Selections"]!.IsMSet.Should().BeTrue();

Possible Solution

No response

Additional Information/Context

No response

AWS .NET SDK and/or Package version used

Amazon.Lambda.Serialization.SystemTextJson Version 2.4.4 AWSSDK.DynamoDBv2 Version 3.7.301.10

Targeted .NET Platform

.NET 8

Operating System and version

macOS Somona

ashishdhingra commented 4 days ago

@danny-zegel-zocdoc Good afternoon. I used the below code for reproducing the issue:

public void FunctionHandlerNew(DynamoDBEvent input, ILambdaContext context)
{
    context.Logger.LogLine("OK");
}

with the below event payload:

{
  "Records": [
    {
               "awsRegion": "us-east-1",
               "eventID": "9b997d0a-9512-46b7-a239-093431b36204",
               "eventName": "MODIFY",
               "userIdentity": null,
               "recordFormat": "application/json",
               "dynamodb": {
                   "ApproximateCreationDateTime": 1732710435144,
                   "Keys": {
                       "PK": {
                           "S": "my pk value"
                       },
                       "SK": {
                           "S": "my sk value"
                       }
                   },
                   "NewImage": {
                       "MyData": {
                           "M": {
                               "Selections": {
                                   "M": {}
                               }
                           }
                       },
                       "SK": {
                           "S": "my sk value"
                       },
                       "PK": {
                           "S": "my pk value"
                       }
                   },
                   "OldImage": { },
                   "SizeBytes": 30
               },
               "eventSource": "aws:dynamodb"
           }
  ]
}

Inspecting the value of input.Records[0].Dynamodb.NewImage["MyData"].M["Selections"] shows the following in Immediate window:

{Amazon.Lambda.DynamoDBEvents.DynamoDBEvent.AttributeValue}
    B: null
    BOOL: null
    BS: null
    L: null
    M: Count = 0
    N: null
    NS: null
    NULL: null
    S: null
    SS: null

The dependency on SDK was removed in https://github.com/aws/aws-lambda-dotnet/commit/43297758310a170ba1aa9006f7982bca03a8d372 where it now uses it's own Amazon.Lambda.DynamoDBEvents.DynamoDBEvent.AttributeValue POCO class (similar to other event packages in this repository).

I notice that you are using Amazon.DynamoDBv2.Model.Record from the AWS SDK for .NET instead. This leverages AWS .NET SDK auto-generated Amazon.DynamoDBv2.Model.Record model class and accessing AttributeValue.IsMSet would use Amazon.Util.Internal.InternalSDKUtils.GetIsSet(), which returns true only when the Dictionary is not empty.

Are you expecting IsMSet to return true only because collection M is initialized to 0 elements? If yes, do you expect M to be set as null since it had 0 elements in input JSON. I tried setting AWSConfigs.InitializeCollections = false; before execution, it still initialized collection to empty collection. This initialization is handled by JsonSerializer.

I'm unsure if we should be returning IsMSet value as true in case on empty collection (even in upcoming v4 version of AWS .NET SDK). CC @normj for inputs.

Thanks, Ashish

danny-zegel-zocdoc commented 4 days ago

Thanks you for the response @ashishdhingra

Being that DynamoDb does support empty Map values (unlike string set as DDB does not support empty sets) I would expect the SDK to respect it as well and differentiate between nulls and empty dictionaries.

In regards to the AttributeValue implementation, it does contain the IsMSet property which is backed by InternalSDKUtils.GetIsSet and InternalSDKUtils.SetIsSet which utilizes the AlwaysSendDictionary type to differentiate between empty Map value and unset Map values.

Seems to me there are a couple easy options for the fix:

  1. Define a JsonConverter for AttributeValue which would invoke IsMSet = true when deserializing the M typed AttributeValue object when the json has { "M": { } }, and add the converter to DefaultLambdaJsonSerializer.
  2. Update the setter method for AttributeValue.M to either call SetIsSet or perhaps just wrap the dictionary it receives in AlwaysSendDictionary.

just my 2 cents but the first option i described is likely less risky but the second one would make this work as expected out of the box for arbitrary serializers (say someone wanted to use Newtonsoft or whatever).

This also seems to be a problem for AttributeValue.L, AttributeValue.BS & AttributeValue.NS

danny-zegel-zocdoc commented 4 days ago

@ashishdhingra regarding your point about the dependency changes and using Amazon.Lambda.DynamoDBEvents.DynamoDBEvent.AttributeValue, i see that if I use that type it does deserialize properly and in fact Amazon.DynamoDBv2.Model.Record seems equivalent to Amazon.Lambda.DynamoDBEvents.DynamoDBEvent.DynamodbStreamRecord.

However I'm using the deserialized output with Amazon.DynamoDBv2.DocumentModel.Document.FromAttributeMap() (in conjunction with DynamoDBContext.FromDocument()) which expects Dictionary<string, Amazon.DynamoDBv2.Model.AttributeValue> and not Dictionary<string, Amazon.Lambda.DynamoDBEvents.DynamoDBEvent.AttributeValue> so deserializing to the Amazon.Lambda.DynamoDBEvents types doesn't exactly work for my use-case.

danny-zegel-zocdoc commented 4 days ago

Seems to me there are a couple easy options for the fix:

  1. Define a JsonConverter for AttributeValue to invoke IsMSet = true when deserializing the M types AttributeValue object when the json has { "M": { } }, and add the converter to DefaultLambdaJsonSerializer.

If such a JsonConverter were just made available I could add it to an instance of DefaultLambdaJsonSerializer myself even if it's not built it into DefaultLambdaJsonSerializer by default.