aws / aws-lambda-dotnet

Libraries, samples and tools to help .NET Core developers develop AWS Lambda functions.
Apache License 2.0
1.58k stars 479 forks source link

Lambda support for DynamoDb Kinesis Stream processing #1554

Open danny-zegel-zocdoc opened 1 year ago

danny-zegel-zocdoc commented 1 year ago

Describe the feature

Given a DynamoDb table configured with a Kinesis Stream and a Lambda to process the Kinesis Stream, the Amazon.Lambda.KinesisEvents package provides KinesisEvent as the lambda input object with the following structure

public class KinesisEvent
{
    public IList<KinesisEvent.KinesisEventRecord> Records { get; set; }

    public class Record : Amazon.Kinesis.Model.Record
    {
        public string KinesisSchemaVersion { get; set; }
    }

    public class KinesisEventRecord
    {
        /// A bunch of other properties
        ...

        public KinesisEvent.Record Kinesis { get; set; }
    }
}

public class Record
{
    /// A bunch of other properties
    ...

    public MemoryStream Data { get; set; }
}

In the case of a DynamoDb Kinesis Stream the Record.Data MemoryStream contains a non-encoded Json string which appears to match the type Amazon.DynamoDBv2.Model.Record from the AWSSDK.DynamoDBv2 package. Unfortunately the Json cannot be easily deserialized into a Amazon.DynamoDBv2.Model.Record due to the nested property ApproximateCreationDateTime which is in the form of epoch time in the JSON (i.e. an integer) but is typed as DateTime in Amazon.DynamoDBv2.Model.StreamRecord. I'm also concerned that there might be breaking edge-cases when directly deserializing AttributeValue types which are also nested inside Amazon.DynamoDBv2.Model.StreamRecord.

I seem to be able to deserialize Record.Data successfully using the Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer serializer which is really not intuitive and disappointing to have to do.

I'm wondering if it would be reasonable to either create a KinesisDynamoDbEvent type either in Amazon.Lambda.KinesisEvents or another package which types the nested Data property as a DynamoDb StreamRecord instead of a MemoryStream, or perhaps provide an intuitive utility to deserialize the MemoryStream appropriately.

Use Case

When using Lambda to process a Kinesis Stream that is backed by a DynamoDb table it is difficult to deserialize the Amazon.Kinesis.Model.Record MemoryStream property to a dynamodb record.

Proposed Solution

No response

Other Information

No response

Acknowledgements

AWS .NET SDK and/or Package version used

Amazon.Lambda.KinesisEvents

Targeted .NET Platform

.NET 6 and up

Operating System and version

N/A

danny-zegel-zocdoc commented 1 year ago

Correction: It seems that using DefaultLambdaJsonSerializer to deserialize the Amazon.DynamoDBv2.Model.Record does not work because ApproximateCreationDateTime is being generated with a 13 digit number (millisecond granularity) instead of a 10 digit number (second granularity). I thought this worked when I tested it using localstack which generates 10 digit numbers.

According to the docs here https://github.com/aws/aws-sdk-net/blob/master/sdk/src/Services/DynamoDBv2/Generated/Model/StreamRecord.cs#L45-L52 the value should have second granularity, is this a bug in kinesis?

As a work around for now I copied the DateTimeConverter included in DefaultLambdaJsonSerializer with a slight modification to support millisecond epoch time as well as second epoch time and swapped out my version in place of the default one.

ashishdhingra commented 1 year ago

@danny-zegel-zocdoc Thanks for submitting feature request. Please review if this issue is similar to existing issue https://github.com/aws/aws-lambda-dotnet/issues/839 in our backlog. If yes, kindly close this issue to avoid duplicate issue.

Thanks, Ashish

danny-zegel-zocdoc commented 1 year ago

839

@ashishdhingra the bug aspect of this issue does seem to be the same issue as https://github.com/aws/aws-lambda-dotnet/issues/839 however the bug was not the primary aspect of this request. This request is primarily suggesting that there be a more prescribed/standardized way of parsing Kinesis records that come from DynamoDb being that what you need to do right now is not at all intuitive.

ashishdhingra commented 1 year ago

839

@ashishdhingra the bug aspect of this issue does seem to be the same issue as https://github.com/aws/aws-lambda-dotnet/issues/839 however the bug was not the primary aspect of this request. This request is primarily suggesting that there be a more prescribed/standardized way of parsing Kinesis records that come from DynamoDb being that what you need to do right now is not at all intuitive.

Agreed. But changing the type in any of the nested properties would be a breaking change for existing customer as you mentioned. Creation of separate package needs to be reviewed by the team.

danny-zegel-zocdoc commented 1 year ago

839

@ashishdhingra the bug aspect of this issue does seem to be the same issue as #839 however the bug was not the primary aspect of this request. This request is primarily suggesting that there be a more prescribed/standardized way of parsing Kinesis records that come from DynamoDb being that what you need to do right now is not at all intuitive.

Agreed. But changing the type in any of the nested properties would be a breaking change for existing customer as you mentioned. Creation of separate package needs to be reviewed by the team.

Alternatively a new type could be created in the existing package 🤷.

I'll leave it up to your team to decide if/how this would be done.

Thank you for the consideration.

normj commented 1 year ago

I think adding a KinesisDynamoDbEvent type in the Kinesis event stream package makes sense to make this use case easier. We should not have it reference the types in the SDK. We have learned the hard way that this causes issues because the SDK types are setup for how the SDK does serialization/deserialization which is all code generated and it doesn't fit well with generic JSON serialization/deserialization. As @danny-zegel-zocdoc pointed out dates are problematic as well as how the SDK uses its ConstantClass to mimic enumerations. A while back we created a V2 of the S3 event package to separate the package from the SDK.

@ashishdhingra We should add this feature request to our backlog but to level set @danny-zegel-zocdoc I'm not sure when we will be able to get to this and in the Lambda space getting ready for .NET 8 is our priority. If you are interested in doing a PR I'm happy to take a look at it.