aws / aws-lambda-go

Libraries, samples and tools to help Go developers develop AWS Lambda functions.

[BUG] ApproximateCreationDateTime from DynamoDBStreamRecord in milliseconds when originating from Kinesis, not seconds. #478

Open seanlane opened 1 year ago

seanlane commented 1 year ago

This is essentially the same issue as https://github.com/aws/aws-lambda-dotnet/issues/839, but without crashing deserialization, which I'm guessing is because the use of float64 avoids overflowing when the larger value is deserialized. The relevant point of discussion is:

The value of ApproximateCreationDateTime appears to be in seconds when coming from a DynamoDB Stream, but in milliseconds when coming from a Kinesis Stream.

There appears to be an internal ticket that's being tracked, so I wanted to open an issue here as well for AWS to monitor and hopefully resolve in the near future. Thanks!

seanlane commented 1 year ago

It may not crash, but unmarshaling and then re-marshaling such a value will overflow for most current Unix timestamps, because a millisecond epoch interpreted as seconds lands tens of thousands of years in the future, far outside the range (roughly the year 2262) that time.UnixNano can represent in an int64. Toy example:

package main

import (
    "encoding/json"
    "fmt"
    "time"
)

// SecondsEpochTime serializes a time.Time in JSON as a UNIX epoch time in seconds
type SecondsEpochTime struct {
    time.Time
}

const secondsToNanoSecondsFactor = 1000000000
const milliSecondsToNanoSecondsFactor = 1000000

func TestUnmarshalJSON(epoch float64) time.Time {
    epochSec := int64(epoch)
    epochNano := int64((epoch - float64(epochSec)) * float64(secondsToNanoSecondsFactor))
    return time.Unix(epochSec, epochNano)
}

func TestMarshalJSON(t time.Time) ([]byte, error) {
    // UnixNano() returns the epoch in nanoseconds
    unixTime := float64(t.UnixNano()) / float64(secondsToNanoSecondsFactor)
    return json.Marshal(unixTime)
}

func testVal(test float64) time.Time {
    convertedTime := TestUnmarshalJSON(test)
    convertBack, _ := TestMarshalJSON(convertedTime)
    fmt.Printf("%f\t%s\t%s\n", test, convertedTime.String(), convertBack)
    return convertedTime
}

func main() {
    testVal(1669739327580.0) // Millisecond
    testVal(1669739327.0)    // Second
}
1669739327580.000000    54881-12-06 15:53:00 +0000 UTC  -8914383127.569197
1669739327.000000   2022-11-29 16:28:47 +0000 UTC   1669739327
bmoffatt commented 1 year ago

https://github.com/aws/aws-lambda-dotnet/issues/839#issuecomment-1008821492 claims that this only occurs when the dynamo event is first passed through kinesis. Do I understand that correctly? If so, I'm not sure if this is something that's supportable, and the function should operate on the Kinesis event rather than the Dynamo event

seanlane commented 4 months ago

I missed the reply on this issue from last year, but it seems to still be relevant.

...this only occurs when the dynamo event is first passed through kinesis. Do I understand that correctly?

Technically, yes, the pipeline is DynamoDB table -> DynamoDB stream -> Kinesis Data stream -> Firehose (where a transformation Lambda is called) -> Firehose destination

That said, the events passed into the transformation lambda are Kinesis Firehose Events which contain Kinesis Firehose Event Records, which in turn have a Data field.

When configured as above, the Data field contains a DynamoDBEventRecord, which contains the DynamoDBStreamRecord that has the field we're concerned with, ApproximateCreationDateTime.

So Data doesn't contain something like a KinesisEventRecord / KinesisRecord, which would have an ApproximateArrivalTimestamp, which is what I think was suggested in the previous comment.

There is an ApproximateArrivalTimestamp field in the KinesisFirehoseEventRecord, but that is a timestamp for when the event was accepted by the Firehose stream, as opposed to the time when the change was made in DynamoDB or when it was put into the Kinesis Data stream. The Kinesis documentation also suggests that it may not be accurate enough (emphasis mine):

Each Amazon Kinesis record includes a value, ApproximateArrivalTimestamp, that is set when a stream successfully receives and stores a record. This is commonly referred to as a server-side time stamp, whereas a client-side time stamp is set when a data producer creates or sends the record to a stream (a data producer is any data source putting data records into a stream, for example with PutRecords). The time stamp has millisecond precision. There are no guarantees about the time stamp accuracy, or that the time stamp is always increasing.

Lastly, the original issue that was referenced (https://github.com/aws/aws-lambda-dotnet/issues/839) appears to have been closed using the same workaround that we implemented back in 2023: check whether the timestamp is more than 5,000 years in the future, and treat the value as milliseconds and convert it back if so:

func firehoseHandler(ctx context.Context, firehoseEvent events.KinesisFirehoseEvent) (
    events.KinesisFirehoseResponse, error) {
    var response events.KinesisFirehoseResponse
    for _, firehoseRecord := range firehoseEvent.Records {

        var ddbRecord events.DynamoDBEventRecord
        err := json.Unmarshal(firehoseRecord.Data, &ddbRecord)
        if err != nil {
            // Do something...
        }
        timeWritten := getDdbCreationTime(ddbRecord.Change)
        _ = timeWritten // Continue processing and append to response.Records...
    }
    return response, nil
}

func getDdbCreationTime(e events.DynamoDBStreamRecord) time.Time {
    t := e.ApproximateCreationDateTime

    // There is a bug in the aws-lambda-go library: timestamps from DDB events in DynamoDB streams are in seconds,
    // but timestamps from DDB events in Kinesis streams are in milliseconds, and the library parses both as
    // seconds. If this is a Kinesis event, the resulting time.Time will be at least 50,000 years in the future
    // (give or take a few thousand years) because the millisecond input was parsed as a seconds timestamp.
    // We can convert it back, though. See https://github.com/aws/aws-lambda-go/issues/478 for more details.
    if t.Year() > (time.Now().Year() + 5000) { // Just testing 5K years into the future should be sufficient
        return time.Unix(t.Unix()/1000, (t.Unix()%1000)*1_000_000)
    }
    return t.Time
}

I'm not sure whether that same fix should be incorporated into the library here, but the above has been working reasonably well for us over the last 18 months.
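
For anyone who wants to sanity-check that heuristic, here is a minimal standalone sketch (my own illustration, not our production code) that runs the two sample epochs from the toy example above through the same logic, constructing events.DynamoDBStreamRecord values directly instead of unmarshaling a real Firehose payload:

package main

import (
    "fmt"
    "time"

    "github.com/aws/aws-lambda-go/events"
)

// Same heuristic as the workaround above: implausibly far-future values are
// treated as millisecond epochs that were parsed as seconds.
func getDdbCreationTime(e events.DynamoDBStreamRecord) time.Time {
    t := e.ApproximateCreationDateTime
    if t.Year() > (time.Now().Year() + 5000) {
        return time.Unix(t.Unix()/1000, (t.Unix()%1000)*1_000_000)
    }
    return t.Time
}

func main() {
    // The millisecond and second sample values from the toy example above.
    for _, epoch := range []int64{1669739327580, 1669739327} {
        rec := events.DynamoDBStreamRecord{
            ApproximateCreationDateTime: events.SecondsEpochTime{Time: time.Unix(epoch, 0)},
        }
        fmt.Println(epoch, "->", getDdbCreationTime(rec).UTC())
    }
    // 1669739327580 -> 2022-11-29 16:28:47.58 +0000 UTC
    // 1669739327 -> 2022-11-29 16:28:47 +0000 UTC
}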

amalakar commented 2 weeks ago

Confirming that I am experiencing the same error while using a DynamoDB stream through Kinesis with Firehose/Lambda. There is an ApproximateCreationDateTimePrecision field which can be used to infer the unit of the timestamp:

{
    "dynamodb": {
        "ApproximateCreationDateTime": 1731101300058336,
        "ApproximateCreationDateTimePrecision": "MICROSECOND"
    }
}
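
That precision field would make the unit unambiguous. As a rough sketch of how it could be used today, one could bypass the library's SecondsEpochTime handling and read the raw JSON directly; the rawChange struct and creationTime helper below are illustrations of mine, not types or behavior from aws-lambda-go:

package main

import (
    "encoding/json"
    "fmt"
    "time"
)

// rawChange reads only the two fields we care about from the record's raw
// JSON. The field names match the payload above; the struct itself is not
// part of aws-lambda-go.
type rawChange struct {
    Dynamodb struct {
        ApproximateCreationDateTime          float64 `json:"ApproximateCreationDateTime"`
        ApproximateCreationDateTimePrecision string  `json:"ApproximateCreationDateTimePrecision"`
    } `json:"dynamodb"`
}

// creationTime converts the raw epoch value using the reported precision,
// falling back to seconds when the precision field is absent.
func creationTime(rc rawChange) time.Time {
    v := int64(rc.Dynamodb.ApproximateCreationDateTime)
    switch rc.Dynamodb.ApproximateCreationDateTimePrecision {
    case "MICROSECOND":
        return time.UnixMicro(v)
    case "MILLISECOND":
        return time.UnixMilli(v)
    default:
        return time.Unix(v, 0)
    }
}

func main() {
    payload := []byte(`{
        "dynamodb": {
            "ApproximateCreationDateTime": 1731101300058336,
            "ApproximateCreationDateTimePrecision": "MICROSECOND"
        }
    }`)
    var rc rawChange
    if err := json.Unmarshal(payload, &rc); err != nil {
        panic(err)
    }
    fmt.Println(creationTime(rc).UTC()) // 2024-11-08 21:28:20.058336 +0000 UTC
}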