Open seanlane opened 1 year ago
It may not crash, but unmarshaling and then marshaling a value overflows for most Unix timestamps given in milliseconds, because of the call to time.UnixNano. Toy example:
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// SecondsEpochTime serializes a time.Time in JSON as a UNIX epoch time in seconds
type SecondsEpochTime struct {
	time.Time
}

const secondsToNanoSecondsFactor = 1000000000
const milliSecondsToNanoSecondsFactor = 1000000

func TestUnmarshalJSON(epoch float64) time.Time {
	epochSec := int64(epoch)
	epochNano := int64((epoch - float64(epochSec)) * float64(secondsToNanoSecondsFactor))
	return time.Unix(epochSec, epochNano)
}

func TestMarshalJSON(t time.Time) ([]byte, error) {
	// UnixNano() returns the epoch in nanoseconds
	unixTime := float64(t.UnixNano()) / float64(secondsToNanoSecondsFactor)
	return json.Marshal(unixTime)
}

func testVal(test float64) time.Time {
	convertedTime := TestUnmarshalJSON(test)
	convertBack, _ := TestMarshalJSON(convertedTime)
	fmt.Printf("%f\t%s\t%s\n", test, convertedTime.String(), convertBack)
	return convertedTime
}

func main() {
	testVal(1669739327580.0) // Millisecond
	testVal(1669739327.0)    // Second
}
1669739327580.000000 54881-12-06 15:53:00 +0000 UTC -8914383127.569197
1669739327.000000 2022-11-29 16:28:47 +0000 UTC 1669739327
https://github.com/aws/aws-lambda-dotnet/issues/839#issuecomment-1008821492 claims that this only occurs when the dynamo event is first passed through kinesis. Do I understand that correctly? If so, I'm not sure this is something that's supportable, and the function should operate on the Kinesis event rather than the Dynamo event.
I missed the reply on this issue from last year, but it seems to still be relevant.
...this only occurs when the dynamo event is first passed through kinesis. Do I understand that correctly?
Technically, yes. The pipeline is DynamoDB table -> DynamoDB stream -> Kinesis Data stream -> Firehose (where a transformation Lambda is called) -> Firehose destination.
That said, the events passed into the transformation Lambda are Kinesis Firehose Events, which contain Kinesis Firehose Event Records, which in turn have a Data field.
When configured as above, the Data field contains a DynamoDBEventRecord, which contains the DynamoDBStreamRecord that has the field we're concerned with, ApproximateCreationDateTime.
So Data doesn't contain something like a KinesisEventRecord / KinesisRecord, which would have an ApproximateArrivalTimestamp, which is what I think was suggested in the previous comment.
There is the ApproximateArrivalTimestamp field in the KinesisFirehoseEventRecord, but this would be the time the event was accepted by the Firehose stream, as opposed to the time when the change was made in DynamoDB or when it was put into the Kinesis Data stream. The Kinesis documentation also suggests that it may not be accurate enough (emphasis mine):
Each Amazon Kinesis record includes a value, ApproximateArrivalTimestamp, that is set when a stream successfully receives and stores a record. This is commonly referred to as a server-side time stamp, whereas a client-side time stamp is set when a data producer creates or sends the record to a stream (a data producer is any data source putting data records into a stream, for example with PutRecords). The time stamp has millisecond precision. There are no guarantees about the time stamp accuracy, or that the time stamp is always increasing.
Lastly, the original issue that was referenced (https://github.com/aws/aws-lambda-dotnet/issues/839) appears to have been closed, using the same workaround that we implemented back in 2023: check whether the timestamp is more than 5,000 years in the future, and reinterpret it as milliseconds if so:
func firehoseHandler(ctx context.Context, firehoseEvent events.KinesisFirehoseEvent) (
	events.KinesisFirehoseResponse, error) {
	var response events.KinesisFirehoseResponse
	for _, firehoseRecord := range firehoseEvent.Records {
		var ddbRecord events.DynamoDBEventRecord
		err := json.Unmarshal(firehoseRecord.Data, &ddbRecord)
		if err != nil {
			// Do something...
		}
		timeWritten := getDdbCreationTime(ddbRecord.Change)
		_ = timeWritten // Continue processing...
	}
	return response, nil
}

func getDdbCreationTime(e events.DynamoDBStreamRecord) time.Time {
	t := e.ApproximateCreationDateTime
	// There is a bug in the aws-lambda-go library, where timestamps from DDB events in DynamoDB streams are in seconds,
	// but timestamps from DDB events in Kinesis Streams are in milliseconds, but the aws-lambda-go library
	// unmarshals both as seconds. If this is a Kinesis event, the time.Time object should be at least 50,000
	// years in the future (give or take a few thousand years) when the input was parsed as a seconds timestamp.
	// We can convert it back tho. See https://github.com/aws/aws-lambda-go/issues/478 for more details
	if t.Year() > (time.Now().Year() + 5000) { // Just test 5K years into the future, should be sufficient
		// t.Unix() actually holds milliseconds here: split it into whole seconds and leftover nanoseconds
		return time.Unix(t.Unix()/1000, (t.Unix()%1000)*1_000_000)
	}
	return t.Time
}
I'm not sure if that same fix should be incorporated here into the library, but the above has been working reasonably well for us over the last 18 months.
Confirming that I am experiencing the same error while using a Kinesis DynamoDB stream with Firehose/Lambda. There is an ApproximateCreationDateTimePrecision field which can be used to infer the unit of the field.
{
  "dynamodb": {
    "ApproximateCreationDateTime": 1731101300058336,
    "ApproximateCreationDateTimePrecision": "MICROSECOND"
  }
}
This is essentially the same issue as https://github.com/aws/aws-lambda-dotnet/issues/839, but without crashing deserialization, which I'm guessing is due to the use of float64 that avoids overflowing with the larger value to deserialize. The relevant points of discussion are:
It seems that the value of ApproximateCreationDateTime will be in seconds when coming from a DynamoDB Stream, but in milliseconds when coming from a Kinesis Stream.
There appears to be an internal ticket that's being tracked, so I wanted to open an issue here as well for AWS to monitor and hopefully resolve in the near future. Thanks!