awslabs / dynamodb-streams-kinesis-adapter

The Amazon DynamoDB Streams Adapter implements the Amazon Kinesis interface so that your application can use KCL to consume and process data from a DynamoDB stream.
Apache License 2.0

DynamoDB Streams lag monitoring #5

Open ivanarkhipov opened 7 years ago

ivanarkhipov commented 7 years ago

Hello! We're using DynamoDB Streams with the Kinesis Client Library (KCL). How can we measure the latency between the time an event is created in a stream and the time it is processed on the KCL side?

As far as I know, KCL's MillisBehindLatest metric is specific to Kinesis Streams. The approximateCreationDateTime record attribute has only minute-level precision, which is not acceptable for monitoring systems with sub-second latency. Could you please suggest some useful metrics for monitoring DynamoDB Streams latency?

Thank you!

Ivan

pfifer commented 7 years ago

This feature is currently on DynamoDB's road map, but they don't currently have an ETA.

amcp commented 7 years ago

Put the value of System.currentTimeMillis() in an item attribute yourself when you put and update items. As long as your stream view type is NEW_IMAGE or NEW_AND_OLD_IMAGES and your item updates contain this timestamp, you can get a better approximation.
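
A minimal sketch of the consumer side of this suggestion. It assumes the writer stamps every put and update with a numeric attribute, hypothetically named writtenAtMillis here, and that the record's new image has already been converted into a plain map of strings. Neither the attribute name nor the map shape comes from the adapter; both are illustrative. The result also depends on how closely the writer and consumer clocks agree.

```java
import java.util.Map;
import java.util.OptionalLong;

/** Sketch: derive per-record latency from a writer-supplied timestamp attribute. */
final class WriteTimestampLatency {
    // Hypothetical attribute name; the writer must set it on every put and update.
    static final String WRITTEN_AT = "writtenAtMillis";

    /**
     * @param newImage  attribute values of the item's new image, already converted to strings
     * @param nowMillis consumer clock reading, e.g. System.currentTimeMillis()
     * @return latency in milliseconds, or empty if the attribute is missing or malformed
     */
    static OptionalLong latencyMillis(Map<String, String> newImage, long nowMillis) {
        String raw = newImage.get(WRITTEN_AT);
        if (raw == null) {
            return OptionalLong.empty();
        }
        try {
            long writtenAt = Long.parseLong(raw);
            // Clamp at zero because writer and consumer clocks may be skewed.
            return OptionalLong.of(Math.max(0L, nowMillis - writtenAt));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }
}
```

Calling latencyMillis(image, System.currentTimeMillis()) for each processed record yields a value that could be published as a custom metric.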

joelittlejohn commented 7 years ago

@amcp I'm afraid adding an item attribute does not solve this issue. I think this one should be reopened.

The requirement here is for a lag metric. This means the time (in millis) between the current item and the latest item that was added to the stream. From the docs for MillisBehindLatest:

The number of milliseconds the GetRecords response is from the tip of the stream, indicating how far behind current time the consumer is. A value of zero indicates record processing is caught up, and there are no new records to process at this moment.

If no new items are added, the client is not lagging (even if the time attribute on the item is old). This is very different to checking a time attribute on the item.

amcp commented 7 years ago

Seems to me you are interested in the age of stream records relative to the tip of each shard. Each processor works on a shard forward in time. Each time you do a GetRecords call on your usual shard iterator, you could also get a shard iterator for that shard of type LATEST and compute the lag you seek in that manner. Note that shards can roll over for size and age or split for throughput reasons so you might have to do a few calls to get to the latest child shard. By sampling the tip of each shard lineage, you could keep a pretty good estimate of how much you lag.
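
The "few calls to get to the latest child shard" step can be sketched without any AWS calls. The example below assumes you have already collected each shard's parent ID (DescribeStream returns a ParentShardId for each shard) into a map from child to parent; the class and method names are hypothetical. It covers only the lineage walk. Requesting a LATEST iterator for each tip and deriving a lag value from it is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: find the childless descendants ("tips") of the shard a processor is reading. */
final class ShardLineage {
    /**
     * @param startShardId shard currently being processed
     * @param parentOf     map from shard ID to its parent shard ID (root shards are absent)
     * @return sorted IDs of descendants of startShardId that have no children;
     *         contains only startShardId itself if it has not rolled over or split
     */
    static List<String> tipShards(String startShardId, Map<String, String> parentOf) {
        // Invert the child -> parent map into parent -> children.
        Map<String, List<String>> children = new HashMap<>();
        for (Map.Entry<String, String> e : parentOf.entrySet()) {
            children.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        List<String> tips = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(startShardId);
        while (!pending.isEmpty()) {
            String id = pending.pop();
            List<String> kids = children.get(id);
            if (kids == null) {
                tips.add(id);                 // no children: this shard is a tip
            } else {
                kids.forEach(pending::push);  // keep following the lineage
            }
        }
        Collections.sort(tips);
        return tips;
    }
}
```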

amcp commented 7 years ago

Here is some good related reading (also includes links to prior articles). https://noise.getoto.net/2016/08/19/monitor-your-application-for-processing-dynamodb-streams/

joelittlejohn commented 7 years ago

Another measure of lag is the number of records between the current set of records and the tip. It would be good if this library implemented some help with either kind of lag monitoring.

amcp commented 7 years ago

Together with the lag estimates above, you could also use the one-minute CloudWatch ConsumedWriteCapacityUnits metric on the table to estimate the number of writes accepted per second. That lets you work back to the number of records between your stream worker and the heads of the descendant shard lineages.
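
As a rough sketch of that arithmetic: divide the one-minute sum of consumed write capacity by 60 to get writes per second, then multiply by the time lag. This assumes one consumed write unit corresponds to about one stream record, which only holds for small, non-transactional writes, so treat the result as an order-of-magnitude estimate. The class and method names are illustrative.

```java
/** Sketch: estimate how many records separate the worker from the tip of the stream. */
final class BacklogEstimate {
    /**
     * @param consumedWriteUnitsPerMinute one-minute sum of consumed write capacity for the table
     * @param lagMillis                   estimated time lag of the worker, in milliseconds
     * @return approximate number of records written during the lag window
     */
    static long estimatedRecordsBehind(double consumedWriteUnitsPerMinute, long lagMillis) {
        // Assumption: one write unit is roughly one stream record.
        double writesPerSecond = consumedWriteUnitsPerMinute / 60.0;
        double lagSeconds = lagMillis / 1000.0;
        return Math.round(writesPerSecond * lagSeconds);
    }
}
```

For example, 600 write units in the last minute is about 10 writes per second, so a 5-second lag corresponds to roughly 50 records.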

amcp commented 7 years ago

Another thing you could do is feed the DynamoDB Stream into a Kinesis stream with a Lambda, and use the MillisBehindLatest metric from Kinesis records. Seems a bit over the top though.

Mentis commented 6 years ago

Any updates on this? Is there any other way to identify how long a particular event sits in the stream?

aggarwal commented 4 years ago

The value of ApproximateCreationDateTime is precise to the second as of January 2019.

We're currently working on emitting a MillisBehindLatest metric from the adapter package, computed as the difference between System.currentTimeMillis() on the client and the ApproximateCreationDateTime from the GetRecords result. Emitting this metric will give a large majority of customers some basic monitoring out of the box and will let you track how far behind you are in processing your stream. We expect to release this change in the next few weeks.
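
The calculation described above can be sketched as follows. This is not the adapter's code: the choice of the last record in the batch and the value 0 for an empty batch are assumptions made for illustration, and the class name is hypothetical. Because ApproximateCreationDateTime is precise to the second, the result is only accurate to about a second.

```java
import java.util.Date;
import java.util.List;

/** Sketch: approximate "millis behind" from record creation times in one GetRecords batch. */
final class ApproxMillisBehind {
    /**
     * @param creationTimes ApproximateCreationDateTime of each record in the batch, in order
     * @param nowMillis     client clock reading, e.g. System.currentTimeMillis()
     * @return client time minus the creation time of the last record, never negative
     */
    static long millisBehind(List<Date> creationTimes, long nowMillis) {
        if (creationTimes.isEmpty()) {
            return 0L; // Assumption: an empty batch is reported as caught up.
        }
        // Assumption: the newest (last) record in the batch is the reference point.
        Date last = creationTimes.get(creationTimes.size() - 1);
        // Clamp at zero in case the client clock is behind the service clock.
        return Math.max(0L, nowMillis - last.getTime());
    }
}
```

Note that this measures the age of the records being processed, not the distance to the tip of the stream, so an old record read from an otherwise idle stream still reports a large value.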

The DynamoDB Streams GetRecords API does not currently expose any data about the amount or age of records that were written after the records returned in a batch. Making this data available is a large project that requires architectural changes in the service. We'll consider this in our 6-12 month roadmap.

pietropra commented 3 years ago

@aggarwal this is great to know; I was just looking for this information. Perhaps it is worth making it explicit in the documentation?

https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_StreamRecord.html

aggarwal commented 3 years ago

Version 1.5.3 now includes the implementation of MillisBehindLatest as described above. Due to limitations in how the metric object is scoped in KCL, this metric is emitted at the stream-shard level, not at the application level.

https://github.com/awslabs/dynamodb-streams-kinesis-adapter/releases/tag/1.5.3

jeet23 commented 2 years ago

Hi @aggarwal, is the MillisBehindLatest metric available for the DynamoDB Streams Kinesis adapter as well?

Or is it only for the Kinesis streams?