awslabs / amazon-kinesis-client

Client library for Amazon Kinesis
Apache License 2.0

Correct way to read records from an epoch time #803

Open gogrisohil opened 3 years ago

gogrisohil commented 3 years ago

Hi,

We are trying to run the KCL multilang daemon from a properties file and have it start reading records from a specific point in time. We first set initialPositionInStream = AT_TIMESTAMP, but that failed with java.lang.IllegalArgumentException: Invalid InitialPosition: AT_TIMESTAMP. We then set timestampAtInitialPositionInStream = 1617305352 and left initialPositionInStream unset. With that configuration, the lease table entries for all shards pointed to LATEST instead of AT_TIMESTAMP. What are we doing wrong, and what is the correct way to read records from a given point in time?

We're using version 2.3.1 of the KCL.

Thank you.

kevioke commented 3 years ago

I did a little digging in the repository to see how AT_TIMESTAMP could be configured through the multilang daemon. Most of the configuration appears to be read and parsed in this file: https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client-multilang/src/main/java/software/amazon/kinesis/multilang/config/MultiLangDaemonConfiguration.java

I saw a member variable, InitialPositionInStreamExtended, which seems to give us what we want: it can carry both AT_TIMESTAMP and the date to start from. I could not find a way to specify that variable in the properties file, so I'm proposing this change, https://github.com/awslabs/amazon-kinesis-client/pull/804, which parses that key in the properties file as a Long. The change works locally and allowed us to initialize a stream reader from a point in time.

I'm not sure this is the best way to do it, and I would love some feedback or direction on whether there's a better alternative.
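
For anyone following along, the parsing step described above can be sketched roughly as follows. This is a minimal standalone illustration using only the JDK, not the code from the pull request. The class name TimestampPropertySketch and the method parseInitialTimestamp are hypothetical, and two things are assumptions: that the key being parsed is timestampAtInitialPositionInStream, and that a value such as 1617305352 is meant as epoch seconds. java.util.Date works in milliseconds, so the unit matters.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.Instant;
import java.util.Date;
import java.util.Properties;

// Hypothetical helper, for illustration only; it is not part of the KCL.
class TimestampPropertySketch {

    // Assumption: this is the properties key that carries the start timestamp.
    static final String KEY = "timestampAtInitialPositionInStream";

    // Reads the key as a Long and converts it to a Date.
    // Assumption: the value is in epoch seconds, not milliseconds.
    // Returns null when the key is absent or blank.
    static Date parseInitialTimestamp(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return null;
        }
        long epochSeconds = Long.parseLong(raw.trim());
        return Date.from(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) throws IOException {
        // Same line as in the properties file from the original question.
        Properties props = new Properties();
        props.load(new StringReader("timestampAtInitialPositionInStream = 1617305352\n"));

        Date start = parseInitialTimestamp(props);
        System.out.println(start.toInstant()); // prints 2021-04-01T19:29:12Z
    }
}
```

In the KCL itself, a Date like this would then have to be wrapped in an InitialPositionInStreamExtended for the AT_TIMESTAMP position; that wiring is what the pull request is about and is not reproduced here.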