amazon-archives / kinesis-storm-spout

Kinesis spout for Storm

OUT OF ORDER Inserts with Kinesis Spout and large batch sizes #20

geota closed 9 years ago

geota commented 9 years ago

I believe we found the root issue. The code issues getRecords calls to Amazon in batches of a configurable size and expects each returned list of records to be in order. For example, let's say we are requesting batches of 1000 records. Amazon does return the next 1000 sequence numbers, but the returned list is not necessarily sorted, which causes the out-of-order inserts. The solution we have in place is to sort the result that Amazon returns.
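A minimal sketch of the sort-before-emit idea described above, not the actual patch. `SequenceSort`, `Rec`, `sortBySequence`, and `firstOutOfOrder` are hypothetical names; `Rec` stands in for the SDK record type and carries only a sequence number. It assumes sequence numbers are decimal strings, so they are compared as `BigInteger` rather than as text.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SequenceSort {
    // Hypothetical stand-in for a Kinesis record; only the sequence number matters here.
    public static final class Rec {
        public final String sequenceNumber;

        public Rec(String sequenceNumber) {
            this.sequenceNumber = sequenceNumber;
        }
    }

    // Returns a new list ordered by numeric sequence number; the input is left untouched.
    public static List<Rec> sortBySequence(List<Rec> batch) {
        List<Rec> sorted = new ArrayList<>(batch);
        sorted.sort(Comparator.comparing((Rec r) -> new BigInteger(r.sequenceNumber)));
        return sorted;
    }

    // Returns the index of the first record whose sequence number is not greater
    // than its predecessor's, or -1 if the batch is strictly increasing.
    public static int firstOutOfOrder(List<Rec> batch) {
        BigInteger last = null;
        for (int i = 0; i < batch.size(); i++) {
            BigInteger current = new BigInteger(batch.get(i).sequenceNumber);
            if (last != null && current.compareTo(last) <= 0) {
                return i;
            }
            last = current;
        }
        return -1;
    }
}
```

Calling `sortBySequence` on each batch before handing records downstream, and `firstOutOfOrder` in a test, mirrors the check that raises the OUT OF ORDER INSERT exception. Comparing numerically avoids surprises if two sequence numbers ever differ in length.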

Test case to prove the behavior: https://gist.github.com/geota/ed47ecdead08ab0cab66

Will send a PR soon

"""
Processed 900 with no out of order inserts
Requesting records
java.lang.RuntimeException: OUT OF ORDER INSERT: last seq 49552833805435751671064410149559486681498920731233222658 is AFTER curent seq: 49552833805435751671064410147127127932433940966597984258
    at com.amazonaws.services.kinesis.stormspout.KinesisHelperTest.test(KinesisHelperTest.java:48)
"""

geota commented 9 years ago

Closing this issue for now. We still get this error, but my unit test was not properly advancing to the next shard iterator after each request. I ran it with the fix logic and did not detect any out-of-order records.
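For anyone hitting the same test bug, here is a rough sketch of the iterator handling, with the Kinesis client replaced by a plain function so it runs on its own. `IteratorLoop`, `Page`, and `readAll` are hypothetical names; `Page` stands in for a getRecords response that carries the records plus the iterator to use for the next call. Ending the loop on a `null` iterator is a simplification made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class IteratorLoop {
    // Hypothetical stand-in for one getRecords response.
    public static final class Page {
        public final List<String> sequenceNumbers;
        public final String nextIterator; // null ends the loop in this sketch

        public Page(List<String> sequenceNumbers, String nextIterator) {
            this.sequenceNumbers = sequenceNumbers;
            this.nextIterator = nextIterator;
        }
    }

    // Reads page after page, always continuing from the iterator returned by the
    // previous call instead of reusing the one the loop started with.
    public static List<String> readAll(String firstIterator, Function<String, Page> getRecords) {
        List<String> seen = new ArrayList<>();
        String iterator = firstIterator;
        while (iterator != null) {
            Page page = getRecords.apply(iterator);
            seen.addAll(page.sequenceNumbers);
            iterator = page.nextIterator; // reusing the old iterator would re-read the same page
        }
        return seen;
    }
}
```

If the test keeps calling with the same iterator, it sees the same records again, and an ordering check across requests then reports a sequence number that is not after the last one seen.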