awslabs / amazon-kinesis-scaling-utils

The Kinesis Scaling Utility is designed to give you the ability to scale Amazon Kinesis Streams in the same way that you scale EC2 Auto Scaling groups – up or down by a count or as a percentage of the total fleet. You can also simply scale to an exact number of Shards. There is no requirement for you to manage the allocation of the keyspace to Shards when using this API, as it is done automatically.

Fix bugs on stream metrics retrieval #92

Closed · CesarManriqueH closed this 4 years ago

CesarManriqueH commented 4 years ago

Description of changes:

While testing the latest version, the logs showed all metrics as zero, as if there were no traffic in the target Kinesis stream. I switched to version .9.6.0 and confirmed that the problem is in the latest version, since .9.6.0 works fine.

After debugging the latest version, I noticed that the metrics display zero because `datapoint.unit().name()` returns uppercased strings.
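For illustration, here is a minimal sketch of that kind of mismatch (the helper methods are hypothetical, not this project's actual code). In AWS SDK v2, `StandardUnit` is an enum, so `name()` returns the Java constant name (`"COUNT"`) rather than the CloudWatch unit string (`"Count"`), and a string comparison against the latter silently fails:

```java
import software.amazon.awssdk.services.cloudwatch.model.Datapoint;
import software.amazon.awssdk.services.cloudwatch.model.StandardUnit;

public class UnitNameSketch {

    // Buggy variant: unit().name() yields the enum constant "COUNT", so
    // this check never matches and every datapoint is reported as zero.
    static double sumIfCountBuggy(Datapoint datapoint) {
        if (datapoint.unit().name().equals("Count")) {
            return datapoint.sum();
        }
        return 0d;
    }

    // Safer variant: compare against the enum constant itself, which
    // sidesteps the casing question entirely.
    static double sumIfCount(Datapoint datapoint) {
        return datapoint.unit() == StandardUnit.COUNT ? datapoint.sum() : 0d;
    }
}
```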

Then I found errors in `StreamMonitor` related to conversions from `java.time.Instant` instances to Joda-Time `DateTime`s.
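A minimal sketch of the safe conversion, assuming the code mixes `java.time` and Joda-Time as described (Joda-Time 2.x has no converter registered for `java.time.Instant`, so handing one to the `DateTime(Object)` constructor fails at runtime; going through epoch milliseconds is unambiguous):

```java
import java.time.Instant;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class InstantToJodaSketch {
    public static void main(String[] args) {
        Instant instant = Instant.now();

        // new DateTime(instant) would throw, since Joda-Time has no
        // converter for java.time.Instant; epoch millis work in both APIs.
        DateTime jodaTime = new DateTime(instant.toEpochMilli(), DateTimeZone.UTC);
        System.out.println(jodaTime);
    }
}
```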

And finally, I think we should use `GetRecords.Records` instead of `GetRecords.Success`: the former measures how many records are actually retrieved, which reflects real consumer traffic better than a count of successful GetRecords calls.
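For context, a hedged sketch of what requesting that metric from CloudWatch could look like with AWS SDK v2 (the stream name, time window, and period here are placeholders, not values from this project):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.Dimension;
import software.amazon.awssdk.services.cloudwatch.model.GetMetricStatisticsRequest;
import software.amazon.awssdk.services.cloudwatch.model.GetMetricStatisticsResponse;
import software.amazon.awssdk.services.cloudwatch.model.Statistic;

public class GetRecordsMetricSketch {
    public static void main(String[] args) {
        try (CloudWatchClient cloudWatch = CloudWatchClient.create()) {
            GetMetricStatisticsRequest request = GetMetricStatisticsRequest.builder()
                    .namespace("AWS/Kinesis")
                    .metricName("GetRecords.Records") // rather than GetRecords.Success
                    .dimensions(Dimension.builder()
                            .name("StreamName")
                            .value("my-stream") // placeholder stream name
                            .build())
                    .startTime(Instant.now().minus(5, ChronoUnit.MINUTES))
                    .endTime(Instant.now())
                    .period(60) // one-minute datapoints
                    .statistics(Statistic.SUM)
                    .build();

            GetMetricStatisticsResponse response = cloudWatch.getMetricStatistics(request);
            response.datapoints().forEach(dp ->
                    System.out.printf("%s -> %.0f records%n", dp.timestamp(), dp.sum()));
        }
    }
}
```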

IanMeyers commented 4 years ago

Can you also increase the version number in the pom.xml, which will allow me to push a build/release with your changes?

CesarManriqueH commented 4 years ago

Hi @IanMeyers! Thanks for your quick reply.

I updated the version in StreamScaler.java to match the one in the pom.xml file. Do you want me to increase both to .9.8.0?

IanMeyers commented 4 years ago

Yes, please increment both so they are in sync, as we use this info to generate versioned releases on S3 and in GitHub.
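For anyone following along, the value in question lives in the standard Maven `<version>` element of pom.xml (a minimal sketch, assuming the project's leading-dot version scheme):

```xml
<!-- pom.xml: keep this in sync with the version constant in StreamScaler.java -->
<version>.9.8.0</version>
```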

CesarManriqueH commented 4 years ago

If we are creating a new release, then maybe we could include a suggestion I have regarding scale-down behavior.

I think that, when using both metrics (PUT and GET), consumers are penalized by slow producers and vice versa. Let me explain with a couple of examples:

  1. We have a Kinesis stream with 2 shards (a write capacity of 2,000 records per second). A producer is writing 1,500 records per second when suddenly the consumers go down. Autoscaling will then reduce the number of shards, which hurts the producer's ability to write to the stream.

  2. We have another Kinesis stream. Its consumer has been slow for a while, and IteratorAge has risen to 5 hours. A developer discovers the problem and fixes it. Ideally, the GetRecords.Records value would increase and the shard count would grow with it; but as long as the producer keeps the same pace, the stream won't scale up, because the decision is held back by the producer's metric.

In my opinion, we can solve this by changing the decision matrix so that either metric can trigger a scale-up, while a scale-down requires both metrics to agree:

[image: proposed decision matrix]

It could be implemented this way: https://github.com/CesarManriqueH/amazon-kinesis-scaling-utils/commit/eae9e1bc1e0a7b012d6aa92bc5321f4540ef425a
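To make the idea concrete, here is a minimal, hypothetical sketch of the combination rule implied by the examples above; the linked commit is the authoritative change, and the enum and method names here are invented for illustration:

```java
public class ScalingVoteSketch {

    // Stand-in for the utility's per-metric scaling decision.
    enum ScaleDirection { UP, DOWN, NONE }

    // A scale-up vote on either metric wins, so a busy producer or a busy
    // consumer can each grow the stream; a scale-down requires both sides
    // to have headroom, so neither is penalized by the other being slow.
    static ScaleDirection combine(ScaleDirection putVote, ScaleDirection getVote) {
        if (putVote == ScaleDirection.UP || getVote == ScaleDirection.UP) {
            return ScaleDirection.UP;
        }
        if (putVote == ScaleDirection.DOWN && getVote == ScaleDirection.DOWN) {
            return ScaleDirection.DOWN;
        }
        return ScaleDirection.NONE;
    }
}
```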

I would like to get your feedback on this.

CesarManriqueH commented 4 years ago

These are all my changes 😄 . Please let me know if there's something else needed in order to merge this PR.

IanMeyers commented 4 years ago

Can you please just fix the typo in README.md from "No Nothing" to "Do Nothing"?

CesarManriqueH commented 4 years ago

Of course, I just updated the README.

sshivananda commented 4 years ago

@IanMeyers - Are there any updates required before merging / publishing these? I hope to use the new changes. I was able to build the WAR file locally, but I would be interested in using changes from the master branch.

CesarManriqueH commented 4 years ago

@IanMeyers can this be merged? 🙏