The Kinesis Scaling Utility is designed to give you the ability to scale Amazon Kinesis Streams in the same way that you scale EC2 Auto Scaling groups – up or down by a count or as a percentage of the total fleet. You can also simply scale to an exact number of Shards. There is no requirement for you to manage the allocation of the keyspace to Shards when using this API, as it is done automatically.
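For example, a one-off manual scale-up might look roughly like the following. This is only a sketch based on the invocation style shown in the project README; the exact jar name and system property names may differ in your build.

```sh
# Sketch: option names follow the project README; verify against your build of the utility.
java -cp KinesisScalingUtils-complete.jar \
  -Dstream-name=MyStream \
  -Dscaling-action=scaleUp \
  -Dcount=10 \
  -Dregion=eu-west-1 \
  ScalingClient
```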
Apache License 2.0 · 338 stars · 95 forks
Smoother scaling up by having checkInterval configurable and a cool off period when upscaling #87
Given that checkInterval is hardcoded to 45 seconds and not configurable, and given that a resharding operation can take more than 45 seconds, it regularly happens that amazon-kinesis-scaling-utils decides to scale up several times when only once was necessary.
In that situation the total number of shards ends up greater than it should be, which means the scale-down policy then has to remove the unnecessary shards. Even worse, many intermediate shards are created while scaling up and down (in order to distribute the hash keyspace evenly across the final open shards), which occasionally leaves consumers using the KCL library unbalanced, with most of the open shards concentrated on one or a few consumers.
This effect creates several problems:
Increased Kinesis costs due to unnecessary shards
Increased DynamoDB costs due to extra traffic to the KCL DynamoDB tables
Consumer misbehaviour due to an unbalanced assignment of open shards
A couple of easy changes to the JSON configuration would fix this problem; they can be used independently or together:
Add checkInterval to the JSON configuration: this also has the benefit of reducing the CloudWatch traffic produced by this application
Add scaleUp.coolOffMins with the same behaviour as the current scaleDown.coolOffMins
With these two changes the user can increase checkInterval to a value greater than the time a resharding takes, while also reducing traffic to CloudWatch. Or, if the user wants to react much faster to a traffic increase, they can set scaleUp.coolOffMins to give the system time to finish resharding (see the configuration sketch below).
Obviously, these two parameters are also useful for adapting how a given system reacts to specific traffic patterns, as they provide configuration options closer to those found in EC2 Auto Scaling.
Given the scale up configuration:
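As a rough illustration only: the stream-monitor JSON below assumes field names along the lines of those documented in the project README, with the two proposed options added. checkInterval and scaleUp.coolOffMins do not exist today, and the remaining fields are illustrative rather than an exact schema.

```jsonc
{
  "streamName": "MyStream",
  "region": "eu-west-1",
  "scaleOnOperation": ["PUT"],
  "minShards": 4,
  "maxShards": 64,
  "checkInterval": 120,          // PROPOSED: seconds between CloudWatch checks (currently hardcoded to 45)
  "scaleUp": {
    "scaleThresholdPct": 75,     // scale up when more than 75% of capacity is used
    "scaleAfterMins": 5,
    "scalePct": 100,             // e.g. double the shard count
    "coolOffMins": 10            // PROPOSED: wait for resharding to finish before scaling up again
  },
  "scaleDown": {
    "scaleThresholdPct": 25,
    "scaleAfterMins": 30,
    "scalePct": 50,
    "coolOffMins": 60            // existing option the proposal mirrors
  }
}
```

With checkInterval set above the typical resharding time (or scaleUp.coolOffMins covering it), a single traffic spike would trigger one scale-up rather than several overlapping ones.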