awslabs / amazon-kinesis-scaling-utils

The Kinesis Scaling Utility is designed to give you the ability to scale Amazon Kinesis Streams in the same way that you scale EC2 Auto Scaling groups – up or down by a count or as a percentage of the total fleet. You can also simply scale to an exact number of Shards. There is no requirement for you to manage the allocation of the keyspace to Shards when using this API, as it is done automatically.
Apache License 2.0

application is not coming up with config-file-url #96

Open kottaravikumar opened 3 years ago

kottaravikumar commented 3 years ago

Hi AWS team, we are seeing the following exceptions in /var/log/messages when the environment variable 'config-file-url' is set to s3://bucket/file-name.

Scaling Config:

[ {
  "streamName": "Ravi-Autoscaling-Testing-Learning",
  "region": "us-west-2",
  "scaleOnOperation": ["PUT"],
  "minShards": 1,
  "maxShards": 16,
  "refreshShardsNumberAfterMin": 5,
  "checkInterval": 300,
  "scaleUp": {
    "scaleThresholdPct": 70,
    "scaleAfterMins": 1,
    "scalePct": 50,
    "coolOffMins": 15,
    "notificationARN": "arn:aws:sns:us-west-2:007432231745:kinesis-scaling-utiliity-notifications"
  },
  "scaleDown": {
    "scaleThresholdPct": 25,
    "scaleAfterMins": 1,
    "scalePct": 50,
    "coolOffMins": 60,
    "notificationARN": "arn:aws:sns:us-west-2:007432231745:kinesis-scaling-utiliity-notifications"
  }
} ]

Oct 5 07:56:47 ip-172-31-12-175 server: Scale Up Percentage of 50 is invalid or will result in unexpected behaviour. This parameter represents the target size of the stream after being multiplied by the current number of Shards. Scale up by 100% will result in a Stream of the same number of Shards (Current Shard Count 1)
07:56:47.792 [localhost-startStop-1] ERROR c.a.s.k.s.auto.AutoscalingController - Fatal Exception while loading configuration file
Oct 5 07:56:47 ip-172-31-12-175 server: 07:56:47.794 [localhost-startStop-1] ERROR c.a.s.k.s.auto.AutoscalingController - Scale Up Percentage of 50 is invalid or will result in unexpected behaviour. This parameter represents the target size of the stream after being multiplied by the current number of Shards. Scale up by 100% will result in a Stream of the same number of Shards (Current Shard Count 1)
Oct 5 07:56:47 ip-172-31-12-175 server: com.amazonaws.services.kinesis.scaling.auto.InvalidConfigurationException: Scale Up Percentage of 50 is invalid or will result in unexpected behaviour. This parameter represents the target size of the stream after being multiplied by the current number of Shards. Scale up by 100% will result in a Stream of the same number of Shards (Current Shard Count * 1)
Oct 5 07:56:47 ip-172-31-12-175 server: at com.amazonaws.services.kinesis.scaling.auto.AutoscalingConfiguration.validate(AutoscalingConfiguration.java:211)
Oct 5 07:56:47 ip-172-31-12-175 server: at com.amazonaws.services.kinesis.scaling.auto.AutoscalingConfiguration.loadFromURL(AutoscalingConfiguration.java:195)
Oct 5 07:56:47 ip-172-31-12-175 server: at com.amazonaws.services.kinesis.scaling.auto.AutoscalingController.getInstance(AutoscalingController.java:80)
Oct 5 07:56:47 ip-172-31-12-175 server: at com.amazonaws.services.kinesis.scaling.auto.app.KinesisAutoscalingBeanstalkApp.contextInitialized(KinesisAutoscalingBeanstalkApp.java:39)
Oct 5 07:56:47 ip-172-31-12-175 server: at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4689)
Oct 5 07:56:47 ip-172-31-12-175 server: at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5155)
Oct 5 07:56:47 ip-172-31-12-175 server: at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
Oct 5 07:56:47 ip-172-31-12-175 server: at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:743)
Oct 5 07:56:47 ip-172-31-12-175 server: at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:719)
Oct 5 07:56:47 ip-172-31-12-175 server: at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:705)
Oct 5 07:56:47 ip-172-31-12-175 server: at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1125)
Oct 5 07:56:47 ip-172-31-12-175 server: at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:1858)
Oct 5 07:56:47 ip-172-31-12-175 server: at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
Oct 5 07:56:47 ip-172-31-12-175 server: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Oct 5 07:56:47 ip-172-31-12-175 server: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Oct 5 07:56:47 ip-172-31-12-175 server: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Oct 5 07:56:47 ip-172-31-12-175 server: at java.base/java.lang.Thread.run(Thread.java:834)
Oct 5 07:56:47 ip-172-31-12-175 server: 07:56:47.799 [localhost-startStop-1] INFO c.a.s.k.s.auto.AutoscalingController - Supressing system exit based on environment configuration
Oct 5 07:56:48 ip-172-31-12-175 systemd: tomcat.service: main process exited, code=exited, status=255/n/a
Oct 5 07:56:48 ip-172-31-12-175 systemd: Unit tomcat.service entered failed state.
Oct 5 07:56:48 ip-172-31-12-175 systemd: tomcat.service failed.

IanMeyers commented 3 years ago

Hello - yes, this is expected. For scale up, you need to use a value larger than 100; for example, to add 50% capacity, use 150.
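
To make the arithmetic concrete, here is a minimal sketch in plain Java (not code from this utility; the class, method, and the exact rounding behaviour are assumptions made for illustration) of how a scalePct value maps to a target shard count:

```java
// Illustration only: "scalePct" is the target stream size expressed as a
// percentage of the current shard count, so a scale-up value must exceed 100.
public class ScalePctExample {

    static int targetShardCount(int currentShards, int scalePct) {
        // Rounding to the nearest shard is an assumption made for this sketch.
        return (int) Math.max(1, Math.round(currentShards * (scalePct / 100.0)));
    }

    public static void main(String[] args) {
        System.out.println(targetShardCount(4, 50));  // 2 -> shrinks the stream, invalid for scaleUp
        System.out.println(targetShardCount(4, 100)); // 4 -> no change
        System.out.println(targetShardCount(4, 150)); // 6 -> adds 50% capacity
        System.out.println(targetShardCount(4, 200)); // 8 -> doubles the stream
    }
}
```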

kottaravikumar commented 3 years ago

Tried with the following scaling configuration.

[ { "streamName":"Ravi-Autoscaling-Testing-Learning", "region":"us-west-2", "minShards":1, "maxShards":16, "scaleOnOperation": ["PUT"], "checkInterval" : 60, "scaleUp": { "scaleThresholdPct": 70, "scaleAfterMins": 3, "scalePct": 200, "coolOffMins": 2 }, "scaleDown":{ "scaleThresholdPct": 30, "scaleAfterMins": 5, "scalePct": 50, "coolOffMins": 2 } }, { "streamName":"Ravi-Autoscaling-Testing-Learning", "region":"us-west-2", "minShards":1, "maxShards":16, "scaleOnOperation": ["GET"], "checkInterval" : 60, "scaleUp": { "scaleThresholdPct": 70, "scaleAfterMins": 3, "scalePct": 200, "coolOffMins": 2 }, "scaleDown":{ "scaleThresholdPct": 30, "scaleAfterMins": 5, "scalePct": 50, "coolOffMins": 2 } } ]

The logs always show the utilization as 0.00%, even though there is some traffic to the Kinesis stream (the shard count is 1).

Oct 6 12:02:45 ip-172-31-12-175 server: 12:02:45.375 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Bytes: Stream Ravi-Autoscaling-Testing-Learning Used GET[Bytes] Capacity ~ 0.00% (0 Bytes of 2097152)
Oct 6 12:02:45 ip-172-31-12-175 server: 12:02:45.375 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Records: Stream Ravi-Autoscaling-Testing-Learning Used GET[Records] Capacity ~ 0.00% (0 Records of 2000)
Oct 6 12:02:45 ip-172-31-12-175 server: 12:02:45.375 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Will decide scaling action based on metric GET[Bytes] due to higher utilisation metric 0.00%
Oct 6 12:02:45 ip-172-31-12-175 server: 12:02:45.440 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Requesting Scale Down of Stream Ravi-Autoscaling-Testing-Learning by 50% as [GET] has been below 30% for 5 Minutes
Oct 6 12:02:45 ip-172-31-12-175 server: 12:02:45.440 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Stream Ravi-Autoscaling-Testing-Learning: Not Scaling Down - Already at Minimum of 1 Shard
Oct 6 12:02:45 ip-172-31-12-175 server: 12:02:45.440 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Next Check Cycle in 60 seconds
Oct 6 12:02:45 ip-172-31-12-175 server: 12:02:45.459 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Bytes: Stream Ravi-Autoscaling-Testing-Learning Used PUT[Bytes] Capacity ~ 0.00% (0 Bytes of 1048576)
Oct 6 12:02:45 ip-172-31-12-175 server: 12:02:45.462 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Records: Stream Ravi-Autoscaling-Testing-Learning Used PUT[Records] Capacity ~ 0.00% (0 Records of 1000)
Oct 6 12:02:45 ip-172-31-12-175 server: 12:02:45.462 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Will decide scaling action based on metric PUT[Bytes] due to higher utilisation metric 0.00%
Oct 6 12:02:45 ip-172-31-12-175 server: 12:02:45.506 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Requesting Scale Down of Stream Ravi-Autoscaling-Testing-Learning by 50% as [PUT] has been below 30% for 5 Minutes
Oct 6 12:02:45 ip-172-31-12-175 server: 12:02:45.508 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Stream Ravi-Autoscaling-Testing-Learning: Not Scaling Down - Already at Minimum of 1 Shard
Oct 6 12:02:45 ip-172-31-12-175 server: 12:02:45.508 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Next Check Cycle in 60 seconds
Oct 6 12:03:45 ip-172-31-12-175 server: 12:03:45.491 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Bytes: Stream Ravi-Autoscaling-Testing-Learning Used GET[Bytes] Capacity ~ 0.00% (0 Bytes of 2097152)
Oct 6 12:03:45 ip-172-31-12-175 server: 12:03:45.492 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Records: Stream Ravi-Autoscaling-Testing-Learning Used GET[Records] Capacity ~ 0.00% (0 Records of 2000)
Oct 6 12:03:45 ip-172-31-12-175 server: 12:03:45.493 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Will decide scaling action based on metric GET[Bytes] due to higher utilisation metric 0.00%
Oct 6 12:03:45 ip-172-31-12-175 server: 12:03:45.555 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Requesting Scale Down of Stream Ravi-Autoscaling-Testing-Learning by 50% as [GET] has been below 30% for 5 Minutes
Oct 6 12:03:45 ip-172-31-12-175 server: 12:03:45.555 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Stream Ravi-Autoscaling-Testing-Learning: Not Scaling Down - Already at Minimum of 1 Shard
Oct 6 12:03:45 ip-172-31-12-175 server: 12:03:45.555 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Next Check Cycle in 60 seconds
Oct 6 12:03:45 ip-172-31-12-175 server: 12:03:45.570 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Bytes: Stream Ravi-Autoscaling-Testing-Learning Used PUT[Bytes] Capacity ~ 0.00% (0 Bytes of 1048576)
Oct 6 12:03:45 ip-172-31-12-175 server: 12:03:45.574 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Records: Stream Ravi-Autoscaling-Testing-Learning Used PUT[Records] Capacity ~ 0.00% (0 Records of 1000)
Oct 6 12:03:45 ip-172-31-12-175 server: 12:03:45.574 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Will decide scaling action based on metric PUT[Bytes] due to higher utilisation metric 0.00%
Oct 6 12:03:45 ip-172-31-12-175 server: 12:03:45.622 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Requesting Scale Down of Stream Ravi-Autoscaling-Testing-Learning by 50% as [PUT] has been below 30% for 5 Minutes
Oct 6 12:03:45 ip-172-31-12-175 server: 12:03:45.622 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Stream Ravi-Autoscaling-Testing-Learning: Not Scaling Down - Already at Minimum of 1 Shard
Oct 6 12:03:45 ip-172-31-12-175 server: 12:03:45.623 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Next Check Cycle in 60 seconds
Oct 6 12:04:05 ip-172-31-12-175 dhclient[2991]: XMT: Solicit on eth0, interval 112840ms.
Oct 6 12:04:45 ip-172-31-12-175 server: 12:04:45.614 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Bytes: Stream Ravi-Autoscaling-Testing-Learning Used GET[Bytes] Capacity ~ 0.00% (0 Bytes of 2097152)
Oct 6 12:04:45 ip-172-31-12-175 server: 12:04:45.615 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Records: Stream Ravi-Autoscaling-Testing-Learning Used GET[Records] Capacity ~ 0.00% (0 Records of 2000)
Oct 6 12:04:45 ip-172-31-12-175 server: 12:04:45.615 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Will decide scaling action based on metric GET[Bytes] due to higher utilisation metric 0.00%
Oct 6 12:04:45 ip-172-31-12-175 server: 12:04:45.682 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Requesting Scale Down of Stream Ravi-Autoscaling-Testing-Learning by 50% as [GET] has been below 30% for 5 Minutes
Oct 6 12:04:45 ip-172-31-12-175 server: 12:04:45.682 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Stream Ravi-Autoscaling-Testing-Learning: Not Scaling Down - Already at Minimum of 1 Shard
Oct 6 12:04:45 ip-172-31-12-175 server: 12:04:45.682 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Next Check Cycle in 60 seconds
Oct 6 12:04:45 ip-172-31-12-175 server: 12:04:45.704 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Bytes: Stream Ravi-Autoscaling-Testing-Learning Used PUT[Bytes] Capacity ~ 0.00% (0 Bytes of 1048576)
Oct 6 12:04:45 ip-172-31-12-175 server: 12:04:45.705 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Records: Stream Ravi-Autoscaling-Testing-Learning Used PUT[Records] Capacity ~ 0.00% (0 Records of 1000)
Oct 6 12:04:45 ip-172-31-12-175 server: 12:04:45.705 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Will decide scaling action based on metric PUT[Bytes] due to higher utilisation metric 0.00%
Oct 6 12:04:45 ip-172-31-12-175 server: 12:04:45.753 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Requesting Scale Down of Stream Ravi-Autoscaling-Testing-Learning by 50% as [PUT] has been below 30% for 5 Minutes
Oct 6 12:04:45 ip-172-31-12-175 server: 12:04:45.754 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Stream Ravi-Autoscaling-Testing-Learning: Not Scaling Down - Already at Minimum of 1 Shard
Oct 6 12:04:45 ip-172-31-12-175 server: 12:04:45.755 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Next Check Cycle in 60 seconds
Oct 6 12:05:18 ip-172-31-12-175 amazon-ssm-agent: 2020-10-06 12:05:18 INFO [HealthCheck] HealthCheck reporting agent health.
Oct 6 12:05:45 ip-172-31-12-175 server: 12:05:45.742 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Bytes: Stream Ravi-Autoscaling-Testing-Learning Used GET[Bytes] Capacity ~ 0.00% (0 Bytes of 2097152)
Oct 6 12:05:45 ip-172-31-12-175 server: 12:05:45.743 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Records: Stream Ravi-Autoscaling-Testing-Learning Used GET[Records] Capacity ~ 0.00% (0 Records of 2000)
Oct 6 12:05:45 ip-172-31-12-175 server: 12:05:45.743 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Will decide scaling action based on metric GET[Bytes] due to higher utilisation metric 0.00%
Oct 6 12:05:45 ip-172-31-12-175 server: 12:05:45.811 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Requesting Scale Down of Stream Ravi-Autoscaling-Testing-Learning by 50% as [GET] has been below 30% for 5 Minutes
Oct 6 12:05:45 ip-172-31-12-175 server: 12:05:45.811 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Stream Ravi-Autoscaling-Testing-Learning: Not Scaling Down - Already at Minimum of 1 Shard
Oct 6 12:05:45 ip-172-31-12-175 server: 12:05:45.811 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Next Check Cycle in 60 seconds
Oct 6 12:05:45 ip-172-31-12-175 server: 12:05:45.877 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Bytes: Stream Ravi-Autoscaling-Testing-Learning Used PUT[Bytes] Capacity ~ 0.00% (0 Bytes of 1048576)
Oct 6 12:05:45 ip-172-31-12-175 server: 12:05:45.880 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Records: Stream Ravi-Autoscaling-Testing-Learning Used PUT[Records] Capacity ~ 0.00% (0 Records of 1000)
Oct 6 12:05:45 ip-172-31-12-175 server: 12:05:45.881 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Will decide scaling action based on metric PUT[Bytes] due to higher utilisation metric 0.00%
Oct 6 12:05:45 ip-172-31-12-175 server: 12:05:45.929 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Requesting Scale Down of Stream Ravi-Autoscaling-Testing-Learning by 50% as [PUT] has been below 30% for 5 Minutes
Oct 6 12:05:45 ip-172-31-12-175 server: 12:05:45.929 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Stream Ravi-Autoscaling-Testing-Learning: Not Scaling Down - Already at Minimum of 1 Shard
Oct 6 12:05:45 ip-172-31-12-175 server: 12:05:45.930 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Next Check Cycle in 60 seconds
Oct 6 12:05:49 ip-172-31-12-175 amazon-ssm-agent: 2020-10-06 12:05:49 INFO [MessagingDeliveryService] [Association] Next association is scheduled at 2020-10-06 19:11:42 +0000 UTC, association will wait for 7h5m52.425468436s
Oct 6 12:05:58 ip-172-31-12-175 dhclient[2991]: XMT: Solicit on eth0, interval 121020ms.
Oct 6 12:06:45 ip-172-31-12-175 server: 12:06:45.864 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Bytes: Stream Ravi-Autoscaling-Testing-Learning Used GET[Bytes] Capacity ~ 0.00% (0 Bytes of 2097152)
Oct 6 12:06:45 ip-172-31-12-175 server: 12:06:45.864 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Records: Stream Ravi-Autoscaling-Testing-Learning Used GET[Records] Capacity ~ 0.00% (0 Records of 2000)
Oct 6 12:06:45 ip-172-31-12-175 server: 12:06:45.864 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Will decide scaling action based on metric GET[Bytes] due to higher utilisation metric 0.00%
Oct 6 12:06:45 ip-172-31-12-175 server: 12:06:45.899 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Requesting Scale Down of Stream Ravi-Autoscaling-Testing-Learning by 50% as [GET] has been below 30% for 5 Minutes
Oct 6 12:06:45 ip-172-31-12-175 server: 12:06:45.899 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Stream Ravi-Autoscaling-Testing-Learning: Not Scaling Down - Already at Minimum of 1 Shard
Oct 6 12:06:45 ip-172-31-12-175 server: 12:06:45.900 [pool-2-thread-2] INFO c.a.s.k.scaling.auto.StreamMonitor - Next Check Cycle in 60 seconds
Oct 6 12:06:45 ip-172-31-12-175 server: 12:06:45.991 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Bytes: Stream Ravi-Autoscaling-Testing-Learning Used PUT[Bytes] Capacity ~ 0.00% (0 Bytes of 1048576)
Oct 6 12:06:45 ip-172-31-12-175 server: 12:06:45.991 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Records: Stream Ravi-Autoscaling-Testing-Learning Used PUT[Records] Capacity ~ 0.00% (0 Records of 1000)
Oct 6 12:06:45 ip-172-31-12-175 server: 12:06:45.992 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Will decide scaling action based on metric PUT[Bytes] due to higher utilisation metric 0.00%
Oct 6 12:06:46 ip-172-31-12-175 server: 12:06:46.044 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Requesting Scale Down of Stream Ravi-Autoscaling-Testing-Learning by 50% as [PUT] has been below 30% for 5 Minutes
Oct 6 12:06:46 ip-172-31-12-175 server: 12:06:46.047 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Stream Ravi-Autoscaling-Testing-Learning: Not Scaling Down - Already at Minimum of 1 Shard
Oct 6 12:06:46 ip-172-31-12-175 server: 12:06:46.047 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Next Check Cycle in 60 seconds

Can you (@IanMeyers) please check whether there is anything I have missed? I have also taken a CloudWatch metrics screenshot, but I could not upload it here because I could not find an option to attach it.
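
One way to narrow this down is to check whether CloudWatch itself has datapoints for the stream-level metrics in that region. Below is a rough diagnostic sketch, assuming the AWS SDK for Java v1 and default credentials (the class name and the choice of the IncomingRecords metric are just for illustration, not part of the utility):

```java
import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.Datapoint;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsResult;

import java.util.Date;

public class KinesisMetricCheck {
    public static void main(String[] args) {
        // Stream and region from the configuration above; adjust as needed.
        String streamName = "Ravi-Autoscaling-Testing-Learning";

        AmazonCloudWatch cloudWatch = AmazonCloudWatchClientBuilder.standard()
                .withRegion("us-west-2")
                .build();

        Date end = new Date();
        Date start = new Date(end.getTime() - 15 * 60 * 1000L); // last 15 minutes

        // Stream-level put-side metric; GetRecords.Bytes / GetRecords.Records
        // can be queried the same way for the GET side.
        GetMetricStatisticsResult result = cloudWatch.getMetricStatistics(
                new GetMetricStatisticsRequest()
                        .withNamespace("AWS/Kinesis")
                        .withMetricName("IncomingRecords")
                        .withDimensions(new Dimension().withName("StreamName").withValue(streamName))
                        .withStartTime(start)
                        .withEndTime(end)
                        .withPeriod(60)
                        .withStatistics("Sum"));

        // If no datapoints are printed, CloudWatch has no data for this stream
        // in this region for the window, and a 0.00% reading is consistent.
        for (Datapoint dp : result.getDatapoints()) {
            System.out.println(dp.getTimestamp() + " -> " + dp.getSum());
        }
    }
}
```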

kottaravikumar commented 3 years ago

Apart from the scale-up issue, I want to add one more observation which might help with further debugging. I am using the KinesisAutoscaling-.9.8.0.war file, but the log message below says it is running version .9.6.0.

Oct 7 10:23:57 ip-172-31-28-93 server: 10:23:57.705 [pool-2-thread-1] INFO c.a.s.k.scaling.auto.StreamMonitor - Using Stream Scaler Version .9.6.0

kottaravikumar commented 3 years ago

I see the problem is with .9.8.0; the metrics pull works with .9.6.0.

Oct 10 13:35:14 ip-172-31-11-28 server: 2020-10-10 13:35:14 INFO StreamMonitor:166 - Bytes: Stream Ravi-Autoscaling-Testing-Learning Used PUT[Bytes] Capacity ~ 1.03% (10,846 Bytes of 1048576)
Oct 10 13:35:14 ip-172-31-11-28 server: 2020-10-10 13:35:14 INFO StreamMonitor:166 - Records: Stream Ravi-Autoscaling-Testing-Learning Used PUT[Records] Capacity ~ 60.62% (606 Records of 1000)
Oct 10 13:35:14 ip-172-31-11-28 server: 2020-10-10 13:35:14 INFO StreamMonitor:213 - Will decide scaling action based on metric PUT[Records] due to higher utilisation metric 60.62%
Oct 10 13:35:14 ip-172-31-11-28 server: 2020-10-10 13:35:14 INFO StreamMonitor:263 - Requesting Scale Up of Stream Ravi-Autoscaling-Testing-Learning by 200% as [PUT] has been above 60% for 1 Minutes
Oct 10 13:35:14 ip-172-31-11-28 server: 2020-10-10 13:35:14 INFO StreamScaler:478 - Updating Stream Ravi-Autoscaling-Testing-Learning Shard Count to 2
Oct 10 13:35:14 ip-172-31-11-28 server: 2020-10-10 13:35:14 ERROR StreamMonitor:341 - Failed to process stream Ravi-Autoscaling-Testing-Learning

Are you using the UpdateShardCount API in the latest version as well? I see the following log message in .9.6.0:

Oct 10 13:35:14 ip-172-31-11-28 server: com.amazonaws.services.kinesis.model.AmazonKinesisException: User: arn:aws:sts::007432231745:assumed-role/aws-elasticbeanstalk-ec2-role/i-055cc2650d37b6660 is not authorized to perform: kinesis:UpdateShardCount on resource: arn:aws:kinesis:us-west-2:007432231745:stream/Ravi-Autoscaling-Testing-Learning (Service: AmazonKinesis; Status Code: 400; Error Code: AccessDeniedException; Request ID: df421cbb-6578-4cf2-88b9-a669eb55068d)

I fixed the AccessDenied error. I had been assuming that this solution bypasses the UpdateShardCount API; please correct me if that is wrong.

IanMeyers commented 3 years ago

Hello,

Yes, this utility uses the UpdateShardCount API first, and then, if throttled, falls back to manual shard-by-shard scaling.
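
For anyone following along, the pattern described above looks roughly like the following minimal sketch against the AWS SDK for Java v1; it is not the utility's actual implementation, and the shard-by-shard fallback is only indicated, not implemented:

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.LimitExceededException;
import com.amazonaws.services.kinesis.model.ScalingType;
import com.amazonaws.services.kinesis.model.UpdateShardCountRequest;

public class ShardCountUpdateSketch {
    public static void scaleStream(String streamName, int targetShardCount) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
        try {
            // Preferred path: a single UpdateShardCount call, which requires
            // the kinesis:UpdateShardCount permission seen in the error above.
            kinesis.updateShardCount(new UpdateShardCountRequest()
                    .withStreamName(streamName)
                    .withTargetShardCount(targetShardCount)
                    .withScalingType(ScalingType.UNIFORM_SCALING));
        } catch (LimitExceededException e) {
            // Fallback path: when UpdateShardCount is throttled, resize the
            // stream manually with SplitShard / MergeShards, one shard at a
            // time (omitted here for brevity).
        }
    }
}
```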

Thx,

Ian

IanMeyers commented 3 years ago

Please migrate to version .9.8.1, which provides significantly better logging. Those logs should help us determine whether you are getting the right details from the monitoring system to make scaling decisions.