aws-samples / performance-testing-framework-for-apache-kafka

Performance Testing Framework for Apache Kafka
MIT No Attribution
46 stars 8 forks source link

TooManyRequestsException on GetBootstrapBrokers operation #6

Open chn217 opened 1 year ago

chn217 commented 1 year ago

I'm using this repository to run a load test against AWS MSK.

The test cluster is as follows: MSK Cluster: 9 x m5.12xlarge Kafka version: 2.8.0 Authentication: IAM

Here is test spec into step function: ''' { "test_specification": { "parameters": { "cluster_throughput_mb_per_sec": [ 1200, 1600, 2000 ], "consumer_groups": [ { "num_groups": 1, "size": 400 } ], "num_producers": [ 40 ], "client_props": [ { "producer": "acks=all linger.ms=5 batch.size=262114 buffer.memory=2147483648 security.protocol=SASL_SSL", "consumer": "security.protocol=SASL_SSL" } ], "num_partitions": [ 400 ], "record_size_byte": [ 5120 ], "replication_factor": [ 3 ], "duration_sec": [ 360 ] }, "skip_remaining_throughput": { "less-than": [ "sent_div_requested_mb_per_sec", 0.995 ] } } } ''' The AWS step function state machine is stuck in RunPerformanceTest. The CloudWatch dashboard also confirmed that no traffic is generated. There are 440 jobs being scheduled. 152 jobs were stuck in waiting for all the jobs to start (wait_for_all_tasks), and 288 jobs failed because of the following error:

2023-05-29T05:41:15.850Z An error occurred (TooManyRequestsException) when calling the GetBootstrapBrokers operation (reached max retries: 4): Too Many Requests

Looks like the AWS API GetBootstrapBrokers has been rate limited.

Is this kind of test scale supported by this repository?

sthm commented 1 year ago

Hey -- I haven't tested the code with that many consumers. However, it's still odd that the containers die with this error as the logic should retry fetching the bootstrap brokers: https://github.com/aws-samples/performance-testing-framework-for-apache-kafka/blob/main/cdk/docker/run-kafka-command.sh#L149-L154.

Maybe the retry logic does not work properly and an the error fetching the bootstrap brokers stops the entire script? Can you try to remove that line and see if that mitigates the problem: https://github.com/aws-samples/performance-testing-framework-for-apache-kafka/blob/main/cdk/docker/run-kafka-command.sh#L19?