aws-samples / performance-testing-framework-for-apache-kafka

Performance Testing Framework for Apache Kafka
MIT No Attribution

How to decide on the depletion number #4

Open Akash2707 opened 1 year ago

Akash2707 commented 1 year ago

I have a test created with the below configuration:

"test_specification": {
        "parameters": {
          "cluster_throughput_mb_per_sec": [
            256,
            350,
            512,
            1024,
            1200
          ],
          "num_producers": [
            9
          ],
          "consumer_groups": [
            {
              "num_groups": 1,
              "size": 9
            },
            {
              "num_groups": 10,
              "size": 9
            }
          ],
          "client_props": [
            {
              "producer": "acks=all linger.ms=5 batch.size=262114 buffer.memory=2147483648 security.protocol=SSL",
              "consumer": "security.protocol=SSL"
            }
          ],
          "num_partitions": [
            250
          ],
          "record_size_byte": [
            1024
          ],
          "replication_factor": [
            3
          ],
          "duration_sec": [
            3600
          ]
        },
        "skip_remaining_throughput": {
          "less-than": [
            "sent_div_requested_mb_per_sec",
            0.995
          ]
        },
        "depletion_configuration": {
          "upper_threshold": {
            "mb_per_sec": 1000
          },
          "approximate_timeout_hours": 0.5
        }

The execution starts fine and performs a few tests, but it fails after a few performance tests when the credit depletion job fails. On checking the "RunCreditDepletion" batch, I see all the jobs inside it failing, and I could find the error below in the log group.

2023-03-31T14:54:19.205-07:00 +signal_handler:1> echo 'trap triggered by signal SIGTERM'

Can you provide any insights on the reason for this? Also, what is the maximum partition count that I can set here?

My broker configuration is: m5.4xlarge, with a total of 18 brokers.

sthm commented 1 year ago

Hey -- the depletion number is mainly relevant for small brokers that can experience burst performance. It's intended to exhaust all burst credits before the actual performance tests start. So you want to set the upper_threshold well above, and the lower_threshold slightly above, the expected sustained throughput of the cluster.

For m5.4xlarge instances, only the network comes with burstable performance. With a single consumer group, the storage network will be the bottleneck. But with 10 consumer groups, the network will eventually be the bottleneck when the cluster throughput exceeds 960 MB/s. So if you really want to deplete the network credits before a test, you can use something around 1 GB/s as the lower_threshold and 1.5 GB/s as the upper_threshold. But it will likely take hours until the credits are depleted.
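As a rough illustration of where a figure like 960 MB/s can come from, here is a minimal sketch of the arithmetic. It rests on assumptions that are not stated in this thread: a sustained network baseline of 5 Gbit/s per m5.4xlarge broker (converted as 5 * 1024 / 8 = 640 MB/s), and the common sizing rule that each broker sends its share of the ingress once per consumer group plus once per additional replica. The function name and the `lower_threshold` key shape (mirroring `upper_threshold` from the configuration above) are hypothetical.

```python
def network_bound_cluster_throughput(
    per_broker_network_mb_per_sec: float,
    num_brokers: int,
    num_consumer_groups: int,
    replication_factor: int,
) -> float:
    """Estimate the cluster ingress at which broker network egress saturates.

    Assumed sizing rule: every MB written to the cluster leaves a broker once
    per consumer group and once per follower replica (replication_factor - 1).
    """
    fan_out = num_consumer_groups + replication_factor - 1
    return per_broker_network_mb_per_sec * num_brokers / fan_out


# Assumption: 5 Gbit/s sustained baseline per m5.4xlarge, taken as 640 MB/s.
BASELINE_MB_PER_SEC = 5 * 1024 / 8

# Values from the test specification: 18 brokers, replication factor 3.
limit_10_groups = network_bound_cluster_throughput(BASELINE_MB_PER_SEC, 18, 10, 3)
limit_1_group = network_bound_cluster_throughput(BASELINE_MB_PER_SEC, 18, 1, 3)

print(limit_10_groups)  # 960.0
print(limit_1_group)    # 3840.0

# Thresholds suggested in the reply above (about 1 GB/s and 1.5 GB/s).
# The "lower_threshold" key shape is assumed to mirror "upper_threshold".
thresholds = {
    "lower_threshold": {"mb_per_sec": 1000},
    "upper_threshold": {"mb_per_sec": 1500},
}
```

Under these assumptions the network limit with 10 consumer groups comes out at 960 MB/s, while with a single consumer group it is far higher, which is consistent with a different resource becoming the bottleneck first in that case. The `thresholds` entries would sit inside `depletion_configuration` next to `approximate_timeout_hours`.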