bitnami / containers

Bitnami container images
https://bitnami.com

On Load Testing, bitnami/kafka:3.7.0 Server Fails to Connect after working fine for so many seconds/ publishes #67986

Closed kapyaar closed 1 week ago

kapyaar commented 1 month ago

Name and Version

bitnami/kafka:3.7.0

What architecture are you using?

amd64

What steps will reproduce the bug?

I am trying to get a working system for handling high throughput, testing on Docker for Windows. The setup works for manual testing, but if I use k6 to run a load test, it stops after publishing around 28,000 messages.

With the following docker-compose file [EDIT: adding the section that creates the topic]:

services:
  php:
    build:
      context: .
      dockerfile: Dockerfile
    working_dir: /var/www/html/
    ports:
      - 80:80
      - 8443:443
    volumes:
      - .:/var/www/html/
  kafka:
    image: 'bitnami/kafka:3.7.0'
    ports:
      - 9092:9092
    environment:
      # KRaft settings. No longer using zookeeper method.
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_KRAFT_CLUSTER_ID=NDllYzhlNzNjMmZmNDEyNT
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_ADVERTISED_HOST_NAME=kafka
      - KAFKA_AUTO_CREATE_TOPICS_ENABLE=true
      # Additional env variables tried based on recommendations from different sources
      - KAFKA_CFG_GROUP_INITIAL_REBALANCE_DELAY_MS=0
      - KAFKA_CFG_PRODUCER_ACKS=0
      - KAFKA_CFG_PRODUCER_MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION=1
      - KAFKA_CFG_PRODUCER_BATCH_SIZE=106384
      - KAFKA_CFG_PRODUCER_LINGER_MS=0    
      # Listeners
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=PLAINTEXT
  create-topic:
    image: 'bitnami/kafka:3.7.0'
    depends_on:
      - kafka
    command: >
      bash -c "
        sleep 5 &&
        /opt/bitnami/kafka/bin/kafka-topics.sh --bootstrap-server kafka:9092 --create --topic  testData --if-not-exists --partitions 1 --replication-factor 1 "

And a Dockerfile using NGINX Unit (FROM unit:1.32.1-php8.2) with opcache and the rdkafka extension enabled.
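
The Dockerfile itself is not included in the report; as a quick sanity check (a sketch, with check-extensions.php as a hypothetical helper name), the extensions it is expected to enable can be verified from inside the Unit container:

<?php
// check-extensions.php (hypothetical helper): confirm that the extensions the
// Dockerfile is supposed to enable are actually loaded inside the Unit container.
var_dump(extension_loaded('rdkafka'));       // expect bool(true)
var_dump(extension_loaded('Zend OPcache'));  // expect bool(true)
echo 'rdkafka extension version: ', phpversion('rdkafka'), PHP_EOL;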

And Producer.php

<?php
// producer.php: publish the POSTed payload to the testData topic.
$payload = trim($_POST['payload']);

$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', 'kafka:9092');

// A new producer (and a new broker connection) is created on every request.
$producer = new RdKafka\Producer($conf);

$topic = "testData";
$kafkaTopic = $producer->newTopic($topic);

// Partition 0, no message flags.
$kafkaTopic->produce(0, 0, $payload);
$producer->poll(0);

// Wait for delivery of outstanding messages (up to 10 x 20 s).
for ($flushRetries = 0; $flushRetries < 10; $flushRetries++) {
    $result = $producer->flush(20000);
    if (RD_KAFKA_RESP_ERR_NO_ERROR === $result) {
        break;
    }
}
?>
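
Since the producer runs with fire-and-forget defaults, a delivery report callback plus librdkafka's broker-level debug output can show what the client thinks is happening when publishing stops. This is a sketch, not part of the original producer.php; the callback and the debug/log_level settings are additions:

<?php
// producer.php with extra diagnostics (sketch): report delivery failures and
// broker-level connection activity to the container's stderr.
$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', 'kafka:9092');
$conf->set('log_level', (string) LOG_DEBUG);
$conf->set('debug', 'broker');   // connection setup/teardown details from librdkafka

// Invoked once per produced message when delivery succeeds or fails.
$conf->setDrMsgCb(function ($kafka, $message) {
    if ($message->err !== RD_KAFKA_RESP_ERR_NO_ERROR) {
        error_log('delivery failed: ' . rd_kafka_err2str($message->err));
    }
});

$producer = new RdKafka\Producer($conf);
$kafkaTopic = $producer->newTopic('testData');
$kafkaTopic->produce(0, 0, trim($_POST['payload']));
$producer->poll(0);
$producer->flush(20000);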

And for k6 load testing, I use the script.js below, which generates a string that is about 200 bytes long, with the following setup.

import http from 'k6/http';

export let options = {
  vus: 30,          // 30 virtual users
  duration: '30s',  // run the test for 30 seconds
};

export default function () {
  // generateFakeDataStringForKafka() (defined elsewhere in script.js)
  // returns a string roughly 200 bytes long.
  let fakeData = generateFakeDataStringForKafka();

  let res = http.post('http://localhost/producer.php', `payload=${fakeData}`, {
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
    },
  });
}

With this config, I run

docker-compose up --build

Then, k6 run script.js
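
To rule out the topic setup itself, a metadata request from the PHP container confirms that the create-topic sidecar actually created testData (a sketch; topic-check.php is a hypothetical helper using the same rdkafka extension):

<?php
// topic-check.php (hypothetical helper): verify the broker is reachable and list topics.
$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', 'kafka:9092');
$producer = new RdKafka\Producer($conf);

// Request metadata for all topics with a 10 s timeout.
$metadata = $producer->getMetadata(true, null, 10000);
foreach ($metadata->getTopics() as $topic) {
    echo $topic->getTopic(), PHP_EOL;   // expect testData to be listed
}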

After about 28,000 publishes, it stops publishing and I get the following error:

phpApp          | %3|1718821570.096|FAIL|rdkafka#producer-2836| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Failed to connect to broker at kafka-3.7.kafka-producer-k6_default:9092: Cannot assign requested address (after 6ms in state CONNECT)
phpApp          | %3|1718821570.096|FAIL|rdkafka#producer-2816| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Failed to connect to broker at kafka-3.7.kafka-producer-k6_default:9092: Cannot assign requested address (after 6ms in state CONNECT, 1 identical error(s) suppressed)

The k6 report is shown below.

         data_received..................: 6.0 MB 174 kB/s
         data_sent......................: 11 MB  311 kB/s
         http_req_blocked...............: avg=10.97µs  min=0s      med=0s      max=9.51ms  p(90)=0s      p(95)=0s
         http_req_connecting............: avg=630ns    min=0s      med=0s      max=991.8µs p(90)=0s      p(95)=0s
         http_req_duration..............: avg=33.99ms  min=13.23ms med=21.11ms max=6.02s   p(90)=22.47ms p(95)=23.2ms
           { expected_response:true }...: avg=33.99ms  min=13.23ms med=21.11ms max=6.02s   p(90)=22.47ms p(95)=23.2ms
         http_req_failed................: 0.00%  ✓ 0          ✗ 28301
         http_req_receiving.............: avg=603.81µs min=0s      med=526.2µs max=12.01ms p(90)=1.04ms  p(95)=1.1ms
         http_req_sending...............: avg=17.87µs  min=0s      med=0s      max=1.62ms  p(90)=0s      p(95)=0s
         http_req_tls_handshaking.......: avg=0s       min=0s      med=0s      max=0s      p(90)=0s      p(95)=0s
         http_req_waiting...............: avg=33.37ms  min=12.7ms  med=20.53ms max=6.02s   p(90)=21.74ms p(95)=22.37ms
         http_reqs......................: 28301  822.663185/s
         iteration_duration.............: avg=34.31ms  min=16.75ms med=21.38ms max=6.02s   p(90)=22.81ms p(95)=23.67ms
         iterations.....................: 28301  822.663185/s
         vus............................: 10     min=10       max=30
         vus_max........................: 30     min=30       max=30

    running (0m34.4s), 00/30 VUs, 28301 complete and 0 interrupted iterations
    default ✓ [======================================] 30 VUs  30s

What is the expected behavior?

The expected behavior is that the Kafka server handles the producer messages without any errors. But it looks like something is getting used up and not released for a while. I say this because if I wait for a minute or so and restart the test, it repeats the same behavior, stopping at around 28,000 messages published.
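
The "Cannot assign requested address" text typically means the client side has run out of local ephemeral ports for new connections to kafka:9092, which would match the "used up and released after a while" pattern, since ports held in TIME_WAIT are freed after a kernel timeout. A rough way to watch this from inside the phpApp container (a sketch; socket-count.php is a hypothetical helper):

<?php
// socket-count.php (hypothetical helper): rough count of TCP socket states inside
// the phpApp container, to see whether client ports pile up during the test.
// In /proc/net/tcp, state 01 = ESTABLISHED and 06 = TIME_WAIT.
$lines = file('/proc/net/tcp', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$states = [];
foreach (array_slice($lines, 1) as $line) {          // skip the header row
    $cols = preg_split('/\s+/', trim($line));
    $state = $cols[3];
    $states[$state] = ($states[$state] ?? 0) + 1;
}
printf("ESTABLISHED: %d, TIME_WAIT: %d\n", $states['01'] ?? 0, $states['06'] ?? 0);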

What do you see instead?

phpApp          | %3|1718821570.096|FAIL|rdkafka#producer-2836| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Failed to connect to broker at kafka-3.7.kafka-producer-k6_default:9092: Cannot assign requested address (after 6ms in state CONNECT)
phpApp          | %3|1718821570.096|FAIL|rdkafka#producer-2816| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Failed to connect to broker at kafka-3.7.kafka-producer-k6_default:9092: Cannot assign requested address (after 6ms in state CONNECT, 1 identical error(s) suppressed)

Additional information

I wonder if I am missing some configuration steps? I am testing this with Docker on Windows.

carrodher commented 4 weeks ago

The issue may not be directly related to the Bitnami container image; it is more likely down to how the application is being used or configured in your specific environment, or tied to a specific scenario that is not easy to reproduce on our side.

If you think that's not the case and are interested in contributing a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.

Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.

If you have any questions about the application, customizing its content, or technology and infrastructure usage, we highly recommend that you refer to the forums and user guides provided by the project responsible for the application or technology.

With that said, we'll keep this ticket open until the stale bot automatically closes it, in case someone from the community contributes valuable insights.

github-actions[bot] commented 2 weeks ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 1 week ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.