confluentinc / confluent-kafka-go

Confluent's Apache Kafka Golang client
Apache License 2.0

Memory leak when creating a lot of AdminClient instances #762

Open jaime0815 opened 2 years ago

jaime0815 commented 2 years ago

Description

I want to find the maximum number of connections to Confluent Kafka by creating a lot of AdminClient connections, but the process gets OOM-killed when the number of connections reaches around 3000. It seems that memory allocated in C is not being freed.

The following is the system message log:

Apr 13 10:05:51 ip-10-12-63-42 kernel: rdk:broker0 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Apr 13 10:05:51 ip-10-12-63-42 kernel: CPU: 1 PID: 12636 Comm: rdk:broker0 Not tainted 5.10.102-99.473.amzn2.x86_64 #1
Apr 13 10:05:51 ip-10-12-63-42 kernel: Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
Apr 13 10:05:51 ip-10-12-63-42 kernel: Call Trace:
Apr 13 10:05:51 ip-10-12-63-42 kernel: dump_stack+0x57/0x70
Apr 13 10:05:51 ip-10-12-63-42 kernel: dump_header+0x4a/0x1f0
Apr 13 10:05:51 ip-10-12-63-42 kernel: oom_kill_process.cold+0xb/0x10

Apr 13 10:05:51 ip-10-12-63-42 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=kafka-test,pid=11593,uid=1000
Apr 13 10:05:51 ip-10-12-63-42 kernel: Out of memory: Killed process 11593 (kafka-test) total-vm:222978252kB, anon-rss:2926928kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:92180kB oom_score_adj:0
Apr 13 10:05:51 ip-10-12-63-42 kernel: oom_reaper: reaped process 11593 (kafka-test), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

How to reproduce

package main

import (
   "fmt"
   "github.com/confluentinc/confluent-kafka-go/kafka"
)

const (
   bootstrapServers = "pkc-oxo5o.ap-southeast-1.aws.confluent.cloud:9092"
   ccloudAPIKey     = "D5GJUT2MC"
   ccloudAPISecret  = "liWujU5/++aaI+0c9jcrpzbxOElaCd4/gUNdR5bc7TaBc5"
)

func newClient() *kafka.AdminClient {
    config := &kafka.ConfigMap{
       "bootstrap.servers":       bootstrapServers,
       "broker.version.fallback": "0.10.0.0",
       "api.version.fallback.ms": 0,
       "sasl.mechanisms":         "PLAIN",
       "security.protocol":       "SASL_SSL",
       "sasl.username":           ccloudAPIKey,
       "sasl.password":           ccloudAPISecret}

   client, err := kafka.NewAdminClient(config)
   if err != nil {
      panic(err)
   }

   return client
}

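func testMaxConnection() {
   // Length 0, capacity 10000: make([]T, 10000) would pre-fill the slice with
   // 10000 nil pointers, and calling Close on those would panic.
   clients := make([]*kafka.AdminClient, 0, 10000)
   for i := 0; i < 10000; i++ {
      client := newClient()
      fmt.Println("create client:", i)
      clients = append(clients, client)
   }

   for _, c := range clients {
      c.Close()
   }
}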

func main() {
        testMaxConnection()
}


jaime0815 commented 2 years ago

I did not find any hints using the pprof tool. [image]
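
One possible reason pprof shows nothing: Go's heap profile only covers allocations made by the Go runtime, while confluent-kafka-go wraps the C library librdkafka, whose allocations go through cgo and are not tracked there. A minimal sketch for watching the process's resident memory instead, assuming Linux and `/proc/self/status` (the helper name `parseVmRSS` is made up for this example):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSS extracts the VmRSS value (in kB) from the text of
// /proc/<pid>/status. Hypothetical helper, not part of any library.
func parseVmRSS(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("VmRSS not found")
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status (Linux only):", err)
		return
	}
	rss, err := parseVmRSS(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	// RSS includes memory allocated by C code, unlike the Go heap profile.
	fmt.Printf("resident set size: %d kB\n", rss)
}
```

Printing this value every few hundred clients in the reproduction loop would show how resident memory grows with the number of live clients.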

mhowlett commented 2 years ago

you're creating a lot of clients, and getting oom before attempting to close them, so this implies the go client is using a lot of memory, not that the memory isn't garbage collected.

you could try the experiment using more than one machine to instantiate the admin clients.

fwiw, in my own testing i've seen little impact on performance from the number of open connections per broker, up to about 7000 connections. with that said, establishing SSL connections is expensive, so you'll probably get timeouts if you try to establish them all at once. the most important thing when optimizing performance is to make sure you're batching messages - in my experimentation, i saw performance roughly proportional to number of broker requests.

edenhill commented 2 years ago

And to add to what @mhowlett says; try reusing existing clients (producers, adminClients) as far as possible.

jaime0815 commented 2 years ago

> And to add to what @mhowlett says; try reusing existing clients (producers, adminClients) as far as possible.

@edenhill Reusing existing clients does resolve this issue, but creating a lot of producer objects should not cause an OOM. Shouldn't they share the connection and some initialization data?

mhowlett commented 2 years ago

no, each client instance is expensive (but powerful), you should generally try to minimize the number of client instances you create.