Closed: agis closed this issue 5 years ago
Is there anything interesting in dmesg?
Are the consumer/producers in the same process, or multiple processes?
Is there enough memory?
Nothing interesting in dmesg at the time the segfault was triggered. However, the OOM killer was triggered about 45 minutes later, so this might be relevant.
The consumers/producers are in the same process, but separate goroutines.
How many brokers do you have in the cluster?
@edenhill We have 4 brokers.
After looking into /var/log/kern.log, it turns out the OOM killer was also invoked 5 minutes before the segfault, so that might be the culprit.
Maybe irrelevant, but regarding the number of threads/processes in our service:
$ systemctl show -p TasksMax -p TasksCurrent rafka
TasksCurrent=936
TasksMax=6144
librdkafka will create 1+N+B threads per instance, where N is the number of advertised brokers and B is the number of bootstrap servers not matching the advertised brokers. With the C3 interceptor you'll double the number of client instances and thus the number of threads.
While I don't think you're likely to hit the max thread count (maxprocs), your app might run out of memory due to each thread's stack.
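For context, a rough back-of-the-envelope sketch of what that formula implies for the numbers in this thread, assuming 4 advertised brokers, bootstrap servers that all match them, and the default 8 MiB Linux pthread stack (the instance counts and stack size are assumptions, not measurements):

package main

import "fmt"

func main() {
	// Threads per client instance, per the 1+N+B formula above.
	const (
		advertisedBrokers = 4 // assumption: the 4 brokers mentioned in this thread
		extraBootstrap    = 0 // assumption: every bootstrap server matches an advertised broker
		threadsPerClient  = 1 + advertisedBrokers + extraBootstrap
	)

	// 20-30 consumers plus 70-90 producers, doubled by the C3 interceptor.
	for _, clients := range []int{(20 + 70) * 2, (30 + 90) * 2} {
		threads := clients * threadsPerClient
		// Stack address space assuming an 8 MiB default pthread stack; resident
		// usage is typically far lower because stack pages are demand-paged.
		fmt.Printf("%d client instances -> ~%d threads, ~%d MiB of stack address space\n",
			clients, threads, threads*8)
	}
}

That works out to roughly 900-1200 threads, which is in the same ballpark as the TasksCurrent=936 shown above and supports the point that per-thread stack memory, rather than the thread limit, is the likelier pressure point.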
Thanks @edenhill. Closing this as a non-issue.
Apologies for not attaching debug logs, but this is the first time we've encountered this segfault, so I'm reporting it before it occurs again, in case it's a known issue.
Description
We're using confluent-kafka-go 0.11.6. The source code is available here.
We might have 20-30 consumers and 70-90 producers open concurrently. However, our /proc/sys/kernel/threads-max is 514353. We observed the following segfault:
Checklist
confluent-kafka-go and librdkafka version: 0.11.6
Apache Kafka broker version: confluent-kafka-2.11
Client configuration (three maps: the first is truncated at the start, the second is a consumer, the third a producer; a Go sketch of the consumer portion follows this checklist):
api.version.request:true bootstrap.servers:foo1-1.example.com:9200,foo1-2.example.com:9200,foo2-1.example.com:9200,foo2-2.example.com:9200 log.connection.close:false plugin.library.paths:monitoring-interceptor session.timeout.ms:10000]
map[enable.auto.commit:true enable.auto.offset.store:false auto.offset.reset:latest fetch.message.max.bytes:104857 api.version.request:true plugin.library.paths:monitoring-interceptor go.application.rebalance.enable:false enable.partition.eof:false queued.max.messages.kbytes:10000 auto.commit.interval.ms:2000 log.connection.close:false session.timeout.ms:10000 bootstrap.servers:foo1-1.example.com:9200,foo1-2.example.com:9200,foo2-1.example.com:9200,foo2-2.example.com:9200 queued.min.messages:100000 go.events.channel.enable:false]
map[go.events.channel.size:1000 message.send.max.retries:5 request.required.acks:-1 go.produce.channel.size:0 session.timeout.ms:10000 go.delivery.reports:true api.version.request:true bootstrap.servers:foo1-1.example.com:9200,foo1-2.example.com:9200,foo2-1.example.com:9200,foo2-2.example.com:9200 log.connection.close:false plugin.library.paths:monitoring-interceptor
Operating system: Debian Stretch
Provide logs (with debug=.. as necessary) from librdkafka
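For reference, a minimal Go sketch (not taken from the affected service) of how the consumer portion of the configuration above would typically be expressed with confluent-kafka-go; the group.id value is a hypothetical placeholder since it does not appear in the dump, the map is abridged, and the commented-out debug entry marks where librdkafka debug logging would be switched on when collecting logs:

package main

import (
	"log"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	// Abridged consumer configuration mirroring the map in the checklist above.
	consumer, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers":          "foo1-1.example.com:9200,foo1-2.example.com:9200,foo2-1.example.com:9200,foo2-2.example.com:9200",
		"group.id":                   "example-group", // hypothetical; not shown in the dump above
		"session.timeout.ms":         10000,
		"enable.auto.commit":         true,
		"auto.commit.interval.ms":    2000,
		"enable.auto.offset.store":   false,
		"auto.offset.reset":          "latest",
		"fetch.message.max.bytes":    104857,
		"queued.max.messages.kbytes": 10000,
		"queued.min.messages":        100000,
		"api.version.request":        true,
		"log.connection.close":       false,
		"plugin.library.paths":       "monitoring-interceptor",
		// "debug": "broker,protocol", // uncomment to collect librdkafka debug logs
	})
	if err != nil {
		log.Fatal(err)
	}
	defer consumer.Close()
	// ... Poll/ReadMessage loop elided ...
}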