Description
We have the ulimit nproc limit set for all users, which prevents new threads from being created once the limit has been reached. In such cases, confluent-kafka seems to hang when creating a new consumer.
How to reproduce
Create a Python script, test.py, with the following contents:
I'm using kafka.Consumer() from confluent-kafka-python to initialize the consumer (see the stack trace below for the exact C-level method called).
Set the ulimit to a lower value, and run the script:
Observations
The process is stuck at kafka.Consumer(). Here's the backtrace from gdb:
gdb -p 1330247
[truncated]
(gdb) info threads
Id Target Id Frame
1 LWP 1330247 "python" 0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
2 LWP 1330251 "rdk:broker-1" 0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
(gdb) bt 7
0 0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
1 0x00007f1f0ed28b23 in __pthread_clockjoin_ex () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
2 0x00007f1f0ed2faa4 in thrd_join@GLIBC_2.28 () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
3 0x00007f1f0105793f in rd_kafka_destroy_internal () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
4 0x00007f1f0105993d in rd_kafka_new () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
5 0x00007f1f0127e008 in Consumer_init () from /nix/store/vr0y9jrjzxmdl8j8c7i2vqq3x0zaza8p-python3.11-confluent-kafka-2.3.0/lib/python3.11/site-packages/confluent_kafka/cimpl.cpython-311-x86_64-linux-gnu.so
6 0x00007f1f0f13baa7 in type_call () from /nix/store/25nrdsg4lfzmvkwicm9186xadpff113f-python3-3.11.6/lib/libpython3.11.so.1.0
(More stack frames follow...)
(gdb) thread 2
[Switching to thread 2 (LWP 1330251)]
0 0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
(gdb) bt 7
0 0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
1 0x00007f1f0ed2676c in pthread_cond_timedwait@@GLIBC_2.3.2 () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
2 0x00007f1f0ed2f69d in cnd_timedwait@GLIBC_2.28 () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
3 0x00007f1f01087021 in rd_kafka_q_pop_serve[localalias] () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
4 0x00007f1f0106a8f8 in rd_kafka_broker_ops_io_serve () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
5 0x00007f1f0106af39 in rd_kafka_broker_consumer_serve () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
6 0x00007f1f0106b749 in rd_kafka_broker_serve () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
(More stack frames follow...)
Looking at the stack trace, this seems very similar to #3954.
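For readers who want to see the thread limit itself in isolation, here is a rough sketch using only the Python standard library. It is not the test.py from this report and does not involve confluent-kafka or librdkafka; the helper name count_startable_threads and its parameters are hypothetical. It lowers the soft RLIMIT_NPROC (the limit behind ulimit -u) in a child process and counts how many threads can still be started. It assumes a Linux-like system; privileged users are typically exempt from this limit, so the count may equal the number of attempts when run as root.

```python
import subprocess
import sys

# Child program: lower the soft RLIMIT_NPROC, then count how many threads start.
# Running it in a separate process keeps the lowered limit away from the caller.
_CHILD = r"""
import resource, sys, threading

limit, attempts = int(sys.argv[1]), int(sys.argv[2])
_, hard = resource.getrlimit(resource.RLIMIT_NPROC)
if hard != resource.RLIM_INFINITY:
    limit = min(limit, hard)  # the soft limit may not exceed the hard limit
resource.setrlimit(resource.RLIMIT_NPROC, (limit, hard))

stop = threading.Event()
threads = []
for _ in range(attempts):
    t = threading.Thread(target=stop.wait)  # park each thread until released
    try:
        t.start()
    except RuntimeError:  # CPython reports "can't start new thread"
        break
    threads.append(t)

stop.set()
for t in threads:
    t.join()
print(len(threads))
"""


def count_startable_threads(nproc_limit: int, attempts: int = 8) -> int:
    """Return how many of `attempts` threads start under the given soft limit."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD, str(nproc_limit), str(attempts)],
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return int(result.stdout.strip())
```

Calling count_startable_threads(1) as an unprivileged user that already owns several processes would be expected to return 0, which is the condition under which the consumer creation above appears to hang rather than fail.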
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
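librdkafka version (release number or git tag): v2.6.0
Apache Kafka version: 3.0.0
librdkafka client configuration: auto.offset.reset=earliest, enable.auto.commit=false, debug=all
Operating system: Red Hat Enterprise Linux 8.9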
Provide logs (with debug=.. as necessary) from librdkafka