confluentinc / confluent-kafka-python

Confluent's Kafka Python Client
http://docs.confluent.io/current/clients/confluent-kafka-python

SSL cert problem "self-signed certificate" and ignored SSL options #1702

Closed. thomasnal closed this issue 5 months ago

thomasnal commented 5 months ago

Description

I have a problem authenticating the Kafka client because rdkafka fails to retrieve the OIDC token with the error SSL certificate problem: self-signed certificate in certificate chain (-1).

According to other bug reports, this error should be remedied by supplying the correct CA certificates, e.g. from the 'certifi' package or from the OS bundle at /etc/ssl/cert.pem. I can replicate the error with an HTTPS request to the token endpoint made in Python with the requests package, and providing the certifi CA bundle to requests resolves it there. However, when I ask rdkafka to use the same CA file, rdkafka remains stuck with the error.
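As a minimal sketch of that comparison (the token endpoint URL below is a hypothetical placeholder for the redacted one, and only the TLS handshake matters for the check):

import certifi
import requests

# Hypothetical placeholder for the redacted OIDC token endpoint.
TOKEN_ENDPOINT = "https://example.invalid/auth/realms/dep/protocol/openid-connect/token"

# Passing the certifi bundle explicitly makes the TLS handshake succeed,
# whereas the same request against the default trust store fails with
# "self-signed certificate in certificate chain".
requests.get(TOKEN_ENDPOINT, verify=certifi.where())

The rdkafka configuration in use: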

{"bootstrap.servers": "redacted",
"debug": "broker,security,topic,msg",
"sasl.mechanism": "OAUTHBEARER",
"security.protocol": "sasl_ssl",
"sasl.oauthbearer.method": "oidc",
"sasl.oauthbearer.client.id": "redacted",
"sasl.oauthbearer.client.secret": "redacted",
"sasl.oauthbearer.extensions": "logicalCluster=redacted,identityPoolId=redacted",
"sasl.oauthbearer.token.endpoint.url": "redacted",
"ssl.ca.location": "redacted/lib/python3.10/site-packages/certifi/cacert.pem"}

Note the ssl.ca.location: the error remains whether the CA file comes from certifi or from /etc/ssl/cert.pem. It also remains when the certificate is supplied inline via the ssl.ca.pem option.
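A minimal sketch of that ssl.ca.pem variant, with the redacted values kept as placeholders and the certificate contents read from the certifi bundle:

import certifi
import confluent_kafka as ck

# Read the CA bundle so it can be passed inline instead of as a file path.
with open(certifi.where()) as f:
    ca_pem = f.read()

options = {
    'bootstrap.servers': 'redacted',
    'security.protocol': 'sasl_ssl',
    'sasl.mechanism': 'OAUTHBEARER',
    'sasl.oauthbearer.method': 'oidc',
    'sasl.oauthbearer.client.id': 'redacted',
    'sasl.oauthbearer.client.secret': 'redacted',
    'sasl.oauthbearer.token.endpoint.url': 'redacted',
    'ssl.ca.pem': ca_pem,  # inline PEM contents instead of ssl.ca.location
}
producer = ck.Producer(options)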

Further attempts with the options enable.ssl.certificate.verification: false and ssl.endpoint.identification.algorithm: none leave rdkafka stuck with the same error. At this point, it appears that rdkafka ignores the provided options.

Using the Confluent Kafka .NET client with the same token endpoint on the same machine works without issue, in every variant tried, including providing the certificate as a value in the ssl.ca.pem option.

What can I do to get rdkafka, called from Python, to retrieve the token successfully without this error?

How to reproduce

import certifi
import confluent_kafka as ck

options = {
    'bootstrap.servers': 'redacted',
    'debug': 'broker,security,topic,msg',
    'sasl.mechanism': 'OAUTHBEARER',
    'security.protocol': 'sasl_ssl',
    'sasl.oauthbearer.method': 'oidc',
    'sasl.oauthbearer.client.id': 'redacted',
    'sasl.oauthbearer.client.secret': 'redacted',
    'sasl.oauthbearer.extensions': 'logicalCluster=redacted,identityPoolId=redacted',
    'sasl.oauthbearer.token.endpoint.url': 'https://redacted/auth/realms/dep/protocol/openid-connect/token',
    # CA bundle shipped with the certifi package
    'ssl.ca.location': certifi.where(),
}

producer = ck.Producer(options)
# poll() drives the client; the OIDC token retrieval and its error show up here
producer.poll(1)
producer.poll(1)

Output:

%7|1704810077.318|SASL|rdkafka#producer-1| [thrd:app]: Selected provider OAUTHBEARER (builtin) for SASL mechanism OAUTHBEARER
%7|1704810077.318|OPENSSL|rdkafka#producer-1| [thrd:app]: Using statically linked OpenSSL version OpenSSL 3.0.11 19 Sep 2023 (0x300000b0, librdkafka built with 0x300000b0)
%7|1704810077.321|SSL|rdkafka#producer-1| [thrd:app]: Loading CA certificate(s) from file redacted/certifi/cacert.pem
%7|1704810077.336|BRKMAIN|rdkafka#producer-1| [thrd::0/internal]: :0/internal: Enter main broker thread
%7|1704810077.336|BROKER|rdkafka#producer-1| [thrd:app]: sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/bootstrap: Added new broker with NodeId -1
%7|1704810077.336|BRKMAIN|rdkafka#producer-1| [thrd:sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/bootstrap: Enter main broker thread
%7|1704810077.336|CONNECT|rdkafka#producer-1| [thrd:app]: sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/bootstrap: Selected for cluster connection: bootstrap servers added (broker has 0 connection attempt(s))
%7|1704810077.336|INIT|rdkafka#producer-1| [thrd:app]: librdkafka v2.3.0 (0x20300ff) rdkafka#producer-1 initialized (builtin.features gzip,snappy,ssl,sasl,regex,lz4,sasl_gssapi,sasl_plain,sasl_scram,plugins,zstd,sasl_oauthbearer,http,oidc, STRIP STATIC_LINKING GCC GXX PKGCONFIG OSXLD LIBDL PLUGINS ZLIB SSL SASL_CYRUS ZSTD CURL HDRHISTOGRAM SYSLOG SNAPPY SOCKEM SASL_SCRAM SASL_OAUTHBEARER OAUTHBEARER_OIDC, debug 0x246)
%7|1704810077.336|CONNECT|rdkafka#producer-1| [thrd:sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/bootstrap: Received CONNECT op
%7|1704810077.336|STATE|rdkafka#producer-1| [thrd:sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/bootstrap: Broker changed state INIT -> TRY_CONNECT

%3|1704810077.489|OIDC|rdkafka#producer-1| [thrd:background]: Failed to retrieve OIDC token from "https://redacted/auth/realms/dep/protocol/openid-connect/token": SSL certificate problem: self-signed certificate in certificate chain (-1)

%7|1704810078.336|CONNECT|rdkafka#producer-1| [thrd:main]: Cluster connection already in progress: no cluster connection
%7|1704810082.348|DESTROY|rdkafka#producer-1| [thrd:app]: Terminating instance (destroy flags none (0x0))


thomasnal commented 5 months ago

For cleanliness, I've edited the code in the issue to show the use of the certifi certificate path more clearly, so that any doubts about the path are put to rest.

thomasnal commented 5 months ago

After I compiled confluent-kafka-python from source (build steps below), the connection succeeds:

...
%7|1704900329.560|STATE|rdkafka#producer-1| [thrd:sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/bootstrap: Broker changed state INIT -> TRY_CONNECT
%7|1704900329.702|BRKMAIN|rdkafka#producer-1| [thrd:background]: Waking up waiting broker threads after setting OAUTHBEARER token
%7|1704900329.702|WAKEUP|rdkafka#producer-1| [thrd:background]: Wake-up sent to 1 broker thread in state >= TRY_CONNECT: OAUTHBEARER token update
%7|1704900329.702|CONNECT|rdkafka#producer-1| [thrd:sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/bootstrap: broker in state TRY_CONNECT connecting
%7|1704900329.702|STATE|rdkafka#producer-1| [thrd:sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-poxg5.westeurope.azure.confluent.cloud:9092/bootstrap: Broker changed state TRY_CONNECT -> CONNECT
...
brew install librdkafka
export C_INCLUDE_PATH=/opt/homebrew/Cellar/librdkafka/2.3.0/include/
pip install --no-binary confluent-kafka confluent-kafka
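A quick way to confirm which librdkafka the client is actually linked against after rebuilding:

import confluent_kafka as ck

# version() reports the Python client, libversion() the linked librdkafka;
# useful to verify that the source-built librdkafka is the one in use.
print(ck.version(), ck.libversion())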

Can anyone explain this and update the released wheel?

Pilipets commented 5 months ago

Most likely related to https://github.com/confluentinc/librdkafka/issues/3751

thomasnal commented 5 months ago

Most likely related to confluentinc/librdkafka#3751

🥇 Thank you so much for pointing this out; this is very much the issue. It helps heaps: I now understand the underlying design and the problem, so I can create a workaround for the systems where it is failing.

Pilipets commented 5 months ago

Yeah, I also did a similar workaround on my side.

I moved the OIDC logic to a separate compilation unit, added an additional SSL CA location for OAUTHBEARER with three options (inherit, system, specific path), and used it with a custom OAUTHBEARER callback implementation that also skips decoding tokens when expires_in is present: https://github.com/confluentinc/librdkafka/issues/4242
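For a Python-level workaround in the spirit of the custom callback mentioned above, one option is to fetch the token outside librdkafka and hand it over via the oauth_cb callback supported by confluent-kafka-python, so that the HTTPS request is made by requests with a CA bundle under your control. This is only a rough sketch: the endpoint and credentials are placeholders, and SASL extensions such as logicalCluster/identityPoolId are not passed through this path and may need separate handling.

import time

import certifi
import requests
import confluent_kafka as ck

# Hypothetical placeholders for the redacted values in the report.
TOKEN_ENDPOINT = "https://example.invalid/auth/realms/dep/protocol/openid-connect/token"
CLIENT_ID = "redacted"
CLIENT_SECRET = "redacted"

def fetch_oidc_token(oauth_config):
    """Fetch a token via the client_credentials grant, verifying TLS with certifi."""
    resp = requests.post(
        TOKEN_ENDPOINT,
        data={"grant_type": "client_credentials"},
        auth=(CLIENT_ID, CLIENT_SECRET),
        verify=certifi.where(),
    )
    resp.raise_for_status()
    payload = resp.json()
    # oauth_cb must return (token, absolute expiry time in seconds since the epoch).
    return payload["access_token"], time.time() + payload["expires_in"]

producer = ck.Producer({
    "bootstrap.servers": "redacted",
    "security.protocol": "sasl_ssl",
    "sasl.mechanism": "OAUTHBEARER",
    # Leave sasl.oauthbearer.method at its default so the callback below,
    # not the built-in OIDC/libcurl path, supplies the token.
    "oauth_cb": fetch_oidc_token,
    "ssl.ca.location": certifi.where(),  # broker TLS is still verified as usual
})
producer.poll(0)  # serve callbacks; this triggers the token refresh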