Open froque opened 1 week ago
As a workaround, disabling tls_verify or setting tls_ca_cert works:
$ tail -n2 /etc/datadog-agent/conf.d/kafka_consumer.d/conf.yaml
tls_verify: false
tls_ca_cert: /opt/datadog-agent/embedded/ssl/certs/cacert.pem
Hello @froque! Thanks for opening this issue and for the workaround. I'm going to transfer the issue to integrations-core because that is where the integrations live. I'll let the team know so they can take care of this.
@froque can you open a support case? Also, you can use the script in tests/python_client/script.py to run a barebones connection directly to the cluster for debugging. This script will attempt a connection and then fetch all of the consumer groups for that configuration. Please include it with the support case along with a Debug flare.
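For readers without the repository at hand, a barebones check of this kind might look like the sketch below. It is not the actual tests/python_client/script.py: the helper names `build_config` and `list_groups` are hypothetical, and the broker address and credentials are placeholders. It assumes the confluent_kafka package that ships with the Agent's embedded Python, and that `AdminClient.list_consumer_groups` returns a future whose result exposes a `valid` list.

```python
def build_config(bootstrap, username, password, ca_location=None, verify=True):
    """Assemble librdkafka properties for a SASL_SSL connection.

    All values passed in are placeholders supplied by the caller.
    """
    conf = {
        "bootstrap.servers": bootstrap,
        "socket.timeout.ms": 5000,
        "client.id": "dd-agent",
        "security.protocol": "sasl_ssl",
        "ssl.endpoint.identification.algorithm": "none",
        "enable.ssl.certificate.verification": verify,
        "sasl.mechanism": "PLAIN",
        "sasl.username": username,
        "sasl.password": password,
    }
    # Pointing librdkafka at an explicit CA bundle avoids relying on
    # whatever default path the linked OpenSSL was compiled with.
    if ca_location:
        conf["ssl.ca.location"] = ca_location
    return conf


def list_groups(conf, timeout=10):
    """Connect with AdminClient and return the consumer group IDs."""
    # Imported lazily so build_config stays usable without the package.
    from confluent_kafka.admin import AdminClient

    admin = AdminClient(conf)
    future = admin.list_consumer_groups(request_timeout=timeout)
    result = future.result(timeout=timeout)  # raises if the broker is unreachable
    return [group.group_id for group in result.valid]
```

Calling `list_groups(build_config(...))` with and without `ca_location` is one way to compare the default trust store against an explicit CA bundle; passing a timeout to `future.result` also keeps the call from blocking indefinitely when the handshake keeps failing.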
$ /opt/datadog-agent/embedded/bin/python script.py
bootstrap.servers=<redacted>
socket.timeout.ms=5000
client.id=dd-agent
security.protocol=sasl_ssl
ssl.endpoint.identification.algorithm=none
enable.ssl.certificate.verification=true
sasl.mechanism=PLAIN
sasl.username=<redacted>
sasl.password=*****
Connecting to AdminClient
%3|1719239854.080|SSL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: error:16000069:STORE routines::unregistered scheme: scheme=file
%3|1719239854.080|SSL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: error:80000002:system library::No such file or directory: calling stat(/usr/local/ssl/certs)
%3|1719239854.080|SSL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: error:16000069:STORE routines::unregistered scheme: scheme=file
%3|1719239854.080|SSL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: error:80000002:system library::No such file or directory: calling stat(/usr/local/ssl/certs)
%3|1719239854.080|SSL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: error:16000069:STORE routines::unregistered scheme: scheme=file
%3|1719239854.081|SSL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: error:80000002:system library::No such file or directory: calling stat(/usr/local/ssl/certs)
%3|1719239854.081|FAIL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: SSL handshake failed: error:0A000086:SSL routines::certificate verify failed: broker certificate could not be verified, verify that ssl.ca.location is correctly configured or root CA certificates are installed (install ca-certificates package) (after 34ms in state SSL_HANDSHAKE)
%3|1719239855.009|SSL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: error:16000069:STORE routines::unregistered scheme: scheme=file
%3|1719239855.009|SSL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: error:80000002:system library::No such file or directory: calling stat(/usr/local/ssl/certs)
%3|1719239855.010|SSL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: error:16000069:STORE routines::unregistered scheme: scheme=file
%3|1719239855.010|SSL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: error:80000002:system library::No such file or directory: calling stat(/usr/local/ssl/certs)
%3|1719239855.010|SSL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: error:16000069:STORE routines::unregistered scheme: scheme=file
%3|1719239855.010|SSL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: error:80000002:system library::No such file or directory: calling stat(/usr/local/ssl/certs)
%3|1719239855.010|FAIL|dd-agent#producer-1| [thrd:sasl_ssl://<redacted>:9092/bootstr]: sasl_ssl://<redacted>:9092/bootstrap: SSL handshake failed: error:0A000086:SSL routines::certificate verify failed: broker certificate could not be verified, verify that ssl.ca.location is correctly configured or root CA certificates are installed (install ca-certificates package) (after 32ms in state SSL_HANDSHAKE, 1 identical error(s) suppressed)
^CTraceback (most recent call last):
File "/home/pminds/script.py", line 87, in <module>
main()
File "/home/pminds/script.py", line 80, in main
results = future.result()
^^^^^^^^^^^^^^^
File "/opt/datadog-agent/embedded/lib/python3.11/concurrent/futures/_base.py", line 451, in result
self._condition.wait(timeout)
File "/opt/datadog-agent/embedded/lib/python3.11/threading.py", line 327, in wait
waiter.acquire()
KeyboardInterrupt
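The output above repeats the same few errors many times. A small helper like the following can collapse it into distinct messages with counts, which makes the two underlying problems (the missing /usr/local/ssl/certs directory and the failed handshake) easier to spot. This is only a sketch: the regular expression is inferred from the line layout shown above rather than from a documented format, and `summarize` is a hypothetical name.

```python
import re
from collections import Counter

# Layout inferred from the lines above:
# %<level>|<timestamp>|<facility>|<client>| [thrd:<thread>]: <message>
LOG_LINE = re.compile(
    r"^%(?P<level>\d)\|(?P<ts>[\d.]+)\|(?P<facility>[A-Z]+)\|(?P<client>[^|]*)\|"
    r"\s*\[thrd:(?P<thread>[^\]]*)\]:\s*(?P<message>.*)$"
)


def summarize(lines):
    """Count distinct (facility, message) pairs, ignoring timing suffixes."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip non-log lines such as "Connecting to AdminClient"
        # Drop the trailing "(after NNms in state ...)" note so that
        # otherwise identical failures are grouped together.
        message = re.sub(r"\s*\(after .*\)$", "", match.group("message"))
        counts[(match.group("facility"), message)] += 1
    return counts
```

Feeding the script output to `summarize` and printing `counts.most_common()` would list each distinct error once with its number of occurrences.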
From what I have already explored, it seems that in version v7.54.0 it expects a file in /usr/local/ssl/certs and not in /opt/datadog-agent/embedded/ssl/certs/ like in v7.53.0.
Your logs were successfully uploaded. For future reference, your internal case id is 1751844
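For context, an OpenSSL build carries compiled-in default CA locations, and a library linked against a different OpenSSL copy can therefore look in a different directory. The sketch below prints those defaults for the OpenSSL that the running Python interpreter is linked against, using the standard `ssl` module. Note the assumption: this reflects the interpreter's OpenSSL only, not necessarily the libcrypto that librdkafka loads, so it illustrates the mechanism rather than diagnosing this exact failure. `describe_default_ca_paths` is a hypothetical helper name.

```python
import os
import ssl


def describe_default_ca_paths():
    """Report the compiled-in CA file and directory and whether they exist."""
    paths = ssl.get_default_verify_paths()
    report = {}
    for label, location in (
        ("cafile", paths.openssl_cafile),
        ("capath", paths.openssl_capath),
    ):
        report[label] = {"path": location, "exists": os.path.exists(location)}
    return report
```

Running this with /opt/datadog-agent/embedded/bin/python and printing the result shows which locations the embedded interpreter's OpenSSL would consult by default.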
From what I have already explored, it seems that in version v7.54.0 it expects a file in /usr/local/ssl/certs and not in /opt/datadog-agent/embedded/ssl/certs/ like in v7.53.0.
=> @froque
Can you elaborate on where you found this change?
Also, can you try using port 9091 for the Kafka broker (update the config on the Kafka side), set the same port on the Datadog side (in script.py), and then run the script again to see if it works?
@HadhemiDD I messed around with the differences between the v7.53 and v7.54 Debian packages.
❯ wget --quiet https://apt.datadoghq.com/pool/d/da/datadog-agent_7.53.0-1_amd64.deb
❯ wget --quiet https://apt.datadoghq.com/pool/d/da/datadog-agent_7.54.0-1_amd64.deb
❯ mkdir v7.53 v7.54
❯ ar --output v7.53 x datadog-agent_7.53.0-1_amd64.deb
❯ ar --output v7.54 x datadog-agent_7.54.0-1_amd64.deb
❯ tar --directory=v7.53 -Jxf v7.53/data.tar.xz
❯ tar --directory=v7.54 -Jxf v7.54/data.tar.xz
I noticed that librdkafka is no longer in the same path
❯ find -name \*librdkafka\*so\* -type f
./v7.53/opt/datadog-agent/embedded/lib/librdkafka++.so.1
./v7.53/opt/datadog-agent/embedded/lib/librdkafka.so.1
./v7.54/opt/datadog-agent/embedded/lib/python3.11/site-packages/confluent_kafka.libs/librdkafka-27145264.so.1
And a new libcrypto exists
❯ find -name \*libcrypto\*so\* -type f| sort
./v7.53/opt/datadog-agent/embedded/lib/libcrypto.so.3
./v7.53/opt/datadog-agent/embedded/lib/python3.11/site-packages/psycopg2_binary.libs/libcrypto-7d0e8add.so.1.1
./v7.54/opt/datadog-agent/embedded/lib/libcrypto.so.3
./v7.54/opt/datadog-agent/embedded/lib/python3.11/site-packages/aerospike.libs/libcrypto-e31f2095.so.3
./v7.54/opt/datadog-agent/embedded/lib/python3.11/site-packages/confluent_kafka.libs/libcrypto-b840c11b.so.3
./v7.54/opt/datadog-agent/embedded/lib/python3.11/site-packages/psycopg2_binary.libs/libcrypto-7d0e8add.so.1.1
Searching for the certificate paths in both trees:
❯ rgrep '/opt/datadog-agent/embedded/ssl/certs' v7*
grep: v7.53/opt/datadog-agent/embedded/lib/libcrypto.so.3: binary file matches
grep: v7.54/opt/datadog-agent/embedded/lib/libcrypto.so.3: binary file matches
❯ rgrep '/usr/local/ssl/certs' v7*
grep: v7.54/opt/datadog-agent/embedded/lib/python3.11/site-packages/confluent_kafka.libs/libcrypto-b840c11b.so.3: binary file matches
grep: v7.54/opt/datadog-agent/embedded/lib/python3.11/site-packages/aerospike.libs/libcrypto-e31f2095.so.3: binary file matches
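The same search can be scripted so that it reports only the libcrypto copies that embed a given path. The following is a minimal sketch using the standard library; `libraries_containing` is a hypothetical helper, and the glob pattern simply mirrors the `find` expression used above.

```python
from pathlib import Path


def libraries_containing(root, needle, pattern="*libcrypto*so*"):
    """Return files under root matching pattern whose bytes contain needle."""
    needle_bytes = needle.encode()
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        # Shared objects are binary, so compare raw bytes instead of text.
        if path.is_file() and needle_bytes in path.read_bytes():
            hits.append(str(path))
    return hits
```

For example, `libraries_containing("v7.54", "/usr/local/ssl/certs")` would list the bundled libraries whose default certificate directory differs from the Agent's embedded one.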
Agent Environment
Describe what happened:
After upgrading to 7.54.0, Kafka consumer lag checks started to fail
Describe what you expected:
Expected Datadog Agent to continue to get Kafka consumer lag offsets from Kafka cluster.
Steps to reproduce the issue:
instances:
kafka_connect_str:
sasl_plain_password:
kafka_consumer_offsets: true
monitor_unlisted_consumer_groups: true
$ sudo datadog-agent check kafka_consumer
Running Checks
kafka_consumer (4.3.0)
Instance ID: kafka_consumer:24b8757764ea1a30 [ERROR]
Configuration Source: file:/etc/datadog-agent/conf.d/kafka_consumer.d/conf.yaml
Total Runs: 1
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 5.099s
Last Execution Date : 2024-06-24 09:11:07 WEST / 2024-06-24 08:11:07 UTC (1719216667000)
Last Successful Execution Date : Never
Error: Unable to connect to the AdminClient. This is likely due to an error in the configuration.
Traceback (most recent call last):
File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/kafka_consumer/kafka_consumer.py", line 34, in check
self.client.request_metadata_update()
File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/kafka_consumer/client.py", line 180, in request_metadata_update
self.kafka_client.list_topics(None, timeout=self.config._request_timeout)
File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/confluent_kafka/admin/__init__.py", line 603, in list_topics
return super(AdminClient, self).list_topics(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/base/checks/base.py", line 1224, in run
self.check(instance)
File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/kafka_consumer/kafka_consumer.py", line 36, in check
raise Exception(
Exception: Unable to connect to the AdminClient. This is likely due to an error in the configuration.
Metadata
config.hash: kafka_consumer:24b8757764ea1a30
config.provider: file
Additional environment details (Operating System, Cloud provider, etc):