fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Kafka output - plaintext complaint even with ssl #5607

Closed perezjasonr closed 1 year ago

perezjasonr commented 2 years ago

Bug Report

Describe the bug

Despite telling fluent bit to use ssl, I am getting a complaint:

"SSL handshake failed: Disconnected: connecting to a PLAINTEXT broker listener?"

To Reproduce

I am setting up kafka using this example:

https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/networking/external-access-static-host-based

in particular, the istio ingress gateway version:

https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/networking/external-access-static-host-based/istio

This is using the domain 00nmvb28.wooden-proton.com, so their demo domains or domain placeholders are replaced with it.

Via the fluent operator in Kubernetes, I am asking Fluent Bit to send to an externally facing Kafka like so:

[Output]
    Name    kafka
    Match_Regex    (?:kube|service)\.(.*)
    Brokers    b0.00nmvb28.wooden-proton.com:443,b1.00nmvb28.wooden-proton.com:443,b2.00nmvb28.wooden-proton.com:443
    Topics    ks-log
    rdkafka.enable.ssl.certificate.verification    false
    rdkafka.security.protocol    ssl

However I get these logs:

Also, if I remove the rdkafka ssl settings, I get the opposite complaint, so I just can't seem to win haha:

"might be caused by incorrect security.protocol configuration (connecting to a SSL listener?)"

[2022/06/21 20:49:27] [ info] [output:kafka:kafka.0] fluent-bit#producer-1: [thrd:b2.00nmvb28.wooden-proton.com:443/bootstrap]: b2.00nmvb28.wooden-proton.com:443/bootstrap: Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 9006ms in state APIVERSION_QUERY, 1 identical error(s) suppressed)
[2022/06/21 20:49:27] [ info] [output:kafka:kafka.0] fluent-bit#producer-1: [thrd:b1.00nmvb28.wooden-proton.com:443/bootstrap]: b1.00nmvb28.wooden-proton.com:443/bootstrap: Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 10000ms in state APIVERSION_QUERY, 1 identical error(s) suppressed)
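
When comparing the two runs (with and without the rdkafka ssl settings), it can help to pull the broker, connection state, and elapsed time out of each line. The following is a minimal sketch written only against the log lines quoted in this issue; the names `Disconnect` and `parse_disconnect` are made up for illustration, and other librdkafka messages may not match these patterns.

```python
import re
from typing import NamedTuple, Optional

# Patterns written against the log lines quoted in this issue only.
_BROKER = re.compile(r"\[thrd:(?P<broker>[^/\]]+)/bootstrap\]")
_STATE = re.compile(r"\(after (?P<ms>\d+)ms in state (?P<state>[A-Z_]+)")


class Disconnect(NamedTuple):
    broker: str      # host:port taken from the thread name
    state: str       # librdkafka broker state at disconnect time
    elapsed_ms: int  # time spent in that state
    hint: str        # which listener type the message suspects


def parse_disconnect(line: str) -> Optional[Disconnect]:
    """Return the fields of a disconnect line, or None if it does not match."""
    broker = _BROKER.search(line)
    state = _STATE.search(line)
    if not (broker and state):
        return None
    if "connecting to a SSL listener?" in line:
        hint = "ssl-listener"
    elif "connecting to a PLAINTEXT broker listener?" in line:
        hint = "plaintext-listener"
    else:
        hint = "unknown"
    return Disconnect(
        broker.group("broker"),
        state.group("state"),
        int(state.group("ms")),
        hint,
    )
```

The hint text is librdkafka's guess, not a confirmed diagnosis, so the `hint` field only records which guess was printed.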

Expected behavior

it should send to the topic/kafka broker and/or establish a successful connection (see successful openssl below)

Screenshots

Your Environment

AWS, Kubernetes v1.22, Fluent Bit v1.8.11, Kafka output plugin, tail input

Additional context

openssl seems to be fine with a connection:

openssl s_client -connect b0.00nmvb28.wooden-proton.com:443

CONNECTED(00000003)
depth=2 C = US, ST = Arizona, O = perez, OU = perez, CN = 00nmvb28.wooden-proton.com, emailAddress = ca@00nmvb28.wooden-proton.com
verify return:1
depth=1 C = US, ST = Arizona, O = perez, OU = perez, CN = 00nmvb28.wooden-proton.com, emailAddress = ca@00nmvb28.wooden-proton.com
verify return:1
depth=0
verify return:1
---
Certificate chain
 0 s:
   i:C = US, ST = Arizona, O = perez, OU = perez, CN = 00nmvb28.wooden-proton.com, emailAddress = ca@00nmvb28.wooden-proton.com
 1 s:C = US, ST = Arizona, O = perez, OU = perez, CN = 00nmvb28.wooden-proton.com, emailAddress = ca@00nmvb28.wooden-proton.com
   i:C = US, ST = Arizona, O = perez, OU = perez, CN = 00nmvb28.wooden-proton.com, emailAddress = ca@00nmvb28.wooden-proton.com
---
Server certificate
-----BEGIN CERTIFICATE-----
<redacted>
-----END CERTIFICATE-----
subject=

issuer=C = US, ST = Arizona, O = perez, OU = perez, CN = 00nmvb28.wooden-proton.com, emailAddress = ca@00nmvb28.wooden-proton.com

---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 3516 bytes and written 420 bytes
Verification: OK
---
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: F035D09B46493D038335A33EA88302C1A25CB2ACE83A00C3D3E4DBF2AA43C6BB
    Session-ID-ctx:
    Master-Key: AE21F300B12DE4210E9D71D4631C2D9BBE9E817A2AEEC45EAC13D4493FE330F78F342EC8D016ED82DD66EF57990FF45A
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1655843646
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: yes
---
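
The same check can be scripted, which makes it easy to repeat with and without SNI. This is a sketch under assumptions, not a diagnosis: the linked Confluent example uses host-based routing, so the gateway may depend on the SNI value, and recent `openssl s_client` versions send SNI by default. Nothing in this thread establishes whether librdkafka's handshake differs in that respect. `probe_tls` and `ProbeResult` are hypothetical names, and certificate verification is disabled to mirror `rdkafka.enable.ssl.certificate.verification false`.

```python
import socket
import ssl
from typing import NamedTuple, Optional


class ProbeResult(NamedTuple):
    ok: bool
    tls_version: Optional[str]
    error: Optional[str]


def probe_tls(host: str, port: int, send_sni: bool = True,
              timeout: float = 5.0) -> ProbeResult:
    """Attempt a TLS handshake and report whether it completed."""
    ctx = ssl.create_default_context()
    # Verification is turned off on purpose: only the handshake is of interest.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # server_hostname=None suppresses the SNI extension.
            sni = host if send_sni else None
            with ctx.wrap_socket(raw, server_hostname=sni) as tls:
                return ProbeResult(True, tls.version(), None)
    except OSError as exc:  # covers ssl.SSLError, timeouts, refused connections
        return ProbeResult(False, None, f"{type(exc).__name__}: {exc}")


# Example (needs network access to the brokers):
#   probe_tls("b0.00nmvb28.wooden-proton.com", 443, send_sni=True)
#   probe_tls("b0.00nmvb28.wooden-proton.com", 443, send_sni=False)
```

If the handshake succeeds only with `send_sni=True`, that would point at routing on the gateway rather than at the broker listener itself.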
patrick-stephens commented 2 years ago

The Kafka output plugin is disabled by default for 1.8.11 (and even now): https://github.com/fluent/fluent-bit/blob/2c4ccb5f67cf20a2b348d9057b73c7dd18b4a33f/CMakeLists.txt#L176

Did you build from source with it enabled? How? It may be an issue with the SSL configuration at build hence my query.

Is it reproducible with the latest versions?

perezjasonr commented 2 years ago

I'm using the fluent operator, which gives Kafka as an option in the values (and explains so in the documentation), so it's likely the fluent operator is using images where it's enabled:

https://github.com/kubesphere-sigs/fluent-operator-walkthrough#using-fluent-bit-to-collect-k8s-application-logs-and-output-to-kafka-elasticsearch-and-loki

patrick-stephens commented 2 years ago

Does it work in the sandbox we have for that workshop? Trying to narrow down if something is not working internally or just because of config: https://info.calyptia.com/fluent-operator-hands-on-workshop?hsLang=en

perezjasonr commented 2 years ago

So I just ran through this a few times and realized that the last section says "finally we will forward to kafka" (paraphrasing), but the actual confs/commands shown on the right seem to stop at the section before it. I'm wondering if the final Kafka one is missing, because in the end it sends to Elasticsearch and then says congrats, you've completed this.

perezjasonr commented 2 years ago

So here's what I think might be happening: librdkafka, or at least the Fluent Bit plugin, doesn't have all the necessary options. I got Kafka exposed externally via this example:

https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/networking/external-access-static-host-based

The producer CLI example there did work for me: it was reachable from outside, produced the messages, and they showed up. However, it required these client properties:


bootstrap.servers=kafka.$DOMAIN:443
security.protocol=SSL
ssl.truststore.location=$TUTORIAL_HOME/client/client.truststore.p12
ssl.truststore.password=mystorepassword
ssl.truststore.type=PKCS12
ssl.keystore.location=$TUTORIAL_HOME/client/client.keystore.p12
ssl.keystore.password=mystorepassword
ssl.keystore.type=PKCS12

In Fluent Bit's librdkafka settings, though, I don't see, for example, a truststore option. I'm guessing Confluent Platform (at least not yet) just isn't accommodating other scenarios, like when we just want a client cert/key/CA. I tried the keystore with a regular CA and it didn't like that either, so I guess it has to be keystore+truststore for now. I think it's very "Java" right now.

I'm guessing that with non-Confluent Kafkas, like the Strimzi one in the lab you linked, the Fluent Bit conf can be set up to talk to it, but Confluent Platform seems inflexible in this regard. Otherwise, I have yet to see Fluent Bit sending to an externally exposed Confluent Kafka.

so far those are my conclusions.
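
For reference, the Java client properties above can be lined up against librdkafka property names. The sketch below is an assumption-laden illustration, not a confirmed fix for this issue: the librdkafka names (`ssl.keystore.location`, `ssl.keystore.password`, `ssl.ca.location`) come from its CONFIGURATION.md, where the keystore is described as PKCS#12 and there is no truststore property, while the mapping and the helper name `to_fluent_bit` are made up here. Fluent Bit passes `rdkafka.`-prefixed keys through to librdkafka.

```python
from typing import Dict, List, Tuple

# Java properties assumed to have a same-named librdkafka counterpart.
PASS_THROUGH = {"ssl.keystore.location", "ssl.keystore.password"}


def to_fluent_bit(props: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Translate Java client properties into Fluent Bit [Output] lines.

    Returns (config_lines, notes). Notes list everything that could not be
    carried over directly and needs manual attention.
    """
    lines: List[str] = []
    notes: List[str] = []
    for key, value in props.items():
        if key == "bootstrap.servers":
            lines.append(f"    Brokers    {value}")
        elif key == "security.protocol":
            # Lower-cased to match the value used in the [Output] section above.
            lines.append(f"    rdkafka.security.protocol    {value.lower()}")
        elif key in PASS_THROUGH:
            lines.append(f"    rdkafka.{key}    {value}")
        elif key == "ssl.truststore.location":
            # librdkafka has no truststore property; the CA certificate would
            # have to be exported to a PEM file first (assumption of this sketch).
            notes.append(f"{key}: export the CA to PEM and set rdkafka.ssl.ca.location")
        else:
            notes.append(f"{key}: no librdkafka counterpart assumed; dropped")
    return lines, notes
```

Calling it with the eight properties listed above yields four config lines (`Brokers`, `rdkafka.security.protocol`, and the two keystore keys) plus notes for the truststore and type entries.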

perezjasonr commented 2 years ago

I opened an issue with librdkafka. Is it possible to know which version of librdkafka Fluent Bit 1.8.11 uses?

https://github.com/edenhill/librdkafka/issues/3889

benjaminhuo commented 2 years ago

> The Kafka output plugin is disabled by default for 1.8.11 (and even now):
>
> https://github.com/fluent/fluent-bit/blob/2c4ccb5f67cf20a2b348d9057b73c7dd18b4a33f/CMakeLists.txt#L176
>
> Did you build from source with it enabled? How? It may be an issue with the SSL configuration at build hence my query.
>
> Is it reproducible with the latest versions?

Fluent Operator released a new version, https://github.com/fluent/fluent-operator/releases/tag/v1.1.0, which uses Fluent Bit v1.9.4 instead.

patrick-stephens commented 2 years ago

> i opened an issue with librdkafka, is it possible to know what version of librdkafka fluentbit 1.8.11 uses?
>
> edenhill/librdkafka#3889

All the vendored dependencies are versioned in the source, e.g. https://github.com/fluent/fluent-bit/tree/master/lib/librdkafka-1.8.2

You'll have to check for 1.8.11, which is also an old version now, so I'd suggest stepping up to 1.9 as per @benjaminhuo's comment above.
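
Since the vendored libraries carry their version in the directory name (as in `lib/librdkafka-1.8.2`), the lookup can be scripted against a local checkout of the release in question. This is a small sketch assuming that `lib/<name>-<version>` layout; `vendored_versions` is a hypothetical helper and the checkout path is whatever you cloned to.

```python
from pathlib import Path
from typing import List


def vendored_versions(repo_root: str, name: str = "librdkafka") -> List[str]:
    """List version suffixes of lib/<name>-<version> directories in a checkout."""
    lib_dir = Path(repo_root) / "lib"
    prefix = name + "-"
    return sorted(
        p.name[len(prefix):]
        for p in lib_dir.glob(prefix + "*")
        if p.is_dir()
    )


# Example, after checking out the tag for the release you run:
#   vendored_versions("/path/to/fluent-bit")
```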

perezjasonr commented 2 years ago

I can try that, but what I really think is happening now is a "disagreement" between Confluent's Kafka and librdkafka.

https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md

If exposed externally via an ingress controller or ingress gateway (Istio), the client likely must use a truststore/keystore, and as you can see there's no truststore option, so Fluent Bit can't do it either.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 5 days with no activity.