Closed by @amolnater-qasource 9 months ago
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
FYI @cavokz
Broker setup: https://github.com/elastic/security-team/issues/7408
@manishgupta-qasource Please review.
Secondary review for this ticket is Done
@amolnater-qasource Endpoint looks degraded here. Shall we ask @nfritts and team to look at this first?
That may or may not be interesting. I'm not sure why, but @amolnater-qasource was expecting the agent to become unhealthy when the Defend integration got enabled. He enabled it on purpose to collect the diagnostic logs, which would hopefully also tell us something about the Kafka broker client.
I was on the call with him when we reproduced this one last time on a fresh cloud deployment; the agent was running on a Debian VM on my laptop. The VM could reach the broker and could send and receive messages from the command line, but the agent was unable to send any data while still reporting as healthy.
Just FYI, I took a look at the logs. Endpoint is failing during the SSL handshake.
from endpoint logs:
KafkaClient.cpp:54 Kafka Error: Local: SSL error [-181] | sasl_ssl://34.16.40.176:9092/bootstrap: SSL handshake failed: error:0A000126:lib(20)::reason(294) (after 127ms in state SSL_HANDSHAKE)
However, it appears that the agent was having a similar problem:
from agent logs:
Kafka (topic=qatest): kafka: couldn't write SASL handshake (make sure SASL is enabled on the target broker port)
I tried connecting to the provided broker IPs myself and was unable to. Are we sure the brokers are up and configured properly?
The broker was up and running, but the auth was plaintext username/password, no SSL. I wanted to try again without any authentication, but I could only reconfigure the broker; I could not redo the agent from scratch because I've lost the URL of the 8.12-snapshot agent (it was in a Zoom chat).
Now the broker is still in place, but on port 9092 you'll find the plaintext listener without auth; the username/password listener moved to port 9093.
This is the docker compose I set on the VMs:
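(The original file isn't reproduced here; purely as an illustration, a single broker with one no-auth listener and one SASL/PLAIN listener could be defined roughly like this with the bitnami/kafka image. The image, listener names, IP, and credentials below are placeholder assumptions, not the real configuration.)

```yaml
# Illustrative sketch only: one KRaft broker exposing a plaintext
# listener on 9092 and a SASL/PLAIN listener on 9093, similar to the
# setup described in this thread.
services:
  kafka:
    image: bitnami/kafka:latest
    ports:
      - "9092:9092"   # EXTERNAL_NOAUTH: no authentication
      - "9093:9093"   # EXTERNAL_AUTH: SASL/PLAIN username/password
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9094
      - KAFKA_CFG_LISTENERS=EXTERNAL_NOAUTH://:9092,EXTERNAL_AUTH://:9093,CONTROLLER://:9094
      - KAFKA_CFG_ADVERTISED_LISTENERS=EXTERNAL_NOAUTH://203.0.113.1:9092,EXTERNAL_AUTH://203.0.113.1:9093
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=EXTERNAL_NOAUTH:PLAINTEXT,EXTERNAL_AUTH:SASL_PLAINTEXT,CONTROLLER:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=EXTERNAL_NOAUTH
      - KAFKA_CLIENT_USERS=elastic
      - KAFKA_CLIENT_PASSWORDS=changeme
```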
I've just double checked the test kafka clusters:
EXTERNAL_NOAUTH://34.16.40.176:9092, EXTERNAL_AUTH://34.16.40.176:9093
EXTERNAL_NOAUTH://34.71.176.30:9092, EXTERNAL_AUTH://34.71.176.30:9093
EXTERNAL_NOAUTH://34.68.101.1:9092, EXTERNAL_AUTH://34.68.101.1:9093
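A client has to match the security protocol of whichever listener it connects to. As a small illustration (the listener-name-to-protocol mapping below is an assumption based on the names above, and the helper functions are hypothetical, not part of any Elastic code):

```python
# Hypothetical helpers: parse the advertised listener strings above and
# derive the (librdkafka-style) security.protocol a client must use.

def parse_listener(listener: str):
    """Split 'NAME://host:port' into (name, host, port)."""
    name, rest = listener.split("://", 1)
    host, port = rest.rsplit(":", 1)
    return name, host, int(port)

# Assumed mapping: NOAUTH listeners are plaintext, AUTH listeners use
# SASL/PLAIN over an unencrypted connection (SASL_PLAINTEXT).
PROTOCOL_BY_LISTENER = {
    "EXTERNAL_NOAUTH": "PLAINTEXT",
    "EXTERNAL_AUTH": "SASL_PLAINTEXT",
}

def client_config(listener: str) -> dict:
    """Build the minimal client settings for one advertised listener."""
    name, host, port = parse_listener(listener)
    cfg = {
        "bootstrap.servers": f"{host}:{port}",
        "security.protocol": PROTOCOL_BY_LISTENER[name],
    }
    if cfg["security.protocol"].startswith("SASL"):
        cfg["sasl.mechanism"] = "PLAIN"  # credentials omitted here
    return cfg

print(client_config("EXTERNAL_NOAUTH://34.16.40.176:9092"))
print(client_config("EXTERNAL_AUTH://34.16.40.176:9093"))
```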
This is my local provectus/kafka-ui configured to access them (ignore the noauth and user-pass clusters):
Eventually I managed to do some more tests. I set up two fleet outputs to the same broker (34.42.9.15), one without authentication (port 9092) and the other with SASL_PLAINTEXT username/password (port 9093):
I configured the agent policy with both, one at a time.
The first worked nicely, I could see the messages flowing:
The second, consistently with the previous reported findings, failed with these errors:
{"log.level":"info","@timestamp":"2023-12-07T17:42:35.771+0100","message":"Connecting to kafka(34.42.9.15:9093)","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-e76fbfb5-401e-48a3-af78-ba984fc0094e","type":"system/metrics"},"log":{"source":"system/metrics-e76fbfb5-401e-48a3-af78-ba984fc0094e"},"log.logger":"publisher_pipeline_output","log.origin":{"file.line":137,"file.name":"pipeline/client_worker.go","function":"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*netClientWorker).run"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-07T17:42:35.771+0100","message":"Connection to kafka(34.42.9.15:9093) established","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-e76fbfb5-401e-48a3-af78-ba984fc0094e","type":"system/metrics"},"log":{"source":"system/metrics-e76fbfb5-401e-48a3-af78-ba984fc0094e"},"service.name":"metricbeat","ecs.version":"1.6.0","log.logger":"publisher_pipeline_output","log.origin":{"file.line":145,"file.name":"pipeline/client_worker.go","function":"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*netClientWorker).run"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-12-07T17:42:37.290+0100","message":"Kafka (topic=fleet-output-user-pass): kafka: couldn't write SASL handshake (make sure SASL is enabled on the target broker port)","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-e76fbfb5-401e-48a3-af78-ba984fc0094e","type":"system/metrics"},"log":{"source":"system/metrics-e76fbfb5-401e-48a3-af78-ba984fc0094e"},"log.logger":"kafka","log.origin":{"file.line":337,"file.name":"kafka/client.go","function":"github.com/elastic/beats/v7/libbeat/outputs/kafka.(*client).errorWorker"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-12-07T17:42:39.337+0100","message":"Kafka (topic=fleet-output-user-pass): kafka: couldn't write SASL handshake (make sure SASL is enabled on the target broker port)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"log-e76fbfb5-401e-48a3-af78-ba984fc0094e","type":"log"},"log":{"source":"log-e76fbfb5-401e-48a3-af78-ba984fc0094e"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"kafka","log.origin":{"file.line":337,"file.name":"kafka/client.go","function":"github.com/elastic/beats/v7/libbeat/outputs/kafka.(*client).errorWorker"},"ecs.version":"1.6.0"}
Note that my kafka-ui is configured to access the broker in the same two ways as the fleet outputs:
Both the configurations work.
Could you @brian-mckinney please double check?
I've stood up your docker stack and verified the same problem when configuring with Kibana, for both endpoint and system integration. I think I know what is happening.
When you configure the kafka output in Kibana, if you choose Username / Password authentication (even with the PLAIN mechanism), Kibana will add an ssl section to the config's output section:
```yaml
output:
  kafka:
    broker_timeout: 30
    client_id: Elastic
    compression: none
    headers: []
    hosts:
      - 192.168.1.168:9093
    partition:
      random:
        group_events: 1
    password: changeme
    required_acks: 1
    sasl:
      mechanism: PLAIN
    ssl:
      verification_mode: none
    timeout: 30
    topics:
      - topic: my-topic
    type: kafka
    username: elastic
    version: 1.0.0
```
I assume this is because we want to force sensitive credentials to be encrypted. If Endpoint sees the ssl field in the config, it automatically assumes that the user/pass authentication is using the SASL_SSL protocol when connecting to Kafka. Your docker stack sets up a plaintext SASL listener, therefore username and password will likely never work if configured via Kibana.
I am guessing that Beats/the system integration is doing the same thing, which is why they fail the SASL handshake as well. (@faec ?)
If the problem is what I described and is intentional, then I think you will need to update your docker stack in order to test the other authentication mechanisms.
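The inference described above can be sketched as a small decision function. This is an illustrative guess at the rule, not Endpoint's or Beats' actual code:

```python
def infer_security_protocol(output: dict) -> str:
    """Guess which Kafka security protocol an agent component would
    pick from a Fleet kafka output section (illustrative only):
      sasl/username + ssl -> SASL_SSL
      sasl/username only  -> SASL_PLAINTEXT
      ssl only            -> SSL
      neither             -> PLAINTEXT
    """
    has_sasl = "username" in output or "sasl" in output
    has_ssl = "ssl" in output
    if has_sasl and has_ssl:
        return "SASL_SSL"
    if has_sasl:
        return "SASL_PLAINTEXT"
    if has_ssl:
        return "SSL"
    return "PLAINTEXT"

# The config Kibana generated above contains both a sasl and an ssl
# section, so the client attempts SASL_SSL against a broker listener
# that only speaks SASL_PLAINTEXT -- and the handshake fails.
kibana_cfg = {
    "hosts": ["192.168.1.168:9093"],
    "username": "elastic",
    "password": "changeme",
    "sasl": {"mechanism": "PLAIN"},
    "ssl": {"verification_mode": "none"},
}
print(infer_security_protocol(kibana_cfg))  # SASL_SSL
```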
I'm totally with you about avoiding sending credentials over unencrypted channels; it's my main concern in having these test kafka brokers in the open.
Nevertheless there are at least a couple of points that need more clarity:
Is there any documentation about the supported auth configurations?
Is it right that the agent remains healthy if communication errors happen with the kafka broker?
Right, no; expected, yes. The Beats run by Elastic Agent do not yet report errors when they can't communicate with the output (this is true for all outputs). This is tracked in https://github.com/elastic/beats/issues/39801 and is something we plan to fix. The implementation for Elastic Defend is separate but may have a similar limitation.
Is there any documentation about the supported auth configurations?
It should be documented at https://www.elastic.co/guide/en/fleet/current/kafka-output-settings.html#_authentication_settings. If that isn't clear enough best thing to do is file an issue in https://github.com/elastic/ingest-docs to get it improved.
Thanks for all the pointers!
One last thing: is it therefore a bug or a feature that authenticating with username/password switches the underlying communication to encrypted? The documentation does not make this clear.
Who can give an authoritative answer?
is it therefore a bug or a feature that authenticating with username/password switches the underlying communication to encrypted? The documentation does not make this clear.
It's a feature to prevent users from sending credentials in plaintext, so we'll need to update the docs to clarify this.
@amolnater-qasource With the merge of https://github.com/elastic/siem-team/pull/1054 and https://github.com/elastic/security-team/pull/8227 it's now possible to also test username/password authentication over a TLS channel. SSL authentication is yet to be implemented.
Hi @cavokz
We have revalidated this with the latest instance on an 8.12.0 BC2 Kibana cloud environment and made the following observations:
Screenshot:
Build details: VERSION: 8.12.0 BC2 BUILD: 69899 COMMIT: 15a6cc8236b4828b97da733746ec36bd33f03bba
Hence, we are closing this issue and marking it as QA:Validated.
Thanks!!
Kibana Build details:
Host OS: All
Preconditions:
Steps to reproduce:
NOTE:
Screenshot:
Expected Result: Agent should be able to send messages to the kafka broker with authentication.
Agent.json: ip-172-31-73-91-agent-details.zip
Logs: elastic-agent-diagnostics-2023-12-05T12-03-42Z-00.zip elastic-agent-diagnostics-2023-12-05T12-07-19Z-00.zip