Closed by @amolnater-qasource 9 months ago
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
FYI @cavokz
Broker setup: https://github.com/elastic/security-team/issues/7408
@manishgupta-qasource Please review.
Secondary review for this ticket is Done
@amolnater-qasource Endpoint looks degraded here. Shall we ask @nfritts and team to look at this first?
That may or may not be interesting. I'm not sure why, but @amolnater-qasource was expecting the agent to become unhealthy when the Defend integration got enabled. He enabled it on purpose to collect the diagnostic logs, which would hopefully also tell us something about the Kafka broker client.
I was on the call with him when we reproduced this one last time on a fresh cloud deployment; the agent was running on a Debian VM on my laptop. The VM could reach the broker and could send and receive messages from the command line, but the agent was unable to send any data while still reporting as healthy.
Just FYI, I took a look at the logs. Endpoint is failing during the SSL handshake.
from endpoint logs:
KafkaClient.cpp:54 Kafka Error: Local: SSL error [-181] | sasl_ssl://34.16.40.176:9092/bootstrap: SSL handshake failed: error:0A000126:lib(20)::reason(294) (after 127ms in state SSL_HANDSHAKE)
However, it appears that the agent was having a similar problem:
from agent logs:
Kafka (topic=qatest): kafka: couldn't write SASL handshake (make sure SASL is enabled on the target broker port)
I tried connecting to the provided broker IPs myself and was unable to. Are we sure the brokers are up and configured properly?
The broker was up and running, but the auth was plaintext username/password, no SSL. I wanted to try again without any authentication, but I could only reconfigure the broker; I could not redo the agent from scratch because I've lost the URL of the 8.12-snapshot agent (it was in a Zoom chat).
Now the broker is still in place, but on port 9092 you'll find the plaintext listener without auth; the username/password listener moved to port 9093.
This is the docker compose I set on the VMs:
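(The original file isn't reproduced here; purely as an illustration, a single broker with one no-auth listener and one SASL/PLAIN listener could be defined roughly like this with the bitnami/kafka image. The image, listener names, IP, and credentials below are placeholder assumptions, not the real configuration.)

```yaml
# Illustrative sketch only: one KRaft broker exposing a plaintext
# listener on 9092 and a SASL/PLAIN listener on 9093, similar to the
# setup described in this thread.
services:
  kafka:
    image: bitnami/kafka:latest
    ports:
      - "9092:9092"   # EXTERNAL_NOAUTH: no authentication
      - "9093:9093"   # EXTERNAL_AUTH: SASL/PLAIN username/password
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9094
      - KAFKA_CFG_LISTENERS=EXTERNAL_NOAUTH://:9092,EXTERNAL_AUTH://:9093,CONTROLLER://:9094
      - KAFKA_CFG_ADVERTISED_LISTENERS=EXTERNAL_NOAUTH://203.0.113.1:9092,EXTERNAL_AUTH://203.0.113.1:9093
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=EXTERNAL_NOAUTH:PLAINTEXT,EXTERNAL_AUTH:SASL_PLAINTEXT,CONTROLLER:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=EXTERNAL_NOAUTH
      - KAFKA_CLIENT_USERS=elastic
      - KAFKA_CLIENT_PASSWORDS=changeme
```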
I've just double checked the test kafka clusters:
EXTERNAL_NOAUTH://34.16.40.176:9092, EXTERNAL_AUTH://34.16.40.176:9093
EXTERNAL_NOAUTH://34.71.176.30:9092, EXTERNAL_AUTH://34.71.176.30:9093
EXTERNAL_NOAUTH://34.68.101.1:9092, EXTERNAL_AUTH://34.68.101.1:9093
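A client has to match the security protocol of whichever listener it connects to. As a small illustration (the listener-name-to-protocol mapping below is an assumption based on the names above, and the helper functions are hypothetical, not part of any Elastic code):

```python
# Hypothetical helpers: parse the advertised listener strings above and
# derive the (librdkafka-style) security.protocol a client must use.

def parse_listener(listener: str):
    """Split 'NAME://host:port' into (name, host, port)."""
    name, rest = listener.split("://", 1)
    host, port = rest.rsplit(":", 1)
    return name, host, int(port)

# Assumed mapping: NOAUTH listeners are plaintext, AUTH listeners use
# SASL/PLAIN over an unencrypted connection (SASL_PLAINTEXT).
PROTOCOL_BY_LISTENER = {
    "EXTERNAL_NOAUTH": "PLAINTEXT",
    "EXTERNAL_AUTH": "SASL_PLAINTEXT",
}

def client_config(listener: str) -> dict:
    """Build the minimal client settings for one advertised listener."""
    name, host, port = parse_listener(listener)
    cfg = {
        "bootstrap.servers": f"{host}:{port}",
        "security.protocol": PROTOCOL_BY_LISTENER[name],
    }
    if cfg["security.protocol"].startswith("SASL"):
        cfg["sasl.mechanism"] = "PLAIN"  # credentials omitted here
    return cfg

print(client_config("EXTERNAL_NOAUTH://34.16.40.176:9092"))
print(client_config("EXTERNAL_AUTH://34.16.40.176:9093"))
```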
This is my local provectus/kafka-ui configured to access them (ignore the noauth and user-pass clusters):
Eventually I managed to do some more tests. I set up two fleet outputs to the same broker (34.42.9.15), one without authentication (port 9092) and the other with SASL_PLAINTEXT username/password (port 9093):
I configured the agent policy with both, one at a time.
The first worked nicely, I could see the messages flowing:
The second, consistently with the previous reported findings, failed with these errors:
{"log.level":"info","@timestamp":"2023-12-07T17:42:35.771+0100","message":"Connecting to kafka(34.42.9.15:9093)","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-e76fbfb5-401e-48a3-af78-ba984fc0094e","type":"system/metrics"},"log":{"source":"system/metrics-e76fbfb5-401e-48a3-af78-ba984fc0094e"},"log.logger":"publisher_pipeline_output","log.origin":{"file.line":137,"file.name":"pipeline/client_worker.go","function":"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*netClientWorker).run"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-07T17:42:35.771+0100","message":"Connection to kafka(34.42.9.15:9093) established","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-e76fbfb5-401e-48a3-af78-ba984fc0094e","type":"system/metrics"},"log":{"source":"system/metrics-e76fbfb5-401e-48a3-af78-ba984fc0094e"},"service.name":"metricbeat","ecs.version":"1.6.0","log.logger":"publisher_pipeline_output","log.origin":{"file.line":145,"file.name":"pipeline/client_worker.go","function":"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*netClientWorker).run"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-12-07T17:42:37.290+0100","message":"Kafka (topic=fleet-output-user-pass): kafka: couldn't write SASL handshake (make sure SASL is enabled on the target broker port)","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-e76fbfb5-401e-48a3-af78-ba984fc0094e","type":"system/metrics"},"log":{"source":"system/metrics-e76fbfb5-401e-48a3-af78-ba984fc0094e"},"log.logger":"kafka","log.origin":{"file.line":337,"file.name":"kafka/client.go","function":"github.com/elastic/beats/v7/libbeat/outputs/kafka.(*client).errorWorker"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-12-07T17:42:39.337+0100","message":"Kafka (topic=fleet-output-user-pass): kafka: couldn't write SASL handshake (make sure SASL is enabled on the target broker port)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"log-e76fbfb5-401e-48a3-af78-ba984fc0094e","type":"log"},"log":{"source":"log-e76fbfb5-401e-48a3-af78-ba984fc0094e"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"kafka","log.origin":{"file.line":337,"file.name":"kafka/client.go","function":"github.com/elastic/beats/v7/libbeat/outputs/kafka.(*client).errorWorker"},"ecs.version":"1.6.0"}
Note that my kafka-ui is configured to access the broker in the same two ways as the fleet outputs:
Both the configurations work.
Could you @brian-mckinney please double check?
I've stood up your docker stack and verified the same problem when configuring with Kibana, for both endpoint and system integration. I think I know what is happening.
When you configure the kafka output in Kibana, if you choose Username / Password authentication (even with the PLAIN mechanism), Kibana will add an ssl section to the config's output section:
```yaml
output:
  kafka:
    broker_timeout: 30
    client_id: Elastic
    compression: none
    headers: []
    hosts:
      - 192.168.1.168:9093
    partition:
      random:
        group_events: 1
    password: changeme
    required_acks: 1
    sasl:
      mechanism: PLAIN
    ssl:
      verification_mode: none
    timeout: 30
    topics:
      - topic: my-topic
    type: kafka
    username: elastic
    version: 1.0.0
```
I assume this is because we want to force sensitive credentials to be encrypted. If Endpoint sees the ssl field in the config, it automatically assumes that the user/pass authentication is using the SASL_SSL protocol when connecting to Kafka. Your docker stack sets up a plaintext SASL listener, therefore username and password will likely never work if configured via Kibana.
I am guessing that Beats/the system integration is doing the same thing, which is why they fail the SASL handshake as well. (@faec ?)
If the problem is what I described and is intentional, then I think you will need to update your docker stack in order to test the other authentication mechanisms.
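The inference described above can be sketched as a small decision function. This is an illustrative guess at the rule, not Endpoint's or Beats' actual code:

```python
def infer_security_protocol(output: dict) -> str:
    """Guess which Kafka security protocol an agent component would
    pick from a Fleet kafka output section (illustrative only):
      sasl/username + ssl -> SASL_SSL
      sasl/username only  -> SASL_PLAINTEXT
      ssl only            -> SSL
      neither             -> PLAINTEXT
    """
    has_sasl = "username" in output or "sasl" in output
    has_ssl = "ssl" in output
    if has_sasl and has_ssl:
        return "SASL_SSL"
    if has_sasl:
        return "SASL_PLAINTEXT"
    if has_ssl:
        return "SSL"
    return "PLAINTEXT"

# The config Kibana generated above contains both a sasl and an ssl
# section, so the client attempts SASL_SSL against a broker listener
# that only speaks SASL_PLAINTEXT -- and the handshake fails.
kibana_cfg = {
    "hosts": ["192.168.1.168:9093"],
    "username": "elastic",
    "password": "changeme",
    "sasl": {"mechanism": "PLAIN"},
    "ssl": {"verification_mode": "none"},
}
print(infer_security_protocol(kibana_cfg))  # SASL_SSL
```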
I'm totally with you about avoiding sending credentials over unencrypted channels; it's my main concern in having these test kafka brokers in the open.
Nevertheless there are at least a couple of points that need more clarity:
Is there any documentation about the supported auth configurations?
Is it right that the agent remains healthy if communication errors happen with the kafka broker?
Right, no; expected, yes. The Beats run by Elastic Agent do not yet report errors when they can't communicate with the output (this is true for all outputs). This is tracked in https://github.com/elastic/beats/issues/39801 and is something we plan to fix. The implementation for Elastic Defend is separate but may have a similar limitation.
Is there any documentation about the supported auth configurations?
It should be documented at https://www.elastic.co/guide/en/fleet/current/kafka-output-settings.html#_authentication_settings. If that isn't clear enough best thing to do is file an issue in https://github.com/elastic/ingest-docs to get it improved.
Thanks for all the pointers!
One last thing: is it therefore a bug or a feature that authenticating with username/password switches the underlying communication to encrypted? The documentation does not make this clear.
Who can give an authoritative answer?
is it therefore a bug or a feature that authenticating with username/password switches the underlying communication to encrypted? The documentation does not make this clear.
It's a feature to prevent users from sending credentials in plaintext, so we'll need to update the docs to clarify this.
@amolnater-qasource With the merge of https://github.com/elastic/siem-team/pull/1054 and https://github.com/elastic/security-team/pull/8227 it's now possible to also test username/password authentication over a TLS channel. SSL authentication is yet to be implemented.
Hi @cavokz
We have revalidated this with the latest instance on an 8.12.0 BC2 Kibana cloud environment and made the following observations:
Screenshot:
Build details: VERSION: 8.12.0 BC2 BUILD: 69899 COMMIT: 15a6cc8236b4828b97da733746ec36bd33f03bba
Hence, we are closing this issue and marking it as QA:Validated.
Thanks!!
Kibana Build details:
Host OS: All
Preconditions:
Steps to reproduce:
NOTE:
Screenshot:
Expected Result: Agent should be able to send messages to the kafka broker with authentication.
Agent.json: ip-172-31-73-91-agent-details.zip
Logs: elastic-agent-diagnostics-2023-12-05T12-03-42Z-00.zip elastic-agent-diagnostics-2023-12-05T12-07-19Z-00.zip