Pinging @elastic/fleet (Feature:Fleet)
@jen-huang @nimarezainia Do we already have the design for this work?
@jlind23 Yes we do, link to the designs can be found in the product definition doc in parent issue of this one.
@jen-huang is the tech definition ready to be worked on in our next sprint?
@jlind23 I'm still going to work on it this week.
@jen-huang As you changed this issue title to "implement", I believe the status should be changed to "ready" accordingly? Shall I also remove your assignment?
I'm currently looking at the schema validation for the new kafka type and it would be much cleaner if we moved from
/api/fleet/outputs
{
type: 'kafka'
...
}
to
/api/fleet/outputs/kafka
{
...
}
Of course we would keep the old endpoint for a few releases and mark it as deprecated. @kpollich suggested redirecting the code to the right path based on the `type` property in the request body.
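For illustration, a minimal sketch of that kind of type-based dispatch, assuming hypothetical per-type schemas built with `@kbn/config-schema` (the schema names and fields here are illustrative, not the actual Fleet code):

```ts
import { schema } from '@kbn/config-schema';

// Hypothetical per-type schemas; the real Fleet schemas live in the outputs API handlers.
const ElasticsearchOutputSchema = schema.object({
  type: schema.literal('elasticsearch'),
  hosts: schema.arrayOf(schema.uri()),
});

const KafkaOutputSchema = schema.object({
  type: schema.literal('kafka'),
  hosts: schema.arrayOf(schema.string()),
  client_id: schema.maybe(schema.string()),
});

// Keep the single POST /api/fleet/outputs endpoint and pick the schema to
// validate against based on the `type` property in the request body.
export const OutputSchema = schema.oneOf([ElasticsearchOutputSchema, KafkaOutputSchema]);
```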
As discussed, will move this to Sprint 12 and continue the work there.
We will likely need to feature flag this as the Agent work will not be ready in the same release. However, we still want to enable customers to test SNAPSHOT builds of the agent once it is ready. I think we should use the "Advanced Settings" Kibana infra to do the feature flagging instead of kibana.yml settings to easily enable a customer to turn on this feature without having to reconfigure and restart Kibana.
@criamico this will be included in our next sprint. @joshdover had a great idea about first delivering the API experience to unblock users and in a second PR work on the UI part. Both should land in separate releases if needed. What do you think?
cc @juliaElastic
the API approach is fine as a first step. However, what the users will need is the full UI capabilities. Also, we would need to get some sample API calls that show the user how to configure Kafka in this case, and think about how that would show up in the Fleet UI with other outputs present.
In other words, if the user uses the API to create the output and configure it, what would the other users see in the Fleet UI?
@nimarezainia The API first approach has the benefit of unblocking Elastic Agent E2E tests with Kafka output; it does not necessarily imply that we should ship it to our users without any UI.
In other words, if the user uses the API to create the output and configure it, what would the other users see in the Fleet UI?
This is a good question. We'll still have to make a few UI adjustments to make sure this new output type doesn't break our existing UIs.
I would suggest just showing the output row for the Kafka outputs in the Settings tab, but disabling the edit button with a tooltip: "Use the Fleet API to edit this output"
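Roughly along these lines (a sketch using EUI components; the exact markup in the Settings table would of course differ):

```tsx
import React from 'react';
import { EuiToolTip, EuiButtonIcon } from '@elastic/eui';

// Sketch: render the edit action for a Kafka output as disabled, with an explanatory tooltip.
export const KafkaOutputEditAction: React.FC = () => (
  <EuiToolTip content="Use the Fleet API to edit this output">
    <EuiButtonIcon iconType="pencil" aria-label="Edit output" isDisabled />
  </EuiToolTip>
);
```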
Pinging @elastic/security-defend-workflows (Team:Defend Workflows)
Schema proposed changes:
- Added `none` option to `compression`, following the docs
- `client_id` changed from required to optional since we provide a default value, following the docs
- `password` is a required field if `username` was provided, following the docs
- `sasl.mechanism` is set to `PLAIN` by default if the `username` and `password` fields are set, following the docs
- `partition` changed from required to optional since we provide a default value, following the docs
- `broker` section filled with optional `timeout` and `broker_timeout`
hosts: string[]
version?: string // defaults to 1.0.0 by beats/agent if not set
key?: string
compression?: 'snappy' | 'lz4' | 'gzip' | 'none' // defaults to gzip
compression_level?: integer // only for gzip compression, defaults to 4
client_id?: string // default Elastic Agent
// authentication can be done using:
// username/password, ssl, or kerberos
auth_type: 'user_pass' | 'ssl' | 'kerberos'
// auth: username/password
username?: string // must be present if auth_type === 'user_pass'
password?: string // must be present if username was provided
sasl.mechanism?: 'PLAIN' | 'SCRAM-SHA-256' | 'SCRAM-SHA-512' // defaults to `PLAIN` if username and password
// auth: ssl
ssl.certificate_authorities?: string
ssl.certificate?: string
ssl.key?: string
// auth: kerberos - should be marked as beta
// TBD: to check if we should do this as part of phase 1 if it is in beta
// partitioning settings
partition?: 'random' | 'round_robin' | 'hash' // defaults to 'hash'
random?.group_events?: integer // defaults to 1
round_robin?.group_events?: integer // defaults to 1
hash?.hash?: string
hash.random?: boolean // TBD: check the type of this field
// topics array
topics:
topic: string
when.type?: 'equals' | 'contains' | 'regexp' | 'range' | 'network' | 'has_fields' | 'or' | 'and' | 'not'
when.condition?: string
// headers array
headers?[]:
key: string
value: string
// broker
timeout?: integer // defaults to 30
broker_timeout?: integer // defaults to 10
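To make the shape above concrete, here is a hypothetical request body for `POST /api/fleet/outputs` (all values are made up; the exact required fields are those defined by the schema above):

```ts
// Hypothetical payload for creating a Kafka output via the Fleet API.
const createKafkaOutputRequest = {
  name: 'kafka-output-example',
  type: 'kafka',
  hosts: ['kafka-1.example.com:9092', 'kafka-2.example.com:9092'],
  auth_type: 'user_pass',
  username: 'agent-producer',
  password: 'changeme',
  compression: 'gzip',
  compression_level: 4,
  partition: 'hash',
  topics: [{ topic: 'elastic-agent-events' }],
  headers: [{ key: 'environment', value: 'staging' }],
  timeout: 30,
  broker_timeout: 10,
};
```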
@kevinlog from a pure planning perspective, is it fair to say that the API should probably be available in the next release or so?
Adding here some more details about the definitions of done for the API work. They mirror the Logstash integration tests in https://github.com/elastic/kibana/blob/main/x-pack/test/fleet_api_integration/apis/outputs/crud.ts
- Agent policies that contain the `fleet-server` integration don't get switched to the new output; attempting to do so returns: "Kafka output cannot be used with Fleet Server integration in Fleet Server Policy. Please create a new ElasticSearch output."

Note that in this context `default` refers to the `is_default` flag.
@szwarckonrad @jlind23 Apologies for the late reply on this one: https://github.com/elastic/kibana/issues/143324#issuecomment-1584110688
I spoke with @szwarckonrad offline - we think the API should be available for testing in the main branch in the first half of the 8.10 cycle. He has a draft PR up here: https://github.com/elastic/kibana/pull/159110
For tracking purposes I'll link the draft PR to this issue. Thanks @kevinlog for the update.
@kevinlog Hey, the QA Source team asked to do a demo with them when the UI feature is ready.
@amolnater-qasource Reading the comments, the API work is planned for 8.10; I suppose the UI work will come after that.
Thank you for the confirmation @juliaElastic
We will keep track of the updates on this feature. Further, as the feature is not going into 8.9, we will put the test content work on hold for now.
Thank you!
Thanks @juliaElastic @amolnater-qasource - we can do a demo when the UI is ready. @szwarckonrad is currently working on the API with the UI to follow.
@cmacknz Can you clarify whether a kafka output can have a shipper section, or if that's not currently possible? @szwarckonrad is finalizing his PR and can still make changes to that part.
The Kafka output shouldn't have to explicitly account for the shipper. We are working through simplifying how the shipper configuration works:
https://github.com/elastic/elastic-agent/pull/2728#issuecomment-1583178038
We will use shipper.enabled: true as the syntax for enabling the shipper, but put all other configuration under the root output configuration. We will keep the shipper.* syntax as an escape hatch in case we find an unforeseen configuration conflict and need to move configuration under the shipper path.
The only thing that will be under the shipper configuration object going forward is the `enabled` flag. Shipper development is temporarily paused so we haven't created issues to update the Fleet side of this yet.
@jlind23 @criamico @juliaElastic cc @kevinlog
I just wanted to reach out regarding the UI part of the task. While working on it, I came across a question about the Topics and Headers sections. They are both custom multi-row components, and even though the Hosts section uses the MultiRowInput component, it's designed to handle a single field. I was thinking about the best approach for implementing these new custom multi-row inputs, and I believe we have a couple of options to consider.
Firstly, we could consider refactoring the existing MultiRowInput to accommodate different types of children components. This way, it would be more versatile and meet the requirements for the problem at hand.
Alternatively, we could create two new custom components that are specifically tailored to fulfill all the requirements for the Topics and Headers sections.
What do you think would be the most suitable path forward?
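To illustrate the first option, here is a rough sketch of what a more generic multi-row component could look like (plain React elements for brevity; the real component would use EUI and follow the existing MultiRowInput conventions — names below are made up):

```tsx
import React from 'react';

interface GenericMultiRowInputProps<T> {
  values: T[];
  onChange: (values: T[]) => void;
  renderRow: (value: T, onRowChange: (updated: T) => void) => React.ReactNode;
  createEmptyRow: () => T;
}

// Sketch: the row contents are delegated to the caller, so the same component could
// back single-field Hosts rows as well as key/value Headers or Topics rows.
export function GenericMultiRowInput<T>({
  values,
  onChange,
  renderRow,
  createEmptyRow,
}: GenericMultiRowInputProps<T>) {
  return (
    <div>
      {values.map((value, index) => (
        <div key={index}>
          {renderRow(value, (updated) =>
            onChange(values.map((v, i) => (i === index ? updated : v)))
          )}
          <button type="button" onClick={() => onChange(values.filter((_, i) => i !== index))}>
            Delete row
          </button>
        </div>
      ))}
      <button type="button" onClick={() => onChange([...values, createEmptyRow()])}>
        Add row
      </button>
    </div>
  );
}
```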
@szwarckonrad sorry we missed your comment. @criamico could you please provide us guidance here? cc @juliaElastic
@jlind23 I have a draft PR up with complete UI, I went with creating two separate components for topics and headers. I believe we can move the discussion about them there ;) https://github.com/elastic/kibana/pull/160112
The API part of this work is completed, but it would be good to have some guidance about the way to handle the `password` field. I'm wondering if there is any security concern here, as I think this is the first time that we handle any password in Fleet. @joshdover @jlind23 @juliaElastic @szwarckonrad
@criamico don't we already have a password in some integrations, just like here?
The API part of this work is completed, but it would be good to have some guidance about the way to handle the password field. I'm wondering if there is any security concern here, as I think this is the first time that we handle any password in Fleet. @joshdover @jlind23 @juliaElastic @szwarckonrad
That's a good catch; we probably want to use an encrypted saved object for the `password` field, like we do for the Logstash client certificates, for example.
Thanks @jlind23 and @nchaulet, I didn't know we already had working examples of this. I think this is the way to go for the Kafka password then.
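For reference, a rough sketch of how an attribute can be marked for encryption with Kibana's encryptedSavedObjects plugin, assuming the Fleet outputs saved object type (the exact registration and type name in Fleet may differ):

```ts
// `encryptedSavedObjects` is the setup contract of Kibana's encryptedSavedObjects plugin,
// received through the Fleet plugin's setup dependencies.
export function registerOutputEncryption(encryptedSavedObjects: {
  registerType: (opts: { type: string; attributesToEncrypt: Set<string> }) => void;
}) {
  encryptedSavedObjects.registerType({
    type: 'ingest-outputs', // assumed saved object type name for Fleet outputs
    attributesToEncrypt: new Set(['password']),
  });
}
```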
UI PR is open. Open questions:
- Should the `kafka` option in the dropdown go behind a feature flag?
Adding it behind a feature flag and under a beta status would make more sense to me. @nimarezainia @juliaElastic thoughts?
Adding it behind a feature flag and under a beta status would make more sense to me.
+1, I think that this is what was requested at the beginning of this work
Why do we need the feature flag exactly? I think that will just make it harder for users to find the feature. Agree on beta for now though 👍
Then let's go for beta without feature flag 👍🏼
Where can I find the schema for the kafka output configuration that Fleet writes into the Agent policy?
My interest is in knowing the complete set of kafka settings that can be expected when receiving a policy (like what fields are required, optional, their data types, etc... basically looking for an openapi spec for the output section of the agent policy).
Thanks @juliaElastic. So is the Fleet API request format exactly the same as what is written into the Agent policy `outputs` section? For example, if an API request comes in with `ca_trusted_fingerprint: 79f956a0175`, would the Agent policy contain this, or is there some translation?
outputs:
default:
type: kafka
ca_trusted_fingerprint: 79f956a0175
There is some translation, e.g. `ca_trusted_fingerprint` gets a prefix of `ssl.`: https://github.com/szwarckonrad/kibana/blob/main/x-pack/plugins/fleet/server/services/agent_policies/full_agent_policy.ts#L214
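A simplified sketch of that kind of translation (the helper name and types are made up; see the linked full_agent_policy.ts for the actual logic):

```ts
interface KafkaOutputSO {
  type: 'kafka';
  hosts: string[];
  ca_trusted_fingerprint?: string;
  [key: string]: unknown;
}

// Sketch: pass most fields through as-is, but nest the trusted fingerprint under `ssl`
// when building the `outputs` section of the full agent policy.
export function toFullAgentPolicyOutput(output: KafkaOutputSO): Record<string, unknown> {
  const { ca_trusted_fingerprint, ...rest } = output;
  return {
    ...rest,
    ...(ca_trusted_fingerprint ? { ssl: { ca_trusted_fingerprint } } : {}),
  };
}
```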
Perhaps a silly question, but is there any integration with the other end of the Kafka pipeline? We have https://docs.elastic.co/integrations/kafka_log, and if I gathered the tickets correctly we also have an opinionated default name for topics, so it would be great if some option was given to deploy an agent that consumes the topics the integrations are sending to and uses the default data streams / pipelines, to ease rollout.
I have a question about topics and processors in general. It is unlikely that Endpoint will be able to support the full gamut of available processors, considering we do not have access to libbeat and would have to write all the parsing code from scratch in C++. Is there a minimum set of required processors that could maybe help us tone down the scope of what we will need to provide? cc: @nfritts @ferullo
ref: https://www.elastic.co/guide/en/beats/filebeat/current/defining-processors.html
@brian-mckinney I'm unsure how this ties into this issue specifically, but processors are generally there for edge processing: typically it's to drop traffic / reduce payload, or to collect additional (local context) information (the add_*_metadata processors + dns). With Kafka in the middle:
Source -> shipper(agent/endpoint?) -> Kafka -> forwarder(agent/filebeat) -> Elasticsearch
you still have the option to use all processors at the forwarder, though the add_*_metadata processors aren't useful there since that information needs to be recorded on the shipper. Regardless, it should improve things over the currently available options. And of course there are ingest pipelines for everything that doesn't require edge processing.
I think it's a separate issue, but things like decode_* can be skipped, as can parse_aws_vpc_flowlog, which seems like a poorly named decode variant.
I have a question about topics and processors in general. It is unlikely that Endpoint will be able to support the full gamut of available processors, considering we do not have access to libbeat and would have to write all the parsing code from scratch in C++.
I fully agree that Endpoint should avoid re-implementing Beats processors. From my understanding, the Kafka output only makes use of the conditional processors for topic selection, which does pare down the list somewhat. But I wonder if we could ship the first version of this without support for dynamic topic selection at all and only support the static `topic` field? @nimarezainia
If/when we do want to support dynamic topic selection, I think we could omit some conditions, like network or contains (covered by regexp).
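For context, dynamic topic selection in the proposed Fleet schema would look roughly like this (values are purely illustrative):

```ts
// Illustrative only: one conditional route plus a static fallback topic,
// using the `topics` shape from the proposed schema above.
const topics = [
  { topic: 'security-critical', when: { type: 'contains', condition: 'message: CRITICAL' } },
  { topic: 'elastic-agent-events' },
];
```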
Perhaps a silly question, but is there any integration with the other end of the kafka? we have docs.elastic.co/integrations/kafka_log and if I gathered the tickets correctly we also have an opinionated default name for topics so it would be great if some option was given to deploy an agent that consumes the topics for which integrations are sending and uses the default datastreams / pipelines, to ease rollout
This is a good suggestion - but not considered at this point. It would make sense for our default and examples in docs to match up with what the Kafka input package expects.
I fully agree that Endpoint should avoid re-implementing Beats processors. From my understanding, the Kafka output only makes use of the conditional processors for topic selection, which does pare down the list somewhat. But I wonder if we could ship the first version of this without support for topic selection at all and only support for the static topic field? @nimarezainia
If/when we do want to support dynamic topic selection, I think we could omit some conditions, like network or contains (covered by regexp).
Thanks @joshdover. I'm very interested in the outcome of this discussion. Scoping out dynamic topic selection would definitely reduce the amount of effort and testing complexity (on Endpoint at least) for the first version.
@joshdover & @brian-mckinney dynamic topic selection is an attractive aspect of this solution. I have had a few customers engaging on that. However given where we are and the fact that this will be a beta to begin with, I think it's fair to address this as a followup. I will communicate this to our Beta candidates when the time comes.
Could someone please clarify what Authentication methods will be available in the first phase? (appreciate it thx)
@szwarckonrad Could you clarify which options we ended up implementing the UI for this first phase?
@nimarezainia I think we're also limited by what Endpoint ends up supporting, which is still in progress. @brian-mckinney should be able to help clarify this.
@joshdover Following the mockups I went with UI for username/password and SSL
Kafka output UI
Similar to Logstash output, we need to add the option for users to specify Kafka as an output option for their data. In 8.8, this UI will be hidden behind an experimental flag as the shipper portion is not ready until 8.9.
Tasks
- Hide the UI behind an experimental flag so that only a deployment that enables it (via `kibana.yml`) supports it
- Map the UI fields to the `elastic-agent.yml` fields (most are the same, but there are a few differences due to needing information for the UI)

API
The output API should support a new output type: `kafka`. See Kafka Output type.
This output type should have the following properties:
```
hosts[]: uri
version?: string // defaults to 1.0.0 by beats/agent if not set
key?: string
compression?: 'snappy' | 'lz4' | 'gzip' // defaults to gzip
compression_level?: integer // only for gzip compression, defaults to 4
client_id: string
// authentication can be done using:
// username/password, ssl, or kerberos
auth_type: 'user_pass' | 'ssl' | 'kerberos'
// auth: username/password
username?: string
password?: string
sasl.mechanism?: 'PLAIN' | 'SCRAM-SHA-256' | 'SCRAM-SHA-512'
// auth: ssl
ssl.certificate_authorities?: string
ssl.certificate?: string
ssl.key?: string
// auth: kerberos - should be marked as beta
// TBD: to check if we should do this as part of phase 1 if it is in beta
// partitioning settings
partition: 'random' | 'round_robin' | 'hash' // defaults to 'hash'
random.group_events?: integer
round_robin.group_events?: integer
hash.hash?: string
hash.random?: boolean // TBD: check the type of this field
// topics array
topics:
  topic: string
  when.type?: 'equals' | 'contains' | 'regexp' | 'range' | 'network' | 'has_fields' | 'or' | 'and' | 'not'
  when.condition?: string
// headers array
headers?:
  key: string
  value: string
// broker
```
UI tasks
- Hosts: "Specify the URLs that your agents will use to connect to Kafka. For more information, see the Fleet User Guide", with an "Add row" button below
- `topics[]` array (text input box)
- Client ID: defaults to `Elastic Agent`
- Compression: defaults to `gzip` with compression level 4; options are `none`, `snappy`, `lz4`, `gzip`; if `gzip`, also show a field for the compression level
- "Define how long a Kafka server waits for data in the same cluster"
- "Define how long an Agent would wait for a response from Kafka Broker"
- "Define the number of messages buffered in output pipeline"
- Reliability level required from the broker: defaults to "Wait for local commit"; options are "Wait for local commit", "Wait for all replicas to commit", "Do not wait"
- Key: "If configured, the event key can be extracted from the event using a format string"
Designs
Open questions