elastic / kibana


[Fleet] Defining Output per integration #143905

Closed: nimarezainia closed this issue 3 months ago

nimarezainia commented 2 years ago

There are many legitimate reasons why an operator may need or want to send data from different integrations within a policy to different outputs. Some may even need to send individual data streams to different outputs. Currently we only allow an output to be defined on a per-policy basis. To support this request, the per-policy output definition needs to be overridden by the output defined on the integration. Our config should support this already.

Use Cases:

1) As an operator, I need security logs from an agent to be sent to one Logstash instance, whereas informational logs should be sent to another Logstash instance.

2) We operate multiple Beats on a given system and would like to migrate to Elastic Agent. For historical and operational reasons these Beats write data to distinct outputs. Once we migrate to Agent, we would like to keep the upstream pipeline intact.
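
For use case 1, here is a minimal sketch of what the rendered agent policy could look like, assuming the agent's existing use_output routing; the output names, hosts, and input IDs below are hypothetical:

outputs:
  security-logstash:
    type: logstash
    hosts: ["security-ls.example.com:5044"]
  info-logstash:
    type: logstash
    hosts: ["info-ls.example.com:5044"]
inputs:
  - id: security-logs          # security integration's input
    type: logfile
    use_output: security-logstash
  - id: informational-logs     # routed independently of the policy default
    type: logfile
    use_output: info-logstash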

elasticmachine commented 2 years ago

Pinging @elastic/fleet (Team:Fleet)

willemdh commented 1 year ago

There are many reasons there is a need for multiple agents on a host. One example, which applies to the Elastic ecosystem itself: a customer typically needs to forward Elasticsearch / Logstash / Kibana logs and metrics to a separate monitoring cluster. This is generally not possible, as there is already a set of agents running on the node to index system logs and metrics.

Elastic should really support multiple outputs per integration or provide a supported way to install and manage multiple identical agents on a system.

amitkanfer commented 1 year ago

@nimarezainia what else is needed from you on this one?

nimarezainia commented 1 year ago

@amitkanfer The definition is fairly self-explanatory, but I need to create a mock-up for the UI.

amitkanfer commented 1 year ago

Thanks @nimarezainia - once ready let's chat online and pass it to @jlind23 for development.

nicpenning commented 1 year ago

Another big reason for output per integration is that when you have 20+ integrations in a single policy, there is a good chance that some of those integrations have very different performance requirements.

The biggest need for this feature for me is the ability to set the number of workers and the bulk max size to account for a particular integration that ingests 30K events per second. We have some integrations that only receive 1-5 events per minute, so it doesn't make sense to crank up the workers and bulk max size when not all integrations need that performance adjustment.

Here is a sample policy with the respective EPS and the need for per-integration output selection (see the sketch after the list):

  1. Firewall - 30K EPS - 12 workers, 2500 bulk max size
  2. Windows Events - 3000 EPS - 4 workers, 500 bulk max size
  3. HTTP Input - 0.5 EPS - default
  4. API - 20 EPS - default
  5. Web Logs - 300 EPS - default
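
A minimal sketch of how per-integration outputs could encode these presets, assuming the Beats-style worker and bulk_max_size performance settings are exposed on the Elasticsearch output; the output names and hosts are hypothetical:

outputs:
  firewall-output:             # 30K EPS integration gets its own tuning
    type: elasticsearch
    hosts: ["https://es.example.com:9200"]
    worker: 12
    bulk_max_size: 2500
  winlog-output:               # moderate throughput
    type: elasticsearch
    hosts: ["https://es.example.com:9200"]
    worker: 4
    bulk_max_size: 500
  default:                     # low-volume integrations keep the defaults
    type: elasticsearch
    hosts: ["https://es.example.com:9200"]
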
jlind23 commented 1 year ago

@nimarezainia What would be the user experience here? Shall we display, per output/policy, a list of integrations that users can check to see which one is using which output? Or shall we, in the UI below, offer the option to switch the output for each integration?

[Screenshot: agent policy settings UI]
nimarezainia commented 1 year ago

@jlind23 I propose the following: (@zombieFox we need to discuss this also)

In the integration's settings page we need a drop-down that displays the set of outputs available to the user (configured on the Fleet -> Settings tab). This should default to whatever output is configured in the policy for integration data. We may want to put this under the advanced settings drop-down.

[Mockup: integration settings page with an output drop-down]

The agent policy page should also be modified to show a summary of which output is allocated to which integration:

[Mockup: agent policy page listing the output assigned to each integration]

nimarezainia commented 1 year ago

In scenarios where the user needs to send different data streams to different outputs, the above model still works, as the user can add two instances of the same integration to the policy. For example, with Nginx (sketched below):

nginx-1 instance:

nginx-2 instance:
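
A minimal sketch of the resulting agent policy under this model, assuming each integration instance carries its own output assignment; the names are hypothetical:

outputs:
  output-a:
    type: elasticsearch
    ...
  output-b:
    type: elasticsearch
    ...
inputs:
  - id: nginx-1                # first instance of the Nginx integration
    type: logfile
    use_output: output-a
  - id: nginx-2                # second instance, routed to a different output
    type: logfile
    use_output: output-b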

zombieFox commented 1 year ago

We reviewed this in the UX sync. Looks good to go. The additions indicated above don't require design mocks.

The copy sounds right to me too, but we might want to pass it by David Kilfoyle or Karen Metts.

jen-huang commented 1 year ago

Moving this to tech definition for this sprint; if the work identified is a small amount, we'll proceed with implementation.

nchaulet commented 1 year ago

Proposed technical implementation for that

I did a small POC PR implementing only the API part (with some shortcuts) to ensure it will work as expected, and it seems it will.

Package policy/saved object changes

We will introduce a new property named output_id on the package policy. This property will be added/updated in the following components:

We will need to validate that creating/editing a package policy output respects the same rules as per-agent-policy outputs.
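
A hypothetical sketch of a package policy carrying the new property; the field placement and values here are assumptions, not the final schema:

package_policy:
  id: nginx-1
  name: nginx-1
  policy_id: my-agent-policy
  package:
    name: nginx
    version: 1.0.0
  output_id: my-logstash-output   # new property (assumed placement): overrides the agent policy's data output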

Deleting/Editing output changes

We will have to implement the same rules as we have for agent policies:

Full agent policy generation changes (aka policy sent to agents)

We need to adapt the policy sent to the agent to reflect our model change; the agent already supports this via the use_output property and already supports multiple outputs.

I tested this locally with the POC PR using multiple logs package policies, and it seems to work as expected.

The use_output field has to be populated with the package policy output id or the default data output (code here):

https://github.com/elastic/kibana/blob/74509cdc33850043fafc4d74d6a971b59a01ca2a/x-pack/plugins/fleet/server/services/agent_policies/package_policies_to_agent_inputs.ts#L57
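
A before/after sketch of the generated inputs, assuming use_output falls back to the default data output when no output_id is set; the IDs below are hypothetical:

# Package policy with output_id: my-logstash-output
inputs:
  - id: logfile-nginx-1
    type: logfile
    use_output: my-logstash-output   # taken from the package policy's output_id

# Package policy with no output_id
inputs:
  - id: system-metrics-1
    type: system/metrics
    use_output: default              # falls back to the policy's default data output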

The role permissions have to change so that we generate a role permission for each output, based on the package policies assigned to it, instead of one for data and one for monitoring (code here):

https://github.com/elastic/kibana/blob/74509cdc33850043fafc4d74d6a971b59a01ca2a/x-pack/plugins/fleet/server/services/agent_policies/full_agent_policy.ts#L205-L226
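
A sketch of what the generated permissions could look like under this change, assuming the full agent policy's existing output_permissions structure; the output and package policy IDs are hypothetical:

output_permissions:
  my-logstash-output:
    nginx-1:                         # package policy assigned to this output
      indices:
        - names: ["logs-nginx.access-*"]
          privileges: ["auto_configure", "create_doc"]
  default:
    _elastic_agent_monitoring:       # monitoring stays on the default output (assumed)
      indices:
        - names: ["metrics-elastic_agent*"]
          privileges: ["auto_configure", "create_doc"]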

Things to verify

nimarezainia commented 1 year ago

Package policy/saved object changes

We will introduce a new property named output_id to the package policy. This property will be added/updated in the following components:

* Saved object

* Type and schema for package policy, preconfigured package policy, and simplified package policy

We will need to validate that creating/editing a package policy output respects the same rules as per-agent-policy outputs.

* APM and Fleet Server package policies cannot use a non-ES output.

* License restriction: it should only be available for an enterprise license, the same as multiple outputs, correct? @nimarezainia

Thanks @nchaulet - yes this is correct; the same licensing restriction as we have for per-policy output.

cmacknz commented 1 year ago

Ensure that the changes are compatible with all input types. It has been tested with log inputs and seems functional. cc @cmacknz

We don't have any special handling for specific input types. The use_output option in the agent already supports multiple outputs like this. The only under-the-hood effect of multiple outputs is the possibility that the agent will run more processes than before; this adds additional queues, increasing the memory usage of the agent.

For example, the following results in one logfile input process (or component, in the agent model) named logfile-default, implemented by Filebeat:

outputs:
  default:
    type: elasticsearch
    ...
inputs:
  - id: logfileA
    type: logfile
    use_output: default
    ...
  - id: logfileB
    type: logfile
    use_output: default
    ...

While the configuration below with two distinct outputs will result in two Filebeat processes/components, one named logfile-outputA and one named logfile-outputB:

outputs:
  outputA:
    type: elasticsearch
    ...
  outputB:
    type: elasticsearch
    ...
inputs:
  - id: logfileA
    type: logfile
    use_output: outputA
    ...
  - id: logfileB
    type: logfile
    use_output: outputB
    ...

You should be able to observe this directly in the output of elastic-agent status and in the set of component states reported to Fleet.

cmacknz commented 1 year ago

I should note that you only end up with additional processes when assigning inputs of the same type to different outputs. If, in the example above, there were a system/metrics input instead of logfileB, there would be no change. This is because the agent runs instances of the same input type in the same process, and it already isolates different input types into their own processes.

jen-huang commented 1 year ago

Thanks @nchaulet, @nimarezainia, @cmacknz for the work and discussion here. Based on recent discussions about priority, I am going to push this back a few sprints for implementation work.

BenB196 commented 11 months ago

One of the biggest drivers from our company's end for this would be APM Server, which can only support the Elasticsearch output. We mainly leverage the Logstash output for agents. This requires us to run a second Agent just for APM Server, and at scale (100+ APM Server/Elastic Agent deployments across multiple Kubernetes clusters) we end up "wasting" 500MB on each node just operating the second agent for APM, rather than being able to use our existing ones that default to Logstash.

Depending on how you look at it, 500MB might not seem like a lot, but when you're operating 50-100 deployments, that is 25GB-50GB of memory. This also indirectly generates additional monitoring data from the additional agents that we need to run and monitor.

nimarezainia commented 11 months ago

Thanks @BenB196. How would you deploy the agent if you did have the ability to define an output per integration?

BenB196 commented 11 months ago

Hi @nchaulet, currently for each Kubernetes cluster we deploy 2 DaemonSets: one that uses the Logstash output and contains all normal integrations, and a second which uses the Elasticsearch output and contains just APM Server. If per-integration output were supported, we'd switch to deploying a single DaemonSet which uses Logstash as the default and specifies the Elasticsearch output solely for the APM Server integration.
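
A minimal sketch of the consolidated single-DaemonSet policy described above; the output names, hosts, and the apm input ID are hypothetical:

outputs:
  default:                     # normal integrations keep Logstash
    type: logstash
    hosts: ["logstash.example.com:5044"]
  es-direct:                   # APM Server requires an Elasticsearch output
    type: elasticsearch
    hosts: ["https://es.example.com:9200"]
inputs:
  - id: system-logs
    type: logfile
    use_output: default
  - id: apm-server             # assumed input type for the APM Server integration
    type: apm
    use_output: es-direct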

nicpenning commented 9 months ago

👋 Just checking in on this feature! Any progress, or are any details needed to get this implemented?

8.12 added the remote Elasticsearch output, which was significant! The ability to do this per integration would be very beneficial for the reasons previously stated. Thank you!

nimarezainia commented 9 months ago

Thanks @nicpenning, this is still prioritized, but we have other higher-impact issues to resolve. We should get to this one soon as well.

nicpenning commented 9 months ago

Thank you for the update, Nima!

jlind23 commented 7 months ago

@nimarezainia I might have missed them but do we have any UI/UX mockup for this?

nimarezainia commented 7 months ago

The mockups are here: https://github.com/elastic/kibana/issues/143905#issuecomment-1780390156

kpollich commented 5 months ago

Want to bump Nicolas's comment above with the necessary implementation plan as this is coming up soon in our roadmap: https://github.com/elastic/kibana/issues/143905#issuecomment-1796120737

karnamonkster commented 5 months ago

We really need to see this one running in our cluster, as we have made a stupid (not yet) but brave decision to move to a unified agent used for all infra, security, and application-specific data logging, for different teams with different ECE instances as consumers. From a data quality perspective, governance of ECS compliance led to this decision. We cannot have everyone sending the same data over in different ways. Of course there are exceptions, but we still aim to keep them to a minimum. A sincere request to expedite this enhancement/feature request.

mbudge commented 4 months ago

We also need this so we can send system metrics (to avoid the Logstash 403 Forbidden infinite-retry issue crashing Logstash), firewall security logs, and NetFlow to different Logstash pipeline inputs, as the latter are higher throughput and we don't want them impacting Windows security log collection.

nimarezainia commented 4 months ago

We will soon have news for you all on this issue, including the targeted release. Thanks for your patience.

supu2 commented 4 months ago

@nimarezainia Is there any ETA for the release? In which release will we get this feature? Thank you so much for this feature.

nimarezainia commented 3 months ago

If our testing completes successfully, the target is 8.16.

amolnater-qasource commented 2 months ago

Hi Team,

We have created 07 test cases in Testmo for this feature, in the Fleet test suite under the below section:

Please let us know if any other scenario needs to be added from our end.

Thanks!

amolnater-qasource commented 4 weeks ago

Hi Team,

We have executed 07 test cases under the feature test run for the 8.16.0 release at the link:

Status:

PASS: 07

Build details: VERSION: 8.16.0 BC2 BUILD: 79434 COMMIT: 59220e984f2e3ca8b99fe904d077a5979f5f298d

As the testing is completed on this feature, we are marking this as QA:Validated.

Please let us know if anything else is required from our end. Thanks!