Closed: nimarezainia closed this issue 3 months ago
Pinging @elastic/fleet (Team:Fleet)
There are many reasons why multiple agents may be needed on a host. One example, applicable to the Elastic ecosystem itself: a customer typically needs to forward Elasticsearch / Logstash / Kibana logs and metrics to a separate monitoring cluster. This is generally not possible, as there is already a set of agents running on that node to index system logs and metrics.
Elastic should really support multiple outputs per integration or provide a supported way to install and manage multiple identical agents on a system.
@nimarezainia what else is needed from you on this one?
@amitkanfer the definition is fairly self-explanatory, but I need to create a mock-up for the UI.
Thanks @nimarezainia - once ready let's chat online and pass it to @jlind23 for development.
Another big reason for output per integration is when you have 20+ integrations in a specific policy, there is a good chance that some of those integrations have very different performance requirements.
The biggest need for this feature for me is the ability to set the number of workers and the bulk max size to account for a particular integration that ingests 30K events per second. We have some integrations that only receive 1-5 events per minute, so it doesn't make sense to crank up the workers and bulk max size when not all integrations need that performance adjustment.
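To make the tuning gap concrete, here is a rough sketch of two outputs that different integrations in the same policy could point at; the output names, hosts, and values are made up, and the tuning keys are the standard output performance settings:

```yaml
outputs:
  default:                 # low-volume integrations (1-5 events per minute)
    type: elasticsearch
    hosts: ["https://es.example.com:9200"]
  high-throughput:         # hypothetical output for the ~30K EPS integration
    type: elasticsearch
    hosts: ["https://es.example.com:9200"]
    # tuned settings that would be wasted on the low-volume integrations
    worker: 8
    bulk_max_size: 1600
```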
Here is a sample policy with their respective EPS and need for per integration output selection:
@nimarezainia What would be the user experience here? Shall we display, per output/policy, a list of integrations that users can check to see which one is using what output? Or shall we, in the UI below, offer the option to switch the output for each integration?
@jlind23 I propose the following: (@zombieFox we need to discuss this also)
On the integration settings page we need a dropdown which would display the set of outputs available to the user (configured on the Fleet -> Settings tab). This should default to whatever output is configured in the policy for integration data. We may want to put this in the advanced settings dropdown.
The agent policy page should be modified also to show a summary of what output is allocated to which integration:
In scenarios where the user needs to send different data streams to different outputs, the above model still works, as the user can add two instances of the same integration to the policy. For example, with NGINX:
nginx-1 instance:
nginx-2 instance:
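For illustration only, a hedged sketch of the policy the agent could end up with for the two instances; the output names, hosts, and the log/metrics split are assumptions, not something specified in this issue:

```yaml
outputs:
  logstash-logs:            # hypothetical output for the log data streams
    type: logstash
    hosts: ["logs-ls.example.com:5044"]
  monitoring-es:            # hypothetical output for the metrics data streams
    type: elasticsearch
    hosts: ["https://monitoring-es.example.com:9200"]
inputs:
  - id: nginx-1             # first instance: only log data streams enabled
    type: logfile
    use_output: logstash-logs
  - id: nginx-2             # second instance: only metrics data streams enabled
    type: nginx/metrics
    use_output: monitoring-es
```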
We reviewed this in the UX sync. Looks good to go. The additions indicated above don't require design mocks.
The copy sounds right to me too, but we might want to pass it by David Kilfoyle or Karen Metts.
Moving this to tech definition for this sprint, if the work identified is a small amount, we'll proceed with implementation.
Proposed technical implementation for that
I did a small POC PR implementing only the API part, with some shortcuts, to ensure it will work as expected, and it seems it will.
We will introduce a new property named output_id to the package policy. This property will be added/updated in the following components:
We will need to validate that creating/editing a package policy output respects the same rules as per-agent-policy outputs. We will have to implement the same rules as we have for agent policy:
We need to adapt the policy sent to the agent to reflect our model change; the agent already supports this via the use_output property and already supports multiple outputs.
I tested this locally with the POC PR using multiple logs package policies, and it seems to work as expected.
The use_output field has to be populated with the package policy output id or the default data output (code here).
The role permissions have to change so that we generate a role permission for each output based on the package policies assigned to it, instead of one for data and one for monitoring (code here).
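To tie this to the agent-side behaviour, a minimal sketch of the compiled policy Fleet would generate (ids and output names are made up): inputs coming from a package policy that carries the new output_id get that output in use_output, the others fall back to the default data output.

```yaml
outputs:
  default:                     # the agent policy's data output
    type: elasticsearch
    hosts: ["https://es.example.com:9200"]
  remote-monitoring:           # hypothetical extra output configured in Fleet settings
    type: elasticsearch
    hosts: ["https://monitoring-es.example.com:9200"]
inputs:
  - id: system-logs-1234
    type: logfile
    use_output: default            # from a package policy without output_id
  - id: nginx-logs-5678
    type: logfile
    use_output: remote-monitoring  # from a package policy with output_id: remote-monitoring
```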
Package policy/saved object changes
We will introduce a new property named output_id to the package policy. This property will be added/updated in the following components:
* Saved object
* Type and schema for the package policy, preconfigured package policy, and simplified package policy
We will need to validate that creating/editing a package policy output respects the same rules as per-agent-policy outputs:
* APM and Fleet Server package policies cannot use a non-ES output.
* Licence restriction: it should only be available for an enterprise licence, the same as multiple outputs, correct? @nimarezainia
thanks @nchaulet - yes this is correct, same licensing restriction as we have for per policy output.
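For reference, a hedged sketch of how a preconfigured package policy might reference an output once the proposed output_id property exists; the ids, names, and hosts here are made up, and output_id on the package policy is the new piece being discussed in this issue:

```yaml
xpack.fleet.outputs:
  - id: remote-monitoring
    name: Remote monitoring
    type: elasticsearch
    hosts: ["https://monitoring-es.example.com:9200"]
xpack.fleet.agentPolicies:
  - id: mixed-policy
    name: Mixed policy
    package_policies:
      - name: nginx-1
        package:
          name: nginx
        output_id: remote-monitoring   # proposed: overrides the policy's data output for this integration
```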
Ensure that the changes are compatible with all input types. It has been tested with log inputs and seems functional. cc @cmacknz
We don't have any special handling for specific input types. The use_output option in the agent supports multiple outputs like this already. The only under-the-hood effect of multiple outputs is the possibility that the agent will run more processes than before. This will add additional queues, increasing the memory usage of the agent.
For example, the following results in one logfile input process (or component in the agent model) named input-default, implemented by Filebeat:
outputs:
  default:
    type: elasticsearch
    ...
inputs:
  - id: logfileA
    type: logfile
    use_output: default
    ...
  - id: logfileB
    type: logfile
    use_output: default
    ...
While the configuration below, with two distinct outputs, will result in two Filebeat processes/components, one named logfile-outputA and one named logfile-outputB:
outputs:
  outputA:
    type: elasticsearch
    ...
  outputB:
    type: elasticsearch
    ...
inputs:
  - id: logfileA
    type: logfile
    use_output: outputA
    ...
  - id: logfileB
    type: logfile
    use_output: outputB
    ...
You should be able to observe this directly in the output of elastic-agent status and in the set of component states reported to Fleet.
I should note that you only end up with additional processes when assigning inputs of the same type to different outputs. If in the example above there was a system/metrics input instead of logfileB, there would be no change. This is because the agent runs instances of the same input type in the same process, and it already isolates different input types into their own processes.
Thanks @nchaulet, @nimarezainia, @cmacknz for the work & discussion here. Based on recent discussions about priority, I am going to kick this by a few sprints for implementation work.
One of the biggest drivers from our company's end on this would be APM Server, which can only support the Elasticsearch output. We mainly leverage the Logstash output for agents. This requires us to run a second Agent just for APM Server, and when you get to scale (100+ APM Server/Elastic Agent deployments across multiple Kubernetes clusters), we end up "wasting" 500MB on each node just operating the second agent for APM rather than being able to use our existing ones that use Logstash by default.
Depending on how you look at it, 500MB might not seem like a lot, but when you're operating 50-100 deployments, that is 25GB-50GB of memory. This also indirectly generates additional monitoring data from the additional agents that we need to run and monitor.
thanks @BenB196. How would you deploy the agent if you could indeed have the ability to define output per integration?
Hi @nchaulet currently for each Kubernetes cluster we deploy 2 DaemonSets, one that uses Logstash output and contains all normal integrations, a second which uses the Elasticsearch output and contains just APM Server. If per integration output was supported, we'd switch to deploying a single DaemonSet which uses Logstash as the default, and specifies the Elasticsearch output solely for the APM Server integration.
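If it helps, here is a rough sketch of the single policy this would allow; the names and hosts are made up, and it assumes the APM integration compiles to an apm input:

```yaml
outputs:
  default:                  # everything except APM keeps the Logstash default
    type: logstash
    hosts: ["logstash.example.com:5044"]
  es-direct:                # Elasticsearch output used only by APM Server
    type: elasticsearch
    hosts: ["https://es.example.com:9200"]
inputs:
  - id: system-logs
    type: logfile
    use_output: default
  - id: apm-server
    type: apm               # APM Server only supports the Elasticsearch output
    use_output: es-direct
```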
👋 Just checking in on this feature! Any progress, or details needed, to get this implemented?
8.12 added the remote Elasticsearch output which was significant! The ability to do this per integration would be very beneficial as reasons previously stated. Thank you!
thanks @nicpenning this is still prioritized but we have other higher impacting issues to resolve. We should get to this one soon as well.
Thank you for the update, Nima!
@nimarezainia I might have missed them but do we have any UI/UX mockup for this?
https://github.com/elastic/kibana/issues/143905#issuecomment-1780390156
Want to bump Nicolas's comment above with the necessary implementation plan as this is coming up soon in our roadmap: https://github.com/elastic/kibana/issues/143905#issuecomment-1796120737
We really need to see this one running in our cluster, as we have made a stupid (not yet!) but brave decision to move to a unified agent used for all infra, security, and application-specific data logging for different teams, with different ECE instances as consumers. From a data quality perspective, governance of ECS compliance led to this decision. We cannot have anyone sending the same data over in different ways. Of course there are exceptions, but we still aim to keep them to a minimum. A sincere request to expedite this enhancement/feature request.
We also need this so we can send system metrics (to avoid the Logstash 403 Forbidden infinite-retry issue crashing Logstash), and firewall security logs and Netflow, to different Logstash pipeline inputs, since they are higher throughput and we don't want them impacting Windows security log collection.
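For what it's worth, a hedged sketch of how that could look with per-integration outputs; the hosts, ports, and input type names are assumptions:

```yaml
outputs:
  default:                  # Windows security logs and system metrics
    type: logstash
    hosts: ["logstash.example.com:5044"]
  bulk-pipeline:            # separate Logstash pipeline input for high-throughput data
    type: logstash
    hosts: ["logstash.example.com:5045"]
inputs:
  - id: windows-security
    type: winlog
    use_output: default
  - id: netflow
    type: netflow
    use_output: bulk-pipeline
```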
We will soon have news for you all on this issue with the targeted release. Thanks for your patience.
@nimarezainia Is there any ETA for the release? In which release will we get this feature? Thank you so much for this integration.
If our testing completes successfully, the target is 8.16.
Hi Team,
We have created 07 test cases in Testmo for this feature, under the Fleet test suite, in the section below:
Please let us know if any other scenario needs to be added from our end.
Thanks!
Hi Team,
We have executed 07 test cases under the Feature test run for the 8.16.0 release at the link:
Status:
PASS: 07
Build details: VERSION: 8.16.0 BC2 BUILD: 79434 COMMIT: 59220e984f2e3ca8b99fe904d077a5979f5f298d
As the testing is completed on this feature, we are marking this as QA:Validated.
Please let us know if anything else is required from our end. Thanks
There are many legitimate reasons why an operator may need/want to send data from integrations to different outputs within a policy. Some may even need to send data streams to different outputs. Currently we only allow an output to be defined on a per-policy basis. In order to support this request, the per-policy output definition needs to be overridden by the output defined in the integration. Our config should support this already.
Use Cases:
1) As an operator, I need security logs from an agent to be sent to one Logstash instance, whereas informational logs are sent to another Logstash instance.
2) We operate multiple Beats on a given system and would like to migrate to Elastic Agent. For historical and operational reasons these Beats write data to distinct outputs. Once we migrate to Agent, we would like to keep the upstream pipeline intact.