[Integrations] Option to disable running integration per agent.

elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

[Integrations] Option to disable running integration per agent. #5813

Open s-karberg opened 1 month ago

s-karberg commented 1 month ago

Describe the enhancement: Disable / Turn off integration per agent.

Describe a specific use case for the enhancement or feature: I ran into a case where one integration on a running agent started collecting and forwarding logs that couldn't be ingested and ended up in the Dead Letter Queue in Logstash. It took a while to figure out this ingest problem, and while it was happening it would have been really useful to be able to disable and re-enable an integration on an agent while debugging.

What is the definition of done? The way I see this being integrated: you expand the intended integration on an agent, and under Inputs there is some kind of option to disable it, e.g. clicking the colored status icon. The text below, which currently says Healthy, could then say e.g. Healthy but disabled.

jlind23 commented 1 month ago

@nimarezainia is this a request/suggestion you ever got?

nimarezainia commented 1 month ago

@s-karberg I would suggest moving the agent to another dedicated policy for debugging. There you can experiment with multiple agents that may have the same problem. You will also be able to change the logging level at the policy level, which would affect only this small subset of agents.

s-karberg commented 1 month ago

@nimarezainia that makes sense in a way. But what if you have e.g. 10 integrations in that policy? That could produce "noise" from that agent; isolating the integrations by stopping and starting them could help debug faster.

nimarezainia commented 1 month ago

The problem is that the whole concept of the policy is that it's a grouping of the configuration, i.e. the same config is sent to the group of agents controlled by that policy.

You can duplicate the policy if you have a lot of integrations. If integration sprawl is your concern, you can also use reusable integration policies to copy your integration over to other policies.

Most of the users I have come across use this approach for agent isolation and debugging.

s-karberg commented 1 month ago

Hmm, then I would need e.g. X "debugging" policies, because as an MSSP our data is isolated, e.g. into individual spaces in Kibana with data views.

That would just increase the number of policies per Fleet Server, which could bring us closer to the maximum: https://www.elastic.co/guide/en/fleet/current/agent-policy.html#agent-policy-scale

Can I maybe hear why this idea would be bad?

nimarezainia commented 1 month ago

Mainly because we would end up with policies that have a whole lot of exceptions in them, not to mention the complexity we would have to develop in the UI: for agent x in policy y, disable integration z. We have users with close to 100k agents spread across policies. More importantly, we would then have a system where the agent policy is no longer the source of truth for each agent's configuration and its changes. It would be a lot for Fleet to administer.

Perhaps I misunderstood your use case; this is just for debugging/troubleshooting, correct? Could you not delete the policy once debugging is complete?

Is using a standalone agent a consideration for you? There you can control the config for each individual agent.

BenB196 commented 1 month ago

A pseudo way of achieving this (though it isn't currently supported across all integrations for Fleet-managed agents) is to leverage conditions: https://www.elastic.co/guide/en/fleet/current/dynamic-input-configuration.html. You can simply add a condition that includes or excludes a set of hosts, depending on the requirements.
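
As a rough illustration of the condition idea, the Python sketch below builds a condition string that evaluates to false on a chosen set of hosts, so an input carrying it would be skipped there. The helper name `build_exclude_condition` and the example hostnames are hypothetical. The `${host.name}` variable and the `!=` and `and` operators are assumed to behave as described on the linked dynamic input configuration page, and whether a given integration exposes a condition field in Fleet is not something this sketch checks.

```python
def build_exclude_condition(hostnames):
    """Build a condition string that is false on every listed host.

    Hypothetical helper: it only formats text. The resulting string is
    meant to be pasted into an input's condition setting by hand.
    """
    if not hostnames:
        raise ValueError("need at least one hostname to exclude")
    for name in hostnames:
        # The value is wrapped in single quotes, so reject names containing one.
        if "'" in name:
            raise ValueError(f"unsupported character in hostname: {name!r}")
    # One "not equal" clause per host, all of which must hold.
    return " and ".join(f"${{host.name}} != '{name}'" for name in hostnames)


if __name__ == "__main__":
    # Example hostnames are placeholders for the agents being debugged.
    print(build_exclude_condition(["web-01", "web-02"]))
```

The policy stays the single source of configuration in this approach; the per-host difference lives inside the condition rather than in a per-agent override.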