Closed nchaulet closed 4 months ago
Seems like the logging level in the policy is ignored in the policy change handler, we always persist the log level we started with.
We only support changing the log level through the SETTINGS action but I have no idea why we did it this way.
But agent debug logs can still be turned out with the Ui change and not the api call Nicolas mentioned an I correct?
Yes, this is unintuitive but you can still change the log level from the UI with the SETTINGS action.
So at least we have a workaround for this..
@cmacknz I had a look at the code and I guess that we ignore the log level in the handler_policy_change_action due to an issue with deserialization:
the log level is defined as an int8
within agent where 0
corresponds to info
level and we use a map to convert the string we receive in the policy: when unmarshaling a policy that does not contain the log level (all of them unless we use the new fleet entry point), we are probably going to get and use the zero value (an empty string) which will likely throw an error (to be tested).
To solve this issue we have to optionally unpack the log level only when the key is specified and fall back on the configured value if it's not. Another possible solution would be to transform the Level into a *Level
and then add the current log level value if the specified Level is nil
(probably a bit cleaner) but we can't change it easily as it's part of elastic-agent-libs... :disappointed:
but we can't change it easily as it's part of elastic-agent-libs...
The two largest users of elastic-agent-libs are Agent and Beats (and things that implement a Beat outside of the Beats repository). As long as the change wouldn't be breaking for Beats we could do this if it makes more sense, and if it is breaking we could just make it an option or variant of the existing behavior.
bringing a bit of a history into this. the proposed change should not work in a way how agent is designed. i'm seeing this as a feature request not a bug.
the idea behind POLICY and SETTING was this POLICY contains information about Inputs, Output and Fleet connection info SETTINGS is agent specific setting
when we start thinking about supporting setting Log Level using settings I'd like us to stop for a while and think about how this should behave first.
At this time it' simple, default log level is info, we persist it as a setting. Once there's a need to switch to debug you will switch to debug and we persist it as a configuration option, this overrides default one. Once you're done you'll go back to Info.
Once you introduce setting log level using POLICY at the start we end up with a pair info:nil
meaning default using settings is info
, and it is not set using policy
then you change policy to use warn
ending up with info:warn
the question is which one has priority, we dont have information about the source of the first one.
Even if we consider not setting info at start, ending up with nil:warn
. We will use warn.
Now setting from Debug SETTINGS will come, we have debug:warn
using debug
with a higher priority as it is per agent config.
The question is, how we reset per agent config. if we set whatever, we will end up with whatever:warn
with whatever always taking priority even with future Policy changes.
When I take a look at diagnostics I should be deterministically say what's the log level based on Policy and Settings without worrying about order.
From my point of view we should either
nil:whatever
Getting rid of SETTINGs does not make sense. In the need of debug, changing log level to Debug for potentially 100k agent would create a spike in processing requirement, memory and cost eventually.
@cmacknz offline this week, pinging @joshdover: do we have thought through use cases for this?
Thanks for the history, @michalpristas. Makes sense why this is behaving this way.
I think this is somewhat clear already, but just to be concise the use cases we have are:
IMO the precedence order should be:
In general we should track which agents have a one-off setting applied via the SETTINGS action and should flag those to the user in the UI. This is essentially "drift detection" for agents that are deviating from their policy, and the user should be encouraged to reset those agents back to the policy.
add Inherit log level or Reset function on Fleet UI which would result in pair
nil:whatever
Yeah I agree with this, there needs to be some way to reset (3) and start using (2) again.
So in conclusion, I agree this is not a bug, but a new feature and use case: controlling the log level via the agent policy. We've seen this request before and it should be prioritized, but as an enhancement and not a bug.
@pierrehilbert If you agree, let's deprioritize this one for now.
Thanks everyone for your insights and I agree with the precedence order mentioned by @joshdover. I changed the issue type and priority according to this new agreement. Still keeping this in the current sprint for now but with a lower priority compare to other items and will see if we need some other adjustments.
keep in mind that in order for this to work properly we need reset fucntionality or Policy log level in fleet in place. Without this once you set log level using settings all log levels coming from Policy will be ignored.
from planning perspective it does not make sense to work on this until fleet part is at least scheduled for development
Agreed, we will need support for this in Fleet. It's likely a small enhancement, but needs to be planned for.
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
Issue
We now have a feature that allow to add arbitrary key to the agent policy sent to the agent, https://github.com/elastic/kibana/pull/159414#issuecomment-1587160242
I tried to set agent log level this way to Fleet managed agents, and it seems the agent is still logging at the default log level, when I run
elastic-agent inspect
I saw that myagent.logging.level: debug
config is here, but it seems not used by the agentDefinition of done