Open joshdover opened 1 year ago
https://github.com/elastic/elastic-agent/issues/2612 has been merged; moving to 8.12-candidate
to fit timelines.
It seems beats is already using the agent control protocol under the hood (https://github.com/elastic/apm-server/pull/12508#discussion_r1469817098) so we can't add an additional client.
We can either stop using the beats manager and implement the feature in apm-server or try to work with beats (I'm not sure if this is possible since beats is hardcoding the values passed to the control protocol).
Another issue is that some of the libraries we use hardcode the tracer (e.g. go-docappender) so we can only stop and recreate the indexer or implement an option to retrieve the indexer dynamically.
Any feedback, opinion or alternative solution ?
We can either stop using the beats manager and implement the feature in apm-server or try to work with beats (I'm not sure if this is possible since beats is hardcoding the values passed to the control protocol).
Could we update libbeat to consume the TriggeredAPMChange
, translate it to YAML settings, and merge that into a config change and update the beat config?
Another issue is that some of the libraries we use hardcode the tracer (e.g. go-docappender) so we can only stop and recreate the indexer or implement an option to retrieve the indexer dynamically.
We already hot-reload everything on config changes, so if we do the above then we can construct a new tracer with the new config.
@joshdover @cmacknz does my suggest above (to consume TriggeredAPMChange, update the beat config, and trigger a reload) sound sensible? How do you intend to deal with TriggeredAPMChange for Beats? I think this should probably be solved in libbeat, and then apm-server can consume updates from there. Is that something the control plane team can take on?
Taking a closer look at this, the APM configuration is being received in libbeat but it is not exposed in any way.
The APMConfig we need is attached to each unit that may have changed already as part of the expected state, we just don't do anything with it.
// Expected contains the expected state, log level, features and config for a unit.
type Expected struct {
Config *proto.UnitExpectedConfig
Features *proto.Features
LogLevel UnitLogLevel
State UnitState
APMConfig *proto.APMConfig
}
Here it is in our management code at the point where we reload the Beat configurations. https://github.com/elastic/beats/blob/7546ae1ceefa19c9a4886ab012fa661eaeaccf4b/x-pack/libbeat/management/managerV2.go#L568-L569
for _, unit := range units {
expected := unit.Expected()
The question I have is what we should do with the APM configuration at this point? In what form can APM server actually use it? Our agent client is quite isolated from the rest of Beats, so at the point where we receive the APM configuration we could change a Beat global tracing configuration or add a way to watch changes to it externally.
With respect to TriggeredAPMChange
the intent is to allow us to tell when the APM configuration has actually changed when we receive a new configuration from agent. This is something we would want to be looking at in the block below we wanted to build an API for externally observing changes to the tracing configuration outside of the management client:
switch change.Type {
// Within the context of how we send config to beats, I'm not sure if there is a difference between
// A unit add and a unit change, since either way we can't do much more than call the reloader
case client.UnitChangedAdded:
cm.addUnit(change.Unit)
// reset can be called here because `<-t.C` is handled in the same select
t.Reset(cm.changeDebounce)
case client.UnitChangedModified:
cm.modifyUnit(change.Unit)
// reset can be called here because `<-t.C` is handled in the same select
t.Reset(cm.changeDebounce)
This Beat management code was originally written before the change triggers existed so it doesn't make use of them yet.
The question I have is what we should do with the APM configuration at this point? In what form can APM server actually use it? Our agent client is quite isolated from the rest of Beats, so at the point where we receive the APM configuration we could change a Beat global tracing configuration or add a way to watch changes to it externally.
What do you think about introducing a new libbeat/common/reload
registry key (say "apm"), and calling any registered reloader(s) when TriggeredAPMChange
is received? We would need to marshal it to a config.C
, and I think we could just use the original Elastic Agent config YAML structure for that.
In APM Server we would update https://github.com/elastic/apm-server/blob/17969f98f0c54727d8c104af0d03f40b32f43369/internal/beatcmd/reloader.go#L68-L73 to also watch for APM config changes, and hot-reload the process like we already do for input & output config changes.
Moving this out of 8.13
scope due to the uncovered issues. @cmacknz is it reasonable for the EA team to make the required changes (once there is agreement on a path forward) early enough in 8.14
dev cycle so that we can ensure this works gets into apm-server 8.14
?
Hooking into the existing libbeat configuration reload API makes sense to me as a way to expose this.
Our team could do this, answering when we could do it I'll have to leave with @pierrehilbert.
@pierrehilbert could you please provide an update when this could be done by the Elastic Agent team? We are eager of enabling apm tracing when in managed mode.
@pierrehilbert any update on this please? We are really eager to finally get this in.
Really sorry @simitt I missed the first mentions. @pchila will probably be the best candidate to spend time on this as he already spend some time on the APM topic. What would be the ideal realistic target to have this done? cc @ycombinator as @pchila will be in his team with the new org on Agent side.
What would be the ideal realistic target to have this done?
Not certain if the question is directed towards @pchila or me, but from our end, the sooner the better. If you can aim for providing functionality in 8.15
, so we can do the follow up in 8.16
, it would be good.
Created https://github.com/elastic/beats/issues/39606 to summarize the discussion between @cmacknz and @axw and track this request in the Ingest team's roadmap.
I have PR https://github.com/elastic/beats/pull/40030 up for review. You can start with that PR as a base until it is merged, to implement the feature you need in APM server. If you have any questions just let me know and I can help.
I've done the manual testing by running the APM Server in managed mode with locally built Elastic Agent image and left the result in the comment.
@kyungeunni please ensure with your testing that the self instrumentation can be used in the ESS setup. That is the main use case for us to be able to leverage apm self instrumentation when troubleshooting SDHs.
I'll test this in ESS.
Just sharing some notes:
Fleet-server uses the same APMConfig and seems to have the tracing enabled on their side, not relying on instrumentation
package but having their own. Here's the link to the doc for the APM Tracing.
The configs used to make the tracing work on fleet-server
are as follows:
In the current state, we will be able to send the tracing based on what I've tested locally, but we won't be able to set the Global Labels so we need extra work on this part.
I see three options to support global labels:
instrumentation
package from beats (needs to be verified if it works or no)1 is quick and short, but it's not declarative and a bit hacky. if we choose to go with 2 or 3, we will be able to parse and support more config values coming from APMConfig (e.g. TLS)
1 is quick and short, but it's not declarative and a bit hacky. if we choose to go with 2 or 3, we will be able to parse and support more config values coming from APMConfig (e.g. TLS)
If (3) is feasible without significantly more effort, then I would be in favour of that, to reduce our dependency on libbeat.
Required follow up tasks to make this fully work are created and assigned:
Validated as a part of https://github.com/elastic/apm-server/issues/13604
In order to enable APM tracing when APM Server is managed by Elastic Agent, APM Server needs to apply APM configuration settings when they're passed over the control protocol. This is necessary to enable tracing of APM Server in ESS and ECE.
These settings should be used to enable APM Server's self tracing functionality, and send the traces to the configured APM Server passed via the control protocol This may require that we change how the
apm.Tracer
object is created and passed to internal objects as the config may change during runtime.References: