Closed dmathieu closed 1 year ago
Looks good, as it just provides an extra fallback service name value the impact should be quite minimal. I'll add to the next agents weekly meeting to get more feedback.
I also think this is fine.
One thing I'd mention is that we set serviceTargetName
here to something generic (in this case it'll be the endpoint based on network attributes). serviceTargetName
drives our service map and we identify dependencies based on that. This means if 2 (or more) messaging systems run on the same endpoint and none of them have messaging.destination
, then they'll be merged into a single dependency - and we'd use netName
in the name of this single dependency.
But I don't see this as a blocker for this PR, since the dependency detection wasn't better prior to this PR either and this is just a fallback - ideally we have messaging.destination
or messaging.url
.
When trying to implement this change in the Java agent in https://github.com/elastic/apm-agent-java/pull/3129, I found there was some aspects that I completely overlooked when reviewing this PR.
In the gherkin specs, we have a few strategies for values in target_service_name
, and destination resource:
The strategy for DB and messaging spans is inherited from the way the span destination resource was initially implemented, which could explain why the network level fallback are omitted.
So, the change in the gherkin spec for messaging seems a bit inconsistent because we mix the two approaches here.
If we want to go in this direction, then maybe it would make sense to also do it for other protocols (which might definitely be tricky to do). Also, maybe covering the 1st and 4th cases where only messaging.url
or net.peer.ip
is provided would be relevant.
The initial idea for target.service.name
and service.target.type
would be to replace destination.resource
, but the latter was kept for compatibility reasons. For example, for DB spans having service.target.{type,name}
allows to reconstruct an equivalent destination.resource
(the UI map and service dependencies might still rely on this).
In an ideal situation, we should not have to deal with such mapping in the agents, but we have a few agent features that rely on resource/service target type,name. I wonder if we could find a way to make it rely only on "set of otel field names" rather than having such tedious specification.
@dmathieu what do you think we can do to improve this ?
I agree that in an ideal situation, we shouldn't have to do this kind of mapping. The ingest (or UI) should know about semantic conventions (/ECS), and derive data from those appropriate fields.
I'm not sure I get the full context of this issue. The main reason for this change here was that netName was set, but never used for messaging, so it made sense to use it the same way as the spec specifies for HTTP spans. If that made more sense, we could also entirely remove the notion of netName in messaging spans.
Yes, I think it would make more sense (and be simpler) to leave netName
outside of the messaging spans for now,
let's discuss this in next agents meeting.
@dmathieu During last agents weekly meeting, I did not manage to explain why I think we should probably be better off keeping this as-is (as it was before this PR), so here are my thoughts on this change. Also, we could add a special case for otel bridge as a compromise, however that would further complicate the destination.resource
field specification as it is expected to be derived from service.target.type
and service.target.name
.
service.target.type
and service.target.name
in line with destination.service.resource
for compatibility with UI and data sent by existing agents.destination.service.resource
is defined in https://github.com/elastic/apm/blob/main/specs/agents/tracing-spans-destination.md#destination-resource and the inference algorithm currently only has an exception for external
spans.
messaging
spans would also impact our own messaging instrumentation (not just OTel)destination.service.resource
heuristic, and it would require at least to update the gherkin spec that was changed in this PR.service.target.{type,name}
but it's still required for now.If you have ideas how to better handle this, I am definitely interested. Ideally we should be able to make the agents and apm-server mapping easier to maintain and independent from UI, but that's not really something we can tackle right now.
I'm not sure the template applies, as this spec change is more of a clarification, and agents would either not do anything, or have already handle that lack of information.
When handling messaging traces, if there is no
messaging.destination
, we may usemessaging.url
, ornetName
as the target name instead. We previously had to setnetName
, but it was never used. So this change uses the set value.cc @SylvainJuge
CODEOWNERS
)/
schedule YYYY-MM-DD
to the PR description.sanitize_field_names
)CODEOWNERS
)/
schedule YYYY-MM-DD
to the PR description.