Open smith opened 3 months ago
This discussion has been going on for years now. host.name needs to be normalized and lowercased, exactly for correlation reasons. There are so many data sources, each logging with its own naming conventions. Also, if we for example do a reverse DNS lookup of an IP, it's always a lowercase FQDN.
APM should also normalize and lowercase host.name if there is an issue there.
The original host name should be in host.hostname.
@MikePaquette ;)
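For illustration, the normalization proposed above could look roughly like the sketch below. It assumes events are plain nested dicts with ECS-style `host.name` and `host.hostname` keys; the helper name `normalize_host` is hypothetical and is not how any Beat, agent, or APM server actually implements this.

```python
from copy import deepcopy


def normalize_host(event: dict) -> dict:
    """Return a copy of `event` with host.name lowercased.

    The original value is preserved in host.hostname unless that field
    is already set. Hypothetical helper, for illustration only.
    """
    out = deepcopy(event)
    host = out.get("host")
    if not isinstance(host, dict) or not isinstance(host.get("name"), str):
        return out  # no host.name to normalize
    original = host["name"]
    host.setdefault("hostname", original)  # keep the original casing
    host["name"] = original.lower()
    return out
```

With this, an event carrying `host.name: "Web-01.Example.COM"` would come out with the lowercase value in `host.name` and the original casing in `host.hostname`.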
Thanks @willemdh
@smith - should we talk to the APM agents team?
@roshan-elastic I think so. If we're normalizing data we need to do it for all methods of ingest.
Per the APM agent spec (https://github.com/elastic/apm/blob/main/specs/agents/metadata.md#hostname), APM agents should be lowercasing the value they send to APM server (metadata.system.detected_hostname).
This was added to our specs about 9 months ago in https://github.com/elastic/apm/pull/805
This issue https://github.com/elastic/apm/issues/794 has links to the implementation issues for each of the APM agents. That issue is "closed" for all but the Go APM Agent. We'd have to do some digging to see which version of each APM agent got this change and possibly confirm that they are indeed lowercasing.
Do we have any info on which particular APM agents we are talking about here?
Previous discussion(s):
Other possible wrinkles:
If the ELASTIC_APM_HOSTNAME envvar is set, then that value is given to APM server as metadata.system.configured_hostname. I'm not exactly sure what ends up as host.name in APM server's processing after that.
host.name from this APM server code: https://github.com/elastic/apm-data/blob/a6842965268a11075f47991623aca2f4beb52004/model/modelprocessor/hostname.go#L39-L58
Thanks for this @trentm - it sounds like our intent is to lower-case host.name collected via APM agent so anything which isn't doing this is either:
Do we have any info on which particular APM agents we are talking about here?
@smith is this something you or someone in the team can share? I'm only really worried if it's something that isn't going to be addressed eventually.
Do we have any info on which particular APM agents we are talking about here?
I asked on the originating issue (https://github.com/elastic/kibana/issues/178650#issuecomment-2036675567). Caue said it was the Go APM Agent, so that makes sense.
I'm only really worried if it's something that isn't going to be addressed eventually.
Development focus for the Go Agent is on the OTel side, so I'm not sure how timely any change would be here.
Also I gather we'll have the same issue with OTel APM agents, where the host.name spec differs from the suggestions in ECS's host.name spec. OTel doesn't say anything about normalizing case.
Development focus for the Go Agent is on the OTel side, so I'm not sure how timely any change would be here.
That's OK - the main thing is that we're aligned on how to solve it (we can sort 'when' via prioritisation etc).
OTel doesn't say anything about normalizing case.
Great catch.
@AlexanderWert / @mlunadia / @tommyers-elastic - Do you think we can enforce standardisation for OTel data? This issue is showing the pitfalls of mixing cases etc - it leads to duplicate data and confusing user experiences.
Note: This issue is specifically focusing on lower-casing host.name
All of this is a result of this change in ECS (~a year ago): https://github.com/elastic/ecs/pull/2122
So now we have a mix of old collectors (that do not necessarily lowercase) and newer collectors (that do).
In the OpenTelemetry Semantic Conventions, host.name is not lowercased (and we can assume that we won't be able to change that): https://opentelemetry.io/docs/specs/semconv/attributes-registry/host/
I think the actual problem is that we use host.name to correlate data and use it as an identifier of the host.
Actually, we should use host.id for correlation and identification, because that one is meant to be unique and reliable in both ECS and SemConv. host.name should rather be used as a display name.
--> I really hope that with Assets / Entities these kinds of things will be resolved!
ECS:
OTel SemConv:
Using host.id sounds good to me. For the current APM agents, it was only very recently added to the APM agent specs. Only the Java APM agent will be producing host.id currently. As well, APM server's intakev2 API (used by the APM agents) does not yet handle host.id from APM agents. That's hopefully being added for 8.14.
actual problem is that we use host.name to correlate data and use it as an identifier of the host. Actually, we should use host.id for correlation and identification, because that one is meant to be unique and reliable in both, ECS and SemConv. host.name should be rather used as a display name.
That's a great point @AlexanderWert. I think that sounds sensible but I'm worried about what % of our customers will be able to supply this with current collection - especially as we want to leverage the host identifier across metricbeat, filebeat and the elastic agent integrations (and OTel).
Looking at one of our own clusters (us-east-1-logging...) internal collection for different agents, host.id looks pretty scarcely populated (e.g. 2-5% for filebeat) so I don't think that's feasible in the short-/medium-term from what I can see?
Filebeat - 2-5% have host.id
Metricbeat - around the same
It's a similar story on overview-....kb.us-west2.
I believe this is likely representative of our customer base too...we might be able to get telemetry from the BI team if we need more data.
Do you have any thoughts?
@smith not sure if you have an opinion on this?
@roshan-elastic we'll probably have to fall back to attempting to correlate things using host.name for some time, but we should prefer host.id if at all possible.
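A minimal sketch of that "prefer host.id, fall back to host.name" rule, assuming events are nested dicts with ECS-style fields. The function name `correlation_key` is hypothetical, and lowercasing the fallback name is an assumption taken from the normalization discussed in this thread.

```python
from typing import Optional, Tuple


def correlation_key(event: dict) -> Optional[Tuple[str, str]]:
    """Pick a key to correlate on: host.id if present, else host.name.

    Returns a (field, value) pair so callers can tell which field was
    used, or None if the event has neither. Hypothetical helper.
    """
    host = event.get("host") or {}
    host_id = host.get("id")
    if host_id:
        return ("host.id", host_id)
    name = host.get("name")
    if name:
        # Assumption: fold case so mixed-case collectors still line up.
        return ("host.name", name.lower())
    return None
```

Note that an event keyed by host.id and an event keyed by host.name for the same machine still end up under different keys, so this alone does not join datasets where only one of the two fields is available.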
@roshan-elastic
Using host.id is absolutely not ideal. We have working correlations between datasets containing lowercase FQDNs from logs and datasets where only an IP is known. A reverse DNS lookup enables us to correlate network data (which does not contain any hostnames) with host data. Please please let's not go back in time and choose a solution which doesn't make any sense.
A lowercase FQDN in host.name is really the primary key you want to correlate on. NOT host.id, as a lot of datasets contain an id like '55de390e-6781-485a-a5c2-463180e52874'. How on earth are we supposed to correlate that with a lowercase FQDN in a dataset which has absolutely no idea where it would get this host.id from??
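As a rough sketch of the reverse-DNS correlation described above: given only an IP from network data, resolve it and normalize the result so it can be matched against a lowercase-FQDN host.name. The function names are hypothetical. `socket.gethostbyaddr` is the standard-library call used for the lookup, and the explicit `.lower()` and trailing-dot strip are defensive assumptions rather than something the thread prescribes.

```python
import socket
from typing import Callable, Optional


def reverse_dns(ip: str) -> Optional[str]:
    """Reverse-resolve `ip` with the system resolver; None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def host_name_for_ip(
    ip: str,
    resolve: Callable[[str], Optional[str]] = reverse_dns,
) -> Optional[str]:
    """Return a lowercase FQDN for `ip`, usable as a host.name join key."""
    name = resolve(ip)
    if name is None:
        return None
    return name.rstrip(".").lower()
```

The `resolve` parameter exists only so the lookup can be swapped out (for example with a static table) where live DNS is not available.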
@willemdh ➕ and thanks for the detail.
@smith For your immediate problem: metricbeat sets agent.name with the same value as host.name without domain, but preserving case, if not instructed otherwise AFAIK. Is this also lowercased now? Would that pose as a useful alternative for you?
Personally I agree with this. Whenever someone tells me to check a host, I have to double-check whether it is spelled with capitals or not. The fields are of type "keyword", so that matters. Isn't this a problem that is isolated to Windows? I am unaware of Unix-like systems that return mixed-case hostnames.
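To illustrate why exact-match "keyword" semantics matter here, the sketch below groups host names as exact strings, with and without case folding. It is plain Python, not an Elasticsearch query, and the function name `group_by_host` and the sample host names are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_host(names: Iterable[str], fold_case: bool = False) -> Dict[str, List[str]]:
    """Group host names by exact string match, optionally folding case."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        key = name.lower() if fold_case else name
        groups[key].append(name)
    return dict(groups)


# Two collectors reporting the same (made-up) machine with different casing.
names = ["MacBook-Pro.local", "macbook-pro.local"]
exact = group_by_host(names)                    # two separate "hosts"
folded = group_by_host(names, fold_case=True)   # one host
```

With exact matching the same machine shows up twice, which is the duplicate-host effect described in this issue.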
@willemdh metricbeat (8.11.4) does not generate host.hostname on my system, nor agent.hostname.
Isn't this a problem that is isolated to windows? I am unaware of Unix-like systems that return mixed-case hostnames.
We first diagnosed it with MacOS.
It appears metricbeat (and possibly other beats/agent integrations) converts the host.name to all lowercase. This causes problems when trying to associate with other names. We would expect the host.name to be unmodified, as is the case with APM server.