elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Metricbeat downcases host.name #38689

Open smith opened 3 months ago

smith commented 3 months ago

It appears metricbeat (and possibly other beats/agent integrations) converts the host.name to all lowercase. This causes problems when trying to associate with other names.

We would expect the host.name to be unmodified as is the case with APM server.

botelastic[bot] commented 3 months ago

This issue doesn't have a Team:<team> label.

willemdh commented 3 months ago

This discussion has been going on for years now.

host.name needs to be normalized and lowercased, precisely for correlation reasons. There are so many data sources, each logging with its own naming conventions. Also, if we do a reverse DNS lookup of an IP, for example, the result is always a lowercase FQDN.

APM should also normalize and lowercase host.name if there is an issue there.

Original host name should be in host.hostname
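
To make the proposal above concrete, here is a minimal sketch of that normalization. It is not code from Beats or APM server; `normalize_host` is a hypothetical helper, and it assumes events are plain nested dicts with ECS-style `host.name` and `host.hostname` fields.

```python
from copy import deepcopy


def normalize_host(event: dict) -> dict:
    """Return a copy of `event` with host.name lowercased.

    The original value is kept in host.hostname unless that field is
    already set. Hypothetical helper, not part of any Beat.
    """
    out = deepcopy(event)
    host = out.get("host")
    if not isinstance(host, dict):
        return out  # nothing to normalize
    name = host.get("name")
    if isinstance(name, str):
        host.setdefault("hostname", name)  # preserve the original casing
        host["name"] = name.lower()
    return out


event = {"host": {"name": "Web-01.Example.COM"}}
normalized = normalize_host(event)
print(normalized["host"]["name"])      # web-01.example.com
print(normalized["host"]["hostname"])  # Web-01.Example.COM
```

The input event is left untouched, so the original casing is never lost even if a caller only keeps the normalized copy.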

@MikePaquette ;)

roshan-elastic commented 3 months ago

Thanks @willemdh

@smith - should we talk to the APM agents team?

smith commented 3 months ago

> @smith - should we talk to the APM agents team?

@roshan-elastic I think so. If we're normalizing data we need to do it for all methods of ingest.

trentm commented 3 months ago

By APM agent spec (https://github.com/elastic/apm/blob/main/specs/agents/metadata.md#hostname) APM agents should be lowercasing the value they send to APM server (metadata.system.detected_hostname).

This was added to our specs about 9mo ago in https://github.com/elastic/apm/pull/805

This issue https://github.com/elastic/apm/issues/794 has links to the implementation issues for each of the APM agents. That issue is "closed" for all but the Go APM Agent. We'd have to do some digging to see which version of each APM agent got this change and possibly confirm that they are indeed lowercasing.

Do we have any info on which particular APM agents we are talking about here?


Previous discussion(s):

Other possible wrinkles:

roshan-elastic commented 3 months ago

Thanks for this @trentm - it sounds like our intent is to lower-case host.name collected via APM agent so anything which isn't doing this is either:

> Do we have any info on which particular APM agents we are talking about here?

@smith is this something you or someone in the team can share? I'm only really worried if it's something that isn't going to be addressed eventually.

trentm commented 3 months ago

> Do we have any info on which particular APM agents we are talking about here?

I asked on the originating issue: https://github.com/elastic/kibana/issues/178650#issuecomment-2036675567 Caue said it was the Go APM Agent, so that makes sense.

> I'm only really worried if it's something that isn't going to be addressed eventually.

Development focus for the Go Agent is on the OTel side, so I'm not sure how timely any change would be here.

Also I gather we'll have the same issue with OTel APM agents, where the host.name spec differs from the suggestions in ECS's host.name spec. OTel doesn't say anything about normalizing case.

roshan-elastic commented 3 months ago

> Development focus for the Go Agent is on the OTel side, so I'm not sure how timely any change would be here.

That's OK - the main thing is that we're aligned on how to solve it (we can sort 'when' via prioritisation etc).

> OTel doesn't say anything about normalizing case.

Great catch.

@AlexanderWert / @mlunadia / @tommyers-elastic - Do you think we can enforce standardisation for OTel data? This issue is showing the pitfalls of mixing cases etc - it leads to dup data/confusing user experiences.

Note : This issue is specifically focusing on lower-casing host.name

AlexanderWert commented 3 months ago

All of this is a result of this change in ECS (~a year ago): https://github.com/elastic/ecs/pull/2122

So now we have a mix of old collectors (which do not necessarily lowercase) and newer collectors (which do).

In OpenTelemetry SemanticConventions host.name is not being lowercased (and we can assume that we won't be able to change that): https://opentelemetry.io/docs/specs/semconv/attributes-registry/host/

I think the actual problem is that we use host.name to correlate data and as an identifier of the host. We should instead use host.id for correlation and identification, because it is meant to be unique and reliable in both ECS and SemConv. host.name should rather be used as a display name.

--> I really hope that with Assets / Entities these kinds of things will be resolved!

ECS: [image]

OTel SemConv: [image]

trentm commented 3 months ago

Using host.id sounds good to me. For the current APM agents, it was only very recently added to APM agent specs. Only the Java APM agent will be producing host.id currently. As well, APM server's intakev2 API (used by the APM agents) does not yet handle host.id from APM agents. That's hopefully being added for 8.14.

roshan-elastic commented 2 months ago

> actual problem is that we use host.name to correlate data and use it as an identifier of the host. Actually, we should use host.id for correlation and identification, because that one is meant to be unique and reliable in both, ECS and SemConv. host.name should be rather used as a display name.

That's a great point @AlexanderWert. I think that sounds sensible but I'm worried about what % of our customers will be able to supply this with current collection - especially as we want to leverage the host identifier across metricbeat, filebeat and the elastic agent integrations (and OTel).

Looking at one of our own clusters (us-east-1-logging...) internal collection for different agents, host.id looks pretty scarcely populated (e.g. 2-5% for filebeat) so I don't think that's feasible in the short-/medium-term from what I can see?

Filebeat - 2-5% have host.id [image]

Metricbeat - around the same [image]

It's a similar story on overview-....kb.us-west2.

I believe this is likely representative of our customer base too...we might be able to get telemetry from the BI team if we need more data.

Do you have any thoughts?

@smith not sure if you have an opinion on this?

smith commented 2 months ago

@roshan-elastic we'll probably have to fall back to attempting to correlate things using host.name for some time, but we should prefer host.id if at all possible.
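
A rough sketch of the "prefer host.id, fall back to host.name" approach described here. `correlation_key` is a hypothetical helper rather than anything in Kibana or Beats, and the dict-shaped events are an assumption for illustration.

```python
from typing import Optional, Tuple


def correlation_key(event: dict) -> Optional[Tuple[str, str]]:
    """Pick a key for grouping events by host.

    Prefers host.id; falls back to the lowercased host.name.
    Returns None when neither field is populated.
    """
    host = event.get("host") or {}
    host_id = host.get("id")
    if host_id:
        return ("id", host_id)
    name = host.get("name")
    if name:
        return ("name", name.lower())
    return None


print(correlation_key({"host": {"id": "abc", "name": "Web-01"}}))
print(correlation_key({"host": {"name": "Web-01"}}))
```

Note the limitation: two events from the same host produce different keys if only one of them carries host.id, so a mapping between ids and names would still be needed while host.id coverage stays low.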

willemdh commented 2 months ago

@roshan-elastic

Using host.id is absolutely not ideal. We have working correlations between datasets containing lowercase fqdn's from logs with datasets where only an ip is known. A reverse dns lookup enables us to correlate network data (which does not contain any hostnames) with host data. Please please let's not go back in time and choose a solution which doesn't make any sense.

Lowercase fqdn in host.name is really the primary key you want to correlate on. NOT host.id, as a lot of datasets contain an id like '55de390e-6781-485a-a5c2-463180e52874'. How on earth are we supposed to correlate that with a lowercase fqdn in a dataset which has absolutely no idea where it would get this host.id from??
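
The correlation described in this comment could look roughly like the sketch below. The event shapes, the `host_name_for_ip` helper, and the sample addresses are all assumptions for illustration. The resolver is injectable so the example runs without network access; the default uses Python's standard `socket.gethostbyaddr`.

```python
import socket
from typing import Callable, Optional


def _reverse_dns(ip: str) -> Optional[str]:
    """Default resolver: reverse lookup via the system resolver."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def host_name_for_ip(
    ip: str, resolve: Callable[[str], Optional[str]] = _reverse_dns
) -> Optional[str]:
    """Resolve an IP to a lowercase FQDN without a trailing dot."""
    name = resolve(ip)
    return name.lower().rstrip(".") if name else None


# Hypothetical data: host metrics keyed by a lowercase host.name,
# and network events that only know an IP address.
host_events = [{"host": {"name": "db-01.example.com"}, "cpu": 0.4}]
net_events = [{"source": {"ip": "192.0.2.10"}, "bytes": 1200}]
fake_ptr = {"192.0.2.10": "DB-01.Example.com."}  # stand-in for real DNS

by_name = {e["host"]["name"]: e for e in host_events}
for net in net_events:
    name = host_name_for_ip(net["source"]["ip"], resolve=fake_ptr.get)
    match = by_name.get(name)
    if match is not None:
        print(name, match["cpu"])  # db-01.example.com 0.4
```

The join only works because both sides agree on casing, which is the argument being made for a lowercase host.name.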

roshan-elastic commented 2 months ago

@willemdh ➕ and thanks for the detail.

ash-darin commented 2 months ago

@smith For your immediate problem: AFAIK, unless instructed otherwise, metricbeat sets agent.name to the same value as host.name without the domain, but preserving case. Is this also lowercased now? Would that serve as a useful alternative for you?

Personally I agree with this: whenever someone tells me to check a host, I have to double-check whether it is spelled with capitals or not. The fields are of type "keyword", so that matters. Isn't this a problem that is isolated to Windows? I am unaware of Unix-like systems that return mixed-case hostnames.
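
A small illustration of why casing matters for exact-match fields. This only mimics case-sensitive exact matching with Python strings; it is not an Elasticsearch query, and the host names are made up.

```python
from collections import Counter

# The same machine reported by three collectors with different casing.
names = ["MacBook-Pro.local", "macbook-pro.local", "macbook-pro.local"]

exact = Counter(names)                       # case-sensitive grouping
folded = Counter(n.lower() for n in names)   # grouping after lowercasing

print(len(exact))   # 2
print(len(folded))  # 1
```

With case-sensitive grouping the single host shows up as two entries, which is the duplicate-host symptom discussed in this thread.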

@willemdh metricbeat (8.11.4) does not generate host.hostname on my system, nor agent.hostname.

smith commented 2 months ago

> Isn't this a problem that is isolated to windows? I am unaware of Unix-like systems that return mixed-case hostnames.

We first diagnosed it on macOS.