elastic / apm-server

https://www.elastic.co/guide/en/apm/guide/current/index.html

Ensure that host.name is aligned with Elastic Agent system integration #8118

Open axw opened 2 years ago

axw commented 2 years ago

APM Server sets host.hostname as follows:

  1. if kubernetes.node.name is set, then we use that
  2. otherwise, if any other kubernetes.* fields are set, we don't set host.hostname at all
  3. finally, if no kubernetes.* fields are set, then we use detected_hostname

If configured_hostname is set, then we use that for host.name. If configured_hostname is not set, then we set host.name to the same value as host.hostname.
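The rules above can be sketched roughly as follows. This is an illustration of the described behaviour, not the actual APM Server code: the function name `resolve_host_fields` and the flat `kubernetes` mapping are assumptions made for the example.

```python
from typing import Mapping, Optional, Tuple


def resolve_host_fields(
    kubernetes: Mapping[str, str],
    detected_hostname: Optional[str],
    configured_hostname: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (host.hostname, host.name) following the rules described above.

    `kubernetes` maps kubernetes.* field suffixes to values, e.g.
    {"node.name": "node-1", "pod.name": "web-0"} (an assumed shape).
    """
    node_name = kubernetes.get("node.name")
    if node_name:
        # Rule 1: kubernetes.node.name is set, so use it.
        hostname = node_name
    elif any(kubernetes.values()):
        # Rule 2: some other kubernetes.* field is set, so host.hostname stays unset.
        hostname = None
    else:
        # Rule 3: no kubernetes.* fields at all, so use detected_hostname.
        hostname = detected_hostname

    # host.name prefers configured_hostname, otherwise it mirrors host.hostname.
    name = configured_hostname if configured_hostname else hostname
    return hostname, name
```

With only pod metadata and no configured_hostname, e.g. `resolve_host_fields({"pod.name": "web-0"}, "pod-host", None)`, both fields come back unset, even though a detected hostname was available.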

This complicated algorithm comes from https://github.com/elastic/apm/issues/21#issuecomment-476441441, where we intended to align with ECS (https://github.com/elastic/ecs/blob/1.0/use-cases/kubernetes.md). It is not explicitly captured in ECS, so we should verify that we're doing the right thing and update the docs (and Elastic Agent code if needed), or otherwise change the APM Server code.

jasonrhodes commented 2 years ago

We should probably reconcile this with the inventory schema definition for "host", as well: https://github.com/elastic/observability-dev/blob/main/docs/dc/inventory_schema.md#host

| field | field type | required | value type | description |
|---|---|---|---|---|
| host.id | dimension | ✔️ | keyword | This field should hold the FQDN hostname, if running in a cloud, use cloud.instance.id value instead. |
| host.name | tag | ✔️ | keyword | Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name, or a name specified by the user. For cloud providers, cloud.instance.name is used for host.name. |

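Read as a rule, the schema rows above might look like the sketch below. The function name, the `fallback_name` parameter, and treating "running in a cloud" as "cloud.instance.id is present" are all assumptions for illustration; the schema itself does not say which of the allowed non-cloud values for host.name takes precedence.

```python
from typing import Dict, Optional


def inventory_host_identity(
    fqdn: str,
    fallback_name: str,
    cloud_instance_id: Optional[str] = None,
    cloud_instance_name: Optional[str] = None,
) -> Dict[str, str]:
    """Derive host.id and host.name as the inventory schema rows describe.

    `fallback_name` stands for whichever non-cloud value is chosen for
    host.name (hostname output, FQDN, or a user-specified name).
    """
    # Assumption: a cloud.instance.id means the host runs in a cloud.
    in_cloud = cloud_instance_id is not None

    # host.id: the FQDN, or cloud.instance.id when running in a cloud.
    host_id = cloud_instance_id if in_cloud else fqdn

    # host.name: cloud.instance.name for cloud providers, else the fallback.
    if in_cloud and cloud_instance_name:
        host_name = cloud_instance_name
    else:
        host_name = fallback_name

    return {"host.id": host_id, "host.name": host_name}
```

Note that this rule never consults kubernetes.* fields, which is one place where it differs from the host.hostname logic in the issue description.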
simitt commented 1 year ago

~There's a request from the Elastic Agent team to report the Fully Qualified Domain Name (FQDN) if a certain flag is set to true by the Elastic Agent.~

~We need to discuss the path forward and whether this is considered a breaking change in APM Server.~

Update: The EA changes don't seem to be relevant here, but we still need to follow up with the new ECS changes and take them into consideration.

jasonrhodes commented 1 year ago

It seems to me that the recent changes make the existing values even muddier than they were previously.

Updated asciidoc

host.hostname is defined as "Hostname of the host. It normally contains what the hostname command returns on the host machine." It seems as though Kubernetes has its own rundown of which values may end up in its "hostname" command output from within a pod, described in the K8s docs. How do APM agents access 'hostname' values, generally? From Andrew's description, it sounds like services that run in Kubernetes will either have a host.hostname value set to the K8s node name OR it will be blank/null.

ECS defines host.name as "Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. **The recommended value is the lowercase FQDN of the host.**" I bolded the newly added bit. I am not sure where APM gets configured_hostname, but if that value doesn't exist, it seems that a service running in Kubernetes will have a host.name value set equal to its host.hostname value, which as described above, would be either the K8s node name or blank/null.

In one of the linked discussions, Andrew and Gil both mentioned that "I'm not sure if it's reasonable/viable for the pod to know its own node name", which makes me think that in many kubernetes cases, host.name and host.hostname will both be null.

...

Stepping back a minute, it seems like in lots of contexts (containers, pods), it may not be necessary for a service to know anything about where it's running except the container ID. Do we reliably know the container ID for services running in containers?

(All of this has major implications on Asset Topology, which is why I'm acutely interested. :D )

simitt commented 1 year ago

@AlexanderWert @estolfo if we want to follow the recommended value to set the lowercased FQDN for host.name then apm agents would need to collect this information. We could then either always set this field or make it a decision of the apm server, depending on the signal of the Elastic Agent whether or not the FQDN field should be used.
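A rough sketch of the two pieces mentioned here, purely for illustration: an agent-side helper that collects the lowercase FQDN, and a server-side choice driven by a flag. The names `lowercase_fqdn`, `select_host_name`, and `use_fqdn` are hypothetical and do not correspond to any existing agent or APM Server setting.

```python
import socket
from typing import Optional


def lowercase_fqdn() -> str:
    """Agent side: best-effort FQDN of the local machine, lowercased.

    socket.getfqdn() returns the plain hostname if no FQDN can be resolved.
    """
    return socket.getfqdn().lower()


def select_host_name(
    hostname: Optional[str],
    fqdn: Optional[str],
    use_fqdn: bool,
) -> Optional[str]:
    """Server side: pick host.name based on a (hypothetical) FQDN flag."""
    if use_fqdn and fqdn:
        # Follow the ECS recommendation: lowercase FQDN.
        return fqdn.lower()
    # Otherwise keep whatever hostname the agent reported.
    return hostname
```

If the agent always sends both values, the server can switch behaviour without any agent change, at the cost of carrying an extra field on every event.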

crespocarlos commented 1 year ago

We have identified some issues on the Infra UI related to the missing host.hostname field. When we link to APM services and use a filter based on host.hostname, it doesn't always locate the host. This problem seems to be related to the following points from the description:

  1. if kubernetes.node.name is set, then we use that
  2. otherwise, if any other kubernetes.* fields are set, we don't set host.hostname at all

Missing kubernetes.node.name, resulting in APM agents reporting empty host.hostname field: https://edge-oblt.kb.us-west2.gcp.elastic-cloud.com/app/r?l=DISCOVER_APP_LOCATOR&v=8.9.0-SNAPSHOT&[…]8dKgWtS%2FUFZVgSJRGqd0CQADYORyHOsIBR1VQYNBfL4gA

APM agents properly reporting host.hostname : https://edge-oblt.kb.us-west2.gcp.elastic-cloud.com/app/r?l=DISCOVER_APP_LOCATOR&v=8.9.0-SNAPSHOT[...]N4IgjgrgpgTgniAXKSsGJACwPYGcAuAdAHYCGAtlIgAQ

The main issue happens when we want to display APM-related information for a specific host (e.g: linking to APM services filtering by host name). In order to ensure a consistent and reliable method for searching hosts, it would be nice if the host.hostname field is always set.

roshan-elastic commented 1 year ago

Hey @smith, thanks for this.

I created a template for 'asks' to make it easy to lobby other teams for dependencies to be picked up and prioritised.

Do you think you can update this issue to match the template as best as you can?

(for this one, it doesn't need to be perfect - just generally enough to help me know who needs to do this and why they should do it)

This helps me understand who I need to ask to prioritise this and why (I can start the conversation with them but they'll likely bounce it back and ask for more detail so this info makes it much more likely to be prioritised):

At a high-level, it's just things like this:

Title : [REQUIRED TEAM NAME]

📖 Description

What is the ask?

...description...

⏯️ 📷 Demo of Issue

Screenshot/gif/video demo'ing issue

{insert media}

Related Issues

🛑 Blocked issues

What does this issue block?

😄 Issues improved

What would this improve?

🔗 Other related issues

Any other issues this may relate to

💰 Business Impact

(optional) Most likely PM only... What is the business impact if this doesn't get done?

Description...

roshan-elastic commented 1 year ago

Note : I just noticed you filled in this field so it's clear where it goes...

Image

More about how we expect them to prioritise it (i.e. value/impact etc)

roshan-elastic commented 1 year ago

Hey @smith - just following up on this...I can't quite understand the ask/implication to our UI. You mind summarising quickly to help me talk to a PM about this if it needs prioritising?

smith commented 1 year ago

@roshan-elastic I spoke with @simitt on Slack and she gave a great outline of what we (or somebody) need to do:

What has been preventing this issue from being picked up in the past is that there was no common agreement of how the hostname and name fields should be populated in the different settings (e.g. k8s). Do you think your team could come up with a full proposal on how the fields should be populated and how to retrieve this information from the fields that are currently provided by the apm agents? We could give you some pointers on the current state. The apm server team should be able to implement changes in 8.10, but I don’t realistically see us driving the conversation and bringing this to a resolution on a concept level. So if your team could lead this effort, I think we could make room for the implementation. ... so we’ll schedule the implementation work for 8.10 then and wait for someone from your team to reach out with a proposal (or questions as a starter)

ECS recently merged clarification that host.name is recommended to be the lowercase FQDN of the host. (https://github.com/elastic/ecs/pull/2122)

Here's the ECS reference for host.name: https://www.elastic.co/guide/en/ecs/current/ecs-host.html#field-host-name

host.hostname is above and has a similar description, though less detailed. Maybe the docs should clarify that host.hostname is expected to be the output of the hostname command, which may or may not include -f to show the FQDN, depending on your platform, so please write us if you find a reason to use this field.

host.name is also configurable by the user to be whatever they put in their agent config.

OpenTelemetry semantic conventions have host.name and its description is similar to what we have in ECS, so there's no conflict with the actual use of the field.

The problem we need to solve is that the algorithm described in the issue description is not well specified and leaves the fields undefined in some cases.

roshan-elastic commented 1 year ago

Hey @smith - sorry, I completely missed this! Thanks for taking the time to look at this, this is really helpful!

Let me have a think about how to tackle this but I'm thinking that we might have a few stakeholders on this - probably not many. For example, Miguel Luna (especially as he's very involved from a product POV on OTel) and not least, Sandra and yourself.

Priority-wise, I don't think this is a burning priority as the most immediate outcome I can think of would be allowing users to consistently search by host.name in APM (as well as infra). It does sound like sensible ground-work to see if we can at least get a common agreement going.

Do you have any thoughts on when/how we tackle this? Happy to have a chat about this if it's easier?

smith commented 1 year ago

One option might be to hand over this issue from @elastic/apm-server to the group working on ECS/OTEL to clearly specify what host.name means and where it comes from. Other vendors possibly have similar "tricks" like we do in k8s and if things were better specified by Semantic Conventions we could simplify our ingestion code by following those more detailed guidelines.