Open axw opened 2 years ago
We should probably reconcile this with the inventory schema definition for "host", as well: https://github.com/elastic/observability-dev/blob/main/docs/dc/inventory_schema.md#host
field | field type | required | value type | description |
---|---|---|---|---|
host.id | dimension | ✔️ | keyword | This field should hold the FQDN hostname, if running in a cloud, use cloud.instance.id value instead. |
host.name | tag | ✔️ | keyword | Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name, or a name specified by the user. For cloud providers, cloud.instance.name is used for host.name. |
~There's a request from the Elastic Agent team to report the Fully Qualified Domain Name (FQDN) if a certain flag is set to true by the Elastic Agent.~
~We need to discuss the path forward and whether this is considered a breaking change in APM Server.~
Update: The EA changes don't seem to be relevant here, but we still need to follow up with the new ECS changes and take them into consideration.
It seems to me that the recent changes make the existing values even muddier than they were previously.
host.hostname is defined as "Hostname of the host. It normally contains what the hostname command returns on the host machine." It seems as though Kubernetes has its own rundown of which values may end up in its "hostname" command output from within a pod, described in the K8s docs. How do APM agents access 'hostname' values, generally? From Andrew's description, it sounds like services that run in Kubernetes will either have a host.hostname value set to the K8s node name OR it will be blank/null.
ECS defines host.name as "Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. **The recommended value is the lowercase FQDN of the host.**" I bolded the newly added bit. I am not sure where APM gets configured_hostname, but if that value doesn't exist, it seems that a service running in Kubernetes will have a host.name value set equal to its host.hostname value, which, as described above, would be either the K8s node name or blank/null.
In one of the linked discussions, Andrew and Gil both mentioned that "I'm not sure if it's reasonable/viable for the pod to know its own node name", which makes me think that in many kubernetes cases, host.name and host.hostname will both be null.
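On whether a pod can know its own node name: one mechanism I'm aware of is the Kubernetes Downward API, which can expose spec.nodeName to a container as an environment variable. Below is a minimal sketch of how an agent might read it. The variable name KUBERNETES_NODE_NAME and the helper node_name_from_env are assumptions for illustration, and this only works if the pod spec has been set up to inject the value.

```python
import os
from typing import Mapping, Optional

# Assumption: the pod spec injects the node name through the Downward API, e.g.
#   env:
#     - name: KUBERNETES_NODE_NAME
#       valueFrom:
#         fieldRef:
#           fieldPath: spec.nodeName
# The variable name is an illustrative choice, not something this thread specifies.


def node_name_from_env(
    env: Mapping[str, str] = os.environ,
    var: str = "KUBERNETES_NODE_NAME",
) -> Optional[str]:
    """Return the injected node name, or None if it is absent or blank."""
    value = env.get(var, "").strip()
    return value or None
```

If the variable is not injected, the function returns None, which matches the "blank/null" case described above.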
...
Stepping back a minute, it seems like in lots of contexts (containers, pods), it may not be necessary for a service to know anything about where it's running except the container ID. Do we reliably know the container ID for services running in containers?
(All of this has major implications on Asset Topology, which is why I'm acutely interested. :D )
@AlexanderWert @estolfo if we want to follow the recommended value and set the lowercased FQDN for host.name, then APM agents would need to collect this information. We could then either always set this field or make it a decision of the APM Server, depending on the signal of the Elastic Agent whether or not the FQDN field should be used.
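To make "collect this information" concrete, here is a rough sketch using only the Python standard library. The mapping of the two values onto host.hostname and host.name is an assumption for illustration, not an agreed spec, and what socket.gethostname() and socket.getfqdn() return depends on how the machine and its resolver are configured.

```python
import socket


def hostname_candidates() -> dict:
    """Collect the plain hostname and a lowercased FQDN candidate."""
    # Typically what the `hostname` command prints; may or may not be fully qualified.
    short = socket.gethostname()
    # Best-effort FQDN lookup; falls back to the plain hostname if nothing better resolves.
    fqdn = socket.getfqdn()
    return {
        "host.hostname": short,
        # ECS recommends the lowercase FQDN for host.name.
        "host.name": fqdn.lower(),
    }
```

On hosts without a resolvable FQDN both values may end up identical, so the server could not tell from the values alone whether an FQDN was actually found.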
We have identified some issues on the Infra UI related to the missing host.hostname field. When we link to APM services and use a filter based on host.hostname, it doesn't always locate the host. This problem seems to be related to the following points from the description:
- if kubernetes.node.name is set, then we use that
- otherwise, if any other kubernetes.* fields are set, we don't set host.hostname at all
Missing kubernetes.node.name, resulting in APM agents reporting an empty host.hostname field:
https://edge-oblt.kb.us-west2.gcp.elastic-cloud.com/app/r?l=DISCOVER_APP_LOCATOR&v=8.9.0-SNAPSHOT&[…]8dKgWtS%2FUFZVgSJRGqd0CQADYORyHOsIBR1VQYNBfL4gA
APM agents properly reporting host.hostname:
https://edge-oblt.kb.us-west2.gcp.elastic-cloud.com/app/r?l=DISCOVER_APP_LOCATOR&v=8.9.0-SNAPSHOT[...]N4IgjgrgpgTgniAXKSsGJACwPYGcAuAdAHYCGAtlIgAQ
The main issue happens when we want to display APM-related information for a specific host (e.g., linking to APM services filtering by host name). To ensure a consistent and reliable method for searching hosts, it would be nice if the host.hostname field were always set.
Hey @smith, thanks for this.
I created a template for 'asks' to make it easy to lobby other teams for dependencies to be picked up and prioritised.
Do you think you can update this issue to match the template as best as you can?
(for this one, it doesn't need to be perfect - just generally enough to help me know who needs to do this and why they should do it)
This helps me understand who I need to ask to prioritise this and why (I can start the conversation with them but they'll likely bounce it back and ask for more detail so this info makes it much more likely to be prioritised):
At a high-level, it's just things like this:
Title : [REQUIRED TEAM NAME]
What is the ask?
...description...
Screenshot/gif/video demo'ing issue
{insert media}
What does this issue block?
What would this improve?
Any other issues this may relate to
(optional) Most likely PM only... What is the business impact if this doesn't get done?
Description...
Note : I just noticed you filled in this field so it's clear where it goes...
More about how we expect them to prioritise it (i.e. value/impact etc)
Hey @smith - just following up on this...I can't quite understand the ask/implication to our UI. You mind summarising quickly to help me talk to a PM about this if it needs prioritising?
@roshan-elastic I spoke with @simitt on Slack and she gave a great outline of what we (or somebody) need to do:
What has been preventing this issue from being picked up in the past is that there was no common agreement on how the hostname and name fields should be populated in the different settings (e.g. k8s). Do you think your team could come up with a full proposal on how the fields should be populated and how to retrieve this information from the fields that are currently provided by the APM agents? We could give you some pointers on the current state. The APM Server team should be able to implement changes in 8.10, but I don’t realistically see us driving the conversation and bringing this to a resolution on a concept level. So if your team could lead this effort, I think we could make room for the implementation. ... so we’ll schedule the implementation work for 8.10 then and wait for someone from your team to reach out with a proposal (or questions as a starter)
ECS recently merged a clarification that host.name is recommended to be the lowercase FQDN of the host. (https://github.com/elastic/ecs/pull/2122)
Here's the ECS reference for host.name: https://www.elastic.co/guide/en/ecs/current/ecs-host.html#field-host-name
host.hostname is above and has a similar description, though less detailed. Maybe the docs should clarify that host.hostname is expected to be the output of the hostname command, which may or may not include -f to show the FQDN, depending on your platform, so please write to us if you find a reason to use this field.
host.name is also configurable by the user to be whatever they put in their agent config.
OpenTelemetry semantic conventions have host.name, and its description is similar to what we have in ECS, so there's no conflict with the actual use of the field.
The problem we need to solve is that the algorithm described in the issue description is not well specified and leaves the fields undefined in some cases.
Hey @smith - sorry, I completely missed this! Thanks for taking the time to look at this, this is really helpful!
Let me have a think about how to tackle this, but I'm thinking that we might have a few stakeholders on this, probably not many. For example, Miguel Luna (especially as he's very involved from a product POV on OTel) and, not least, Sandra and yourself.
Priority-wise, I don't think this is a burning priority, as the most immediate outcome I can think of would be allowing users to consistently search by host.name in APM (as well as infra). It does sound like sensible groundwork to see if we can at least get a common agreement going.
Do you have any thoughts on when/how we tackle this? Happy to have a chat about this if it's easier?
One option might be to hand over this issue from @elastic/apm-server to the group working on ECS/OTEL to clearly specify what host.name means and where it comes from. Other vendors possibly have similar "tricks" like we do in k8s, and if things were better specified by Semantic Conventions we could simplify our ingestion code by following those more detailed guidelines.
APM Server sets host.hostname as follows:
If configured_hostname is set, then we use that for host.name. If configured_hostname is not set, then we set host.name to the same value as host.hostname.
This complicated algorithm comes from https://github.com/elastic/apm/issues/21#issuecomment-476441441, where we intended to align with ECS (https://github.com/elastic/ecs/blob/1.0/use-cases/kubernetes.md). It is not explicitly captured in ECS, so we should verify that we're doing the right thing and update the docs (and Elastic Agent code if needed), or otherwise change the APM Server code.
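For discussion, the rules mentioned in this issue can be sketched roughly as below. This is an illustration, not the actual APM Server code. It combines the two Kubernetes rules quoted in the comments (use kubernetes.node.name if set; otherwise leave host.hostname unset when any other kubernetes.* field is set) with the configured_hostname rule. The function name resolve_host_fields, the flat kubernetes dict, and the detected_hostname fallback for the non-Kubernetes case are assumptions.

```python
from typing import Mapping, Optional


def resolve_host_fields(
    kubernetes: Mapping[str, str],
    detected_hostname: Optional[str] = None,
    configured_hostname: Optional[str] = None,
) -> dict:
    """Illustrative resolution of host.hostname and host.name for one event."""
    node_name = kubernetes.get("node.name")
    if node_name:
        # kubernetes.node.name is set: use it as host.hostname.
        hostname = node_name
    elif any(kubernetes.values()):
        # Other kubernetes.* fields are set but the node name is unknown:
        # host.hostname is not set at all.
        hostname = None
    else:
        # Assumption: outside Kubernetes, fall back to an agent-reported hostname.
        hostname = detected_hostname

    # configured_hostname wins for host.name; otherwise mirror host.hostname.
    name = configured_hostname or hostname
    return {"host.hostname": hostname, "host.name": name}
```

Under these assumptions, a pod that reports kubernetes.pod.name but no node name ends up with both fields empty unless configured_hostname is set, which is the case the Infra UI comments describe.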