Open alex-fedotyev opened 3 years ago
Pinging: @kaiyan-sheng @exekias @sorantis I just realized that OTel spec suggests using cloud.instance.id while we suggest using cloud.instance.name.
Are those fields the same? Or would it make more sense to align around cloud.instance.id?
The OTel spec is vague in the non-cloud case though. In that case what is the unique ID? Is it /etc/machine-id or is it FQDN...?
I just realized that OTel spec suggests using cloud.instance.id while we suggest using cloud.instance.name.
We are suggesting cloud.instance.id too, see https://github.com/elastic/observability-dev/pull/1137/files#diff-c5a9ab0ff94fc3963d0bb04177a5a800457970a01608274951e8a6a0b0023057R40
The OTel spec is vague in the non-cloud case though. In that case what is the unique ID? Is it /etc/machine-id or is it FQDN...?
I would say FQDN works better. machine-id can only be retrieved from inside the machine, so while it is guaranteed to be unique, it's not very useful for correlation (specifically, for correlating events that come from monitoring the machine from outside vs. inside).
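To make the two candidates concrete, here is a minimal Python sketch that reads both values from inside a machine. The helper names are illustrative and not taken from any Elastic or OTel code, and /etc/machine-id is assumed to exist only on Linux hosts that provide it. The point it illustrates is that machine-id has to be read locally, while the FQDN is a value an outside observer could plausibly know as well.

```python
import socket
from pathlib import Path
from typing import Optional


def read_machine_id(path: str = "/etc/machine-id") -> Optional[str]:
    """Return the machine ID, or None when the file is missing or unreadable."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        # Not a Linux host, or the file is not accessible from here.
        return None
    return value or None


def read_fqdn() -> str:
    """Return the fully qualified domain name as resolved on this machine."""
    return socket.getfqdn()


if __name__ == "__main__":
    print("machine-id:", read_machine_id())
    print("fqdn:", read_fqdn())
```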
The OTel spec is vague in the non-cloud case though. In that case what is the unique ID? Is it /etc/machine-id or is it FQDN...?
@cyrille-leclerc - any chance you know how OTel defines host.id in non-cloud environments?
I just realized that OTel spec suggests using cloud.instance.id while we suggest using cloud.instance.name.
Are those fields the same? Or would it make more sense to align around cloud.instance.id?
Yes, we are also using cloud.instance.id. The problem with cloud.instance.name is that it is not a required field for some cloud providers. For example, in AWS EC2 the instance name is optional and is defined by the Name tag.
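A small Python sketch of why the name is a weak key on EC2. The dictionaries below are made-up examples that assume the general shape of an EC2 instance description (an InstanceId plus an optional list of Key/Value tags); instance_name is a hypothetical helper, not part of any SDK.

```python
from typing import Optional


def instance_name(instance: dict) -> Optional[str]:
    """Return the value of the Name tag, or None when the tag is absent."""
    for tag in instance.get("Tags", []):
        if tag.get("Key") == "Name":
            return tag.get("Value")
    return None


# Illustrative instances: the IDs and the tag value are invented.
tagged = {
    "InstanceId": "i-0123456789abcdef0",
    "Tags": [{"Key": "Name", "Value": "web-1"}],
}
untagged = {"InstanceId": "i-0fedcba9876543210"}

if __name__ == "__main__":
    for inst in (tagged, untagged):
        print(inst["InstanceId"], instance_name(inst))
```

The instance ID is present in both cases, while the name is only there if someone set the tag.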
@axw my understanding is that the only host information we collect in OpenTelemetry traces is host.id, and only when there is a network communication, mapping the Otel net.* namespace.
I collected the documents of the transaction and all the spans of a trace. Unfortunately, everything runs on my local MacBook without Docker, which makes it more difficult to understand the usage of the host.hostname, host.ip, ... attributes, as everything is localhost/127.0.0.1.
See https://gist.github.com/cyrille-leclerc/e5b4a1fb214f83cc9e7819953ebbd3e3
I only found 2 occurrences of host on span documents, on the connection spans.
@axw Could we have omitted to map other Otel host attributes?
I looked at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/master/exporter/elasticexporter/internal/translator/elastic/traces.go but I didn't find any hint.
@axw Could we have omitted to map other Otel host attributes?
Yes; what is there is not comprehensive. We will need to add support for translating host.id, among others.
@alex-fedotyev OpenTelemetry host.id is NOT defined by the OpenTelemetry collector outside of cloud deployments. I only found enrichment of host.id on AWS and GCP so far.
Research notes:
host.id is declared as a constant (AttributeHostID="host.id")
Usages of AttributeHostID
https://github.com/elastic/apm-server/pull/4955 will add host.id for OpenTelemetry data.
We still need some conclusion on what to do for our agents. We could just set it to cloud instance ID for now, when it's set.
I would say FQDN works better. machine-id can only be retrieved from inside the machine, so while it is guaranteed to be unique, it's not very useful for correlation (specifically, for correlating events that come from monitoring the machine from outside vs. inside).
@exekias does beats already do this? I just took a quick look and it appears to be using go-sysinfo's "HostInfo.UniqueID", which is populated using machine-id.
Not yet, right now beats report host.id as the machine id, so we will need to make a breaking change, or introduce the change directly in the agent. @kaiyan-sheng I think you had an issue to discuss this?
Sorry I just saw this message 🤕 Yes here is the issue: https://github.com/elastic/beats/issues/22739
Metrics and logs identified a problem with using host.name for correlation when ingesting data from cloud environments, as they don't provide a proper host name. The proposed solution is to introduce a host.id field which is "calculated": it is equal to host.name for on-premises environments, and equal to cloud.instance.id for cloud. Original issue and spreadsheet with the breakdown.
This seems to align well with OTel spec, as they are using cloud instance_id as the host.id.
The proposal for APM is to calculate host.id dynamically based on whether cloud metadata is present or using host.name otherwise. We would leverage this when integrating products together, i.e. linking from Infra to APM and vice versa.
We would also need to recognize host.id when ingesting data from OTel.
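The calculation described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: events are modelled as flat dictionaries with dotted keys, resolve_host_id is a hypothetical name, and giving an already present host.id (for example one received from OTel) priority over the derived values is an assumption of this sketch, not something decided in this thread.

```python
from typing import Optional


def resolve_host_id(event: dict) -> Optional[str]:
    """Pick a host.id for an event.

    Order used here (an assumption, see the lead-in):
    1. an existing host.id, e.g. set by an OTel resource attribute
    2. cloud.instance.id, when cloud metadata is present
    3. host.name, for on-premises hosts
    """
    for key in ("host.id", "cloud.instance.id", "host.name"):
        value = event.get(key)
        if value:
            return value
    return None


if __name__ == "__main__":
    cloud_event = {"cloud.instance.id": "i-0123456789abcdef0", "host.name": "ip-10-0-0-1"}
    on_prem_event = {"host.name": "db01.example.internal"}
    print(resolve_host_id(cloud_event))
    print(resolve_host_id(on_prem_event))
```

With this ordering, a cloud host and an on-premises host both end up with a single correlation key, which is what linking from Infra to APM would rely on.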
CC: @graphaelli @felixbarny