elastic / apm

Elastic Application Performance Monitoring - resources and general issue tracking for Elastic APM.
https://www.elastic.co/apm
Apache License 2.0

Failure to parse machineType within GCP metadata #847

Open. Opened by Yaty 10 months ago.

Yaty commented 10 months ago

Hello,

We use Elastic APM through the Ruby agent. I'm opening this issue in this project because I think it may apply to all agents. We're not running directly on a VM but in a container on Google Kubernetes Engine (Kubernetes version 1.25.10-gke.2700, node-pool version 1.24.12-gke.1000).

To add some context: we had to upgrade the Ruby agent to version 4.7.1 because we need a fix for Rails 7.1: https://github.com/elastic/apm-agent-ruby/releases/tag/v4.7.1

This release also includes a fix for parsing GCP metadata: https://github.com/elastic/apm-agent-ruby/pull/1415, which relates to https://github.com/elastic/apm/issues/826

The response from http://metadata.google.internal/computeMetadata/v1/?recursive=true looks like this (sanitized):

{
  "instance": {
    "attributes": {"cluster-location": "", "cluster-name": "", "cluster-uid": ""},
    "hostname": "",
    "id": 0,
    "serviceAccounts": {
      "default": {
        "aliases": ["default"],
        "email": "",
        "scopes": ["https://www.googleapis.com/auth/cloud-platform", "https://www.googleapis.com/auth/userinfo.email"]
      },
      "our service account": {
        "aliases": ["default"],
        "email": "",
        "scopes": ["https://www.googleapis.com/auth/cloud-platform", "https://www.googleapis.com/auth/userinfo.email"]
      }
    },
    "zone": "projects/.../zones/..."
  },
  "project": {"numericProjectId": 0, "projectId": ""}
}
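To make the missing field concrete, here is a small Ruby sketch that parses the sanitized response with the standard `json` library and inspects it. It only uses `JSON.parse`, `Hash#key?` and `Hash#dig`; nothing here comes from the agent itself, and the placeholder values are kept exactly as posted.

```ruby
require 'json'

# Sanitized metadata response from this issue (placeholders kept as posted).
RESPONSE = <<~'JSON'
  {"instance": {
     "attributes": {"cluster-location": "", "cluster-name": "", "cluster-uid": ""},
     "hostname": "", "id": 0,
     "serviceAccounts": {
       "default": {"aliases": ["default"], "email": "",
         "scopes": ["https://www.googleapis.com/auth/cloud-platform",
                    "https://www.googleapis.com/auth/userinfo.email"]},
       "our service account": {"aliases": ["default"], "email": "",
         "scopes": ["https://www.googleapis.com/auth/cloud-platform",
                    "https://www.googleapis.com/auth/userinfo.email"]}
     },
     "zone": "projects/.../zones/..."},
   "project": {"numericProjectId": 0, "projectId": ""}}
JSON

metadata = JSON.parse(RESPONSE)
instance = metadata.fetch('instance')

# Hash#dig returns nil instead of raising when a key is missing.
machine_type = metadata.dig('instance', 'machineType')

puts "machineType present? #{instance.key?('machineType')}" # machineType present? false
puts "machineType value: #{machine_type.inspect}"          # machineType value: nil
puts "zone: #{instance['zone']}"
```

The `zone` key is there, but there is no `machineType` key anywhere under `instance`.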

As you can see, machineType is not returned at all, so when the code at https://github.com/elastic/apm-agent-ruby/commit/c62fbc5be97ef638509b7a6d662cbfa7c62cbc4a#diff-9c3da70caafefb51aa3733d716da05e65198a507bf6d76aac25baa115c93195fR105 runs, calling split on the missing value makes the whole thing crash (in Ruby, calling split on nil raises NoMethodError).

Before this change, machineType was not parsed at all, so when it was missing it was simply reported as null.
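A simplified reproduction of the old and new behaviour, under stated assumptions: `machine_type_old` and `machine_type_new` are hypothetical names, not the agent's API, and the new parser is assumed to keep the last `/`-separated segment of a value shaped like `projects/<number>/machineTypes/<type>`. This is an illustration of the failure mode, not the agent's actual code.

```ruby
# Hypothetical: old behaviour passes the raw value through,
# so a missing key simply yields nil.
def machine_type_old(instance)
  instance['machineType']
end

# Hypothetical (assumed shape of the new parsing): keep only the last path
# segment, e.g. "projects/123/machineTypes/e2-medium" -> "e2-medium".
# When the key is missing, this calls split on nil and raises NoMethodError.
def machine_type_new(instance)
  instance['machineType'].split('/').last
end

# No "machineType" key, as in the response posted above.
instance = { 'zone' => 'projects/.../zones/...' }

p machine_type_old(instance) # prints nil

begin
  machine_type_new(instance)
rescue NoMethodError => e
  puts "crash: #{e.class}" # prints "crash: NoMethodError"
end
```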

Shouldn't all agents check that machineType is present before trying to parse it? Alternatively, this change could be moved into a major version, since IMO it is a breaking change.
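One way the presence check could look, sketched as a hypothetical helper (`parse_machine_type` is an illustrative name, not part of any agent). It returns nil for a missing or non-string value instead of raising, which matches the old "null when absent" outcome.

```ruby
# Hypothetical nil-safe parser for a GCP machineType value.
def parse_machine_type(raw)
  # Missing (nil) or unexpected types fall back to nil, like the old behaviour.
  return nil unless raw.is_a?(String)

  # "projects/123/machineTypes/e2-medium" -> "e2-medium"; "" -> nil
  raw.split('/').last
end

p parse_machine_type(nil)                                   # nil
p parse_machine_type('projects/123/machineTypes/e2-medium') # "e2-medium"
```

If the value is guaranteed to be either a string or nil, Ruby's safe navigation operator (`raw&.split('/')&.last`) gives the same result more compactly; the explicit `is_a?` check also guards against other unexpected types.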

We plan to monkey patch the agent in the meantime.
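A monkey patch along these lines could serve until an upstream fix ships. This is only a sketch of the pattern: `HypotheticalCloudInfo` and `gcp_machine_type` are stand-ins, not the Ruby agent's real class or method, so the names would have to be adapted after reading the agent source. It uses `Module#prepend`, which lets the patch guard the input and then call the original method through `super`.

```ruby
# Stand-in for the agent class that parses GCP metadata (hypothetical names;
# the real class and method in apm-agent-ruby may differ).
class HypotheticalCloudInfo
  def gcp_machine_type(instance)
    instance['machineType'].split('/').last # crashes when the key is missing
  end
end

# Patch: return nil when machineType is absent, otherwise defer to the original.
module SafeGcpMachineType
  def gcp_machine_type(instance)
    return nil unless instance['machineType'].is_a?(String)

    super
  end
end

# prepend places the module before the class in the method lookup chain,
# so `super` inside the module reaches the original implementation.
HypotheticalCloudInfo.prepend(SafeGcpMachineType)

info = HypotheticalCloudInfo.new
p info.gcp_machine_type({}) # nil
p info.gcp_machine_type({ 'machineType' => 'projects/1/machineTypes/n1-standard-1' }) # "n1-standard-1"
```

Compared with reopening the class and redefining the method, prepending keeps the original code intact, so the patch is a single module that is easy to delete once the agent handles a missing machineType itself.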

Thanks!