elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Metricbeat] Googlecloud Module: Error trying to get data from instance #15613

Open kaiyan-sheng opened 4 years ago

kaiyan-sheng commented 4 years ago

The following error message showed up when testing the googlecloud Metricbeat module:

2020-01-16T06:45:32.678-0700    ERROR   [googlecloud.compute]   stackdriver/timeseries.go:37    error trying to retrieve ID from metric event 'error trying to get data from instance '8970924285285850010' in zone 'europe-west3-c': error getting instance information for instance with ID '8970924285285850010': googleapi: Error 404: The resource 'projects/elastic-observability/zones/europe-west3-c/instances/8970924285285850010' was not found, notFound'

The config I used is:

- module: googlecloud
  metricsets:
    - compute
  zone: "europe-west3-a"
  project_id: "elastic-observability"
  credentials_file_path: "/Users/kaiyansheng/Downloads/elastic-observability-d17781618202.json"
  exclude_labels: false
  period: 300s
- module: googlecloud
  metricsets:
    - compute
  zone: "europe-west3-c"
  project_id: "elastic-observability"
  credentials_file_path: "/Users/kaiyansheng/Downloads/elastic-observability-d17781618202.json"
  exclude_labels: false
  period: 300s

kaiyan-sheng commented 4 years ago

I did more testing with the GCP API Explorer https://cloud.google.com/compute/docs/reference/rest/v1/instances/get with project=elastic-observability, zone=europe-west3-c and instance=8970924285285850010. I also get a 404 error:

{
  "error": {
    "code": 404,
    "message": "The resource 'projects/elastic-observability/zones/europe-west3-c/instances/8970924285285850010' was not found",
    "errors": [
      {
        "message": "The resource 'projects/elastic-observability/zones/europe-west3-c/instances/8970924285285850010' was not found",
        "domain": "global",
        "reason": "notFound"
      }
    ]
  }
}

I don't see this instance in the GCP portal either.
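
For anyone scripting this check instead of using the API Explorer, here is a minimal sketch in Go (standard library only) that decodes an error body shaped like the one above and tests for the `notFound` reason. The names `apiError` and `isNotFound` are hypothetical and come from neither Beats nor the Google client library; only the JSON field names are taken from the response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors the JSON error body shown above. The type name is
// hypothetical; only the field names come from the response.
type apiError struct {
	Error struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
		Errors  []struct {
			Message string `json:"message"`
			Domain  string `json:"domain"`
			Reason  string `json:"reason"`
		} `json:"errors"`
	} `json:"error"`
}

// isNotFound reports whether body is a 404 error whose error list
// contains the reason "notFound".
func isNotFound(body []byte) (bool, error) {
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return false, fmt.Errorf("decoding error body: %w", err)
	}
	if e.Error.Code != 404 {
		return false, nil
	}
	for _, item := range e.Error.Errors {
		if item.Reason == "notFound" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Shortened version of the response above, used only for illustration.
	body := []byte(`{"error":{"code":404,"message":"not found","errors":[{"message":"not found","domain":"global","reason":"notFound"}]}}`)
	found, err := isNotFound(body)
	fmt.Println(found, err)
}
```

A check like this only tells us that the API considers the instance missing in that zone. It does not explain why the metric event still referenced it.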

endorama commented 3 years ago

I did some investigation:

  1. I was not able to reproduce this as a "permission denied" error. The error for that case (printed by the gcloud CLI) was:
    ERROR: (gcloud.compute.instances.describe) Could not fetch resource:
     - Required 'compute.instances.get' permission for 'projects/elastic-observability/zones/europe-west2-c/instances/paulb-europe-west2c'

    I would assume that the same error is returned by the APIs.

  2. A 404 is easily triggered by querying for a compute instance in the wrong zone (I'm not sure whether that is the case here; I suspect not, and more logs are needed to troubleshoot).
  3. Without a reproducible case this issue is hard to troubleshoot; as the last comment suggests, the mentioned instance disappeared at some point. This raises the question: why were we trying to collect metrics from an instance that is no longer available? Could this behaviour be the result of an instance being shut down?
  4. Right now it seems the code discards the event if it's not able to compute the "event id" from the metadata. Is this correct behaviour? It may be the case that without an event id there is no way to proceed, but shouldn't this be logged as an error in that case?

endorama commented 3 years ago

@exekias could you help me answer point 4 from the comment above?

botelastic[bot] commented 1 month ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:. Thank you for your contribution!