GoogleCloudPlatform / ops-agent

Apache License 2.0

Missing log labels: instance_id, zone #493

Open b3nk3nobi opened 2 years ago

b3nk3nobi commented 2 years ago

In the Logs Explorer there is no information about instance_id or zone: [screenshot: vivaldi__2022_03_24_08-53-05]

I cannot filter logs by instance: [screenshot: vivaldi__2022_03_24_09-38-13]

The VMs don't have a service account assigned, and that can't be changed because they are in production (I can't stop them), so I'm using a file with the SA private key located at /etc/google/auth/application_default_credentials.json.

Using curl to get these values on the actual VM works fine: [screenshot: WindowsTerminal__2022_03_24_09-00-20]

Am I missing something or is it working as it should be?

sophieyfang commented 2 years ago

Have you followed the instructions here: https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/authorization#authorizing_the_ops_agent

b3nk3nobi commented 2 years ago

Yes, I have followed the instructions. Without them there are a lot of authorization errors in logging-module.log.

logicbomb421 commented 2 years ago

I am also facing this issue.

I am in a similar situation as @b3nk3nobi where we don't run VMs with attached service accounts, and can't change that. I have followed the instructions in the authenticating the agent docs by creating a service account key, placing it in a secure location, and then setting GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account/key.json. The logging service logs show successful oauth2 communication, and logs do make it to Stackdriver, so I know the key is being picked up.

I am also able to receive the instance ID and zone via the metadata.google.internal HTTP call on the VM in question.
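
For anyone who wants to reproduce that check without curl, here is a minimal Python sketch of the same metadata lookup. It assumes the standard Compute Engine metadata paths (instance/id and instance/zone) and the required Metadata-Flavor: Google header; the helper names (build_request, fetch, short_zone, instance_identity) are made up for illustration and are not part of the Ops Agent.

```python
import urllib.request

# Base URL of the Compute Engine metadata server, reachable only from inside a VM.
METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1/instance/"


def build_request(path):
    """Build a metadata request; the Metadata-Flavor header is mandatory."""
    return urllib.request.Request(
        METADATA_ROOT + path,
        headers={"Metadata-Flavor": "Google"},
    )


def fetch(path, timeout=2.0):
    """Return the metadata value at `path` as a stripped string."""
    with urllib.request.urlopen(build_request(path), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def short_zone(zone_path):
    """The zone endpoint returns 'projects/<number>/zones/<zone>'; keep the last segment."""
    return zone_path.rsplit("/", 1)[-1]


def instance_identity():
    """Collect the two values that are missing from the log entries."""
    return {"instance_id": fetch("id"), "zone": short_zone(fetch("zone"))}
```

Calling instance_identity() on the VM should return the same instance ID and zone that the curl call shows, which confirms the metadata server is reachable even though the agent authenticates with a key file.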

Without the resource.labels.instance_id field correctly set, logs don't properly associate with their originating VM, causing issues when creating dashboards, searching logs, etc.

Please let me know if more detail is needed. Thanks!


Update

Got clever and thought I'd found a workaround, but it didn't work. Unsure if this is related, or should be a separate issue (please let me know).

Attempting to use the modify_fields processor to set a static value here:

processors:
  fix_missing_labels:
    type: modify_fields
    fields:
      resource.labels.instance_id:
        static_value: my-instance-name

...results in the following error:

The agent config file is not valid. Detailed error: [18:36] "fields[resource.labels.instance_id]": 1:28: error: field "resource.labels.instance_id" not found

My understanding based on the docs is that the destination field (resource.labels.instance_id) needs to conform to the LogEntry object spec, which it does as far as I can tell.

I did notice in the docs that the exclude_logs processor says only httpRequest, jsonPayload, labels, operation, severity, and sourceLocation can be accessed. Perhaps this is what's going on for modify_fields as well?

If so, I would suggest updating the documentation to include this information under that section as well.
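
To make that guess concrete, here is a small Python sketch of the hypothesis. It assumes (unconfirmed) that modify_fields accepts only the same top-level fields that the exclude_logs docs list; it is not the agent's actual validation code, and the names ACCESSIBLE_ROOTS, root_of, and is_accessible are hypothetical.

```python
# Top-level LogEntry fields the exclude_logs docs list as accessible.
# Assumption: modify_fields applies the same restriction (not confirmed).
ACCESSIBLE_ROOTS = {
    "httpRequest",
    "jsonPayload",
    "labels",
    "operation",
    "severity",
    "sourceLocation",
}


def root_of(field_path):
    """Return the first dotted segment of a field path."""
    return field_path.split(".", 1)[0]


def is_accessible(field_path):
    """True if the path starts with one of the documented top-level fields."""
    return root_of(field_path) in ACCESSIBLE_ROOTS
```

Under that assumption, is_accessible("resource.labels.instance_id") is False because resource is not in the list, which would be consistent with the "not found" error above, while a path under labels would pass.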

hsmatulisgoogle commented 1 year ago

These are missing because of how Fluent Bit operates when it is not authenticated through the metadata server: it only auto-populates these fields when using metadata_server_auth authentication.

If you just want to be able to identify VMs, there is a workaround: versions after 2.15.0 include https://github.com/GoogleCloudPlatform/ops-agent/pull/544 , which auto-populates the resource_name label.

Replying to @logicbomb421 's attempted work-around: