GoogleCloudPlatform / k8s-config-connector

GCP Config Connector, a Kubernetes add-on for managing GCP resources
https://cloud.google.com/config-connector/docs/overview
Apache License 2.0

otel-collector createMetricDescriptor errors #707

Open tdoernenburg opened 1 year ago

tdoernenburg commented 1 year ago

Bug Description

Hello, I see a lot of createMetricDescriptor errors in Google Cloud Logging.

I see numerous error messages for multiple Resource Group Controller metrics and Config Sync metrics:

There are two kinds of error messages:

Permission denied for the service account:

Missing descriptor value type:

Additional Diagnostic Information

We configured a workload identity for the config-management-monitoring/default service account as described in https://cloud.google.com/anthos-config-management/docs/how-to/monitoring-config-sync#custom-monitoring, and the IAM service account has the Monitoring Metric Writer role.
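The Workload Identity setup described above depends on a few identifiers matching exactly. The sketch below only assembles them as strings so they can be compared against what is actually configured; it calls no GCP API. The project ID and IAM service account name are taken from the log output in this issue, and the helper name `workload_identity_settings` is hypothetical.

```python
def workload_identity_settings(project_id, gsa_name,
                               namespace="config-management-monitoring",
                               ksa_name="default"):
    """Return the identifiers a Workload Identity binding is expected to use.

    Hypothetical helper for illustration: it builds strings only.
    """
    gsa_email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    return {
        # Annotation expected on the Kubernetes service account.
        "ksa_annotation": {"iam.gke.io/gcp-service-account": gsa_email},
        # Member that needs roles/iam.workloadIdentityUser on the IAM service account.
        "workload_identity_member":
            f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]",
        "workload_identity_role": "roles/iam.workloadIdentityUser",
        # Member that needs the Monitoring Metric Writer role on the project.
        "metric_writer_member": f"serviceAccount:{gsa_email}",
        "metric_writer_role": "roles/monitoring.metricWriter",
    }


settings = workload_identity_settings("example-project", "sync-config-user")
for key, value in settings.items():
    print(key, "=", value)
```

Comparing these values with the annotation on the Kubernetes service account and with the IAM policy bindings is one way to rule out a simple mismatch before looking further.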

Cluster Features

Cloud Logging: System, Workloads
Cloud Monitoring: System

Kubernetes Cluster Version

1.22.10-gke.600

Config Connector Version

1.89.0

Config Connector Mode

namespaced mode (default)

Log Output

otel-collector logs

2022-09-15T11:57:54.068Z error collector@v0.32.2/metrics.go:316 Unable to send metric descriptor. {"kind": "exporter", "data_type": "metrics", "name": "googlecloud/kubernetes", "error": "rpc error: code = PermissionDenied desc = User sync-config-user@example-project.iam.gserviceaccount.com does not have permission to write to metric kubernetes.io/internal/addons/config_sync/last_pipeline_error_observed.", "metric_descriptor": "name:\"last_pipeline_error_observed\" type:\"kubernetes.io/internal/addons/config_sync/last_pipeline_error_observed\" labels:{key:\"instrumentation_source\"} labels:{key:\"instrumentation_version\"} labels:{key:\"component\"} labels:{key:\"name\"} labels:{key:\"reconciler\"} metric_kind:GAUGE value_type:INT64 unit:\"1\" description:\"A boolean indicates if any error happens from different stages when syncing a commit\" display_name:\"internal/addons/config_sync/last_pipeline_error_observed\""}
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector.(*MetricsExporter).exportMetricDescriptor
    github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector@v0.32.2/metrics.go:316
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector.(*MetricsExporter).exportMetricDescriptorRunner
    github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector@v0.32.2/metrics.go:296

2022-09-15T11:57:54.089Z error collector@v0.32.2/metrics.go:316 Unable to send metric descriptor. {"kind": "exporter", "data_type": "metrics", "name": "googlecloud/kubernetes", "error": "rpc error: code = InvalidArgument desc = Request was missing field metricDescriptor.valueType: The descriptor does not have the value type set.", "metric_descriptor": "name:\"resource_fights_count\" type:\"kubernetes.io/internal/addons/config_sync/resource_fights_count\" labels:{key:\"instrumentation_source\"} labels:{key:\"instrumentation_version\"} metric_kind:CUMULATIVE unit:\"1\" description:\"The total number of resources that are being synced too frequently\" display_name:\"internal/addons/config_sync/resource_fights_count\""}
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector.(*MetricsExporter).exportMetricDescriptor
    github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector@v0.32.2/metrics.go:316
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector.(*MetricsExporter).exportMetricDescriptorRunner
    github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector@v0.32.2/metrics.go:296
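The InvalidArgument entry above can be checked against the logged `metric_descriptor` text itself: the descriptor carries `metric_kind` but no `value_type`. A minimal sketch, assuming the descriptor is available as the text-format string shown in the log; the function name `missing_descriptor_fields` and the list of fields to check are illustrative choices, not part of the exporter.

```python
import re

# Fields to look for in the logged descriptor text (an illustrative selection).
REQUIRED_FIELDS = ("type", "metric_kind", "value_type")


def missing_descriptor_fields(descriptor_text):
    """Return the fields from REQUIRED_FIELDS that do not appear in the text."""
    return [
        field for field in REQUIRED_FIELDS
        # A field counts as present when "<field>:" starts the text or follows whitespace,
        # so "type:" is not confused with "value_type:".
        if not re.search(rf"(?:^|\s){re.escape(field)}:", descriptor_text)
    ]


# Descriptor text copied from the resource_fights_count log entry (unescaped, shortened).
fights = ('name:"resource_fights_count" '
          'type:"kubernetes.io/internal/addons/config_sync/resource_fights_count" '
          'labels:{key:"instrumentation_source"} labels:{key:"instrumentation_version"} '
          'metric_kind:CUMULATIVE unit:"1"')

# Descriptor text copied from the last_pipeline_error_observed log entry (shortened).
pipeline = ('name:"last_pipeline_error_observed" '
            'type:"kubernetes.io/internal/addons/config_sync/last_pipeline_error_observed" '
            'metric_kind:GAUGE value_type:INT64 unit:"1"')

print(missing_descriptor_fields(fights))    # ['value_type']
print(missing_descriptor_fields(pipeline))  # []
```

This only confirms what the API error already states; it does not explain why the exporter produced a descriptor without a value type.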

2022-09-15T11:57:54.265Z warn batchprocessor/batch_processor.go:178 Sender failed {"kind": "processor", "name": "batch", "pipeline": "metrics/kubernetes", "error": "failed to export time series to GCM: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized metric labels: [instrumentation_version], [instrumentation_source]; Unrecognized metric labels: [instrumentation_version], [instrumentation_source]; Unrecognized metric labels: [instrumentation_source], [instrumentation_version]; Unrecognized metric labels: [instrumentation_version], [instrumentation_source]; Unrecognized metric labels: [instrumentation_source], [instrumentation_version]; Unrecognized metric labels: [instrumentation_version], [instrumentation_source]; Unrecognized metric labels: [instrumentation_version], [instrumentation_source]; Unrecognized metric labels: [instrumentation_source], [instrumentation_version]; Unrecognized metric labels: [instrumentation_source], [instrumentation_version]; Unrecognized metric labels: [instrumentation_source], [instrumentation_version]; Unrecognized metric labels: [instrumentation_version], [instrumentation_source]; Unrecognized metric labels: [instrumentation_source], [instrumentation_version]; Unrecognized metric labels: [instrumentation_version], [instrumentation_source]; Unrecognized metric labels: [instrumentation_version], [instrumentation_source]; Unrecognized metric labels: [instrumentation_source], [instrumentation_version]; Unrecognized metric labels: [instrumentation_version], [instrumentation_source]; Unrecognized metric labels: [instrumentation_version], [instrumentation_source]; Unrecognized metric labels: [instrumentation_source], [instrumentation_version]; Unrecognized metric labels: [instrumentation_source], [instrumentation_version]; Unrecognized metric labels: [instrumentation_version], [instrumentation_source]\nerror details: name = Unknown desc = total_point_count:20 errors:{status:{code:3} point_count:20}"}
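The three entries above are long enough that the distinct failures are easy to miss. The following sketch tallies gRPC status codes and collapses the repeated "Unrecognized metric labels" fragments into a unique list. It assumes the collector output has been saved as plain text; the function names and the shortened `sample` string are illustrative, not part of any collector tooling.

```python
import re
from collections import Counter


def grpc_codes(log_text):
    """Count occurrences of each gRPC status code named in 'rpc error: code = X'."""
    return Counter(re.findall(r"rpc error: code = (\w+)", log_text))


def unrecognized_labels(log_text):
    """Return the unique label names reported as unrecognized, sorted."""
    labels = set()
    # Each fragment runs up to the next ';' or the literal backslash before 'n'.
    for fragment in re.findall(r"Unrecognized metric labels: ([^;\\]+)", log_text):
        labels.update(re.findall(r"\[(\w+)\]", fragment))
    return sorted(labels)


# Shortened excerpts of the three error messages from this issue.
sample = "\n".join([
    "rpc error: code = PermissionDenied desc = User "
    "sync-config-user@example-project.iam.gserviceaccount.com does not have "
    "permission to write to metric "
    "kubernetes.io/internal/addons/config_sync/last_pipeline_error_observed.",
    "rpc error: code = InvalidArgument desc = Request was missing field "
    "metricDescriptor.valueType: The descriptor does not have the value type set.",
    "rpc error: code = InvalidArgument desc = One or more TimeSeries could not be "
    "written: Unrecognized metric labels: [instrumentation_version], "
    "[instrumentation_source]; Unrecognized metric labels: "
    "[instrumentation_source], [instrumentation_version]",
])

print(dict(grpc_codes(sample)))     # {'PermissionDenied': 1, 'InvalidArgument': 2}
print(unrecognized_labels(sample))  # ['instrumentation_source', 'instrumentation_version']
```

Run against the full log, this reduces the twenty repeated fragments in the last entry to the same two label names, `instrumentation_source` and `instrumentation_version`.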

Steps to reproduce the issue

Followed the documentation.

YAML snippets

No response

diviner524 commented 1 year ago

@tdoernenburg This seems to be a Config Sync issue. Could you please try the following to get support:

vparmeland commented 6 months ago

Any news?