GoogleCloudPlatform / opentelemetry-operations-go

Apache License 2.0

Custom resource labels are missing when createMonitoredResource #534

Closed tiffanny29631 closed 1 year ago

tiffanny29631 commented 1 year ago

We are trying to add some custom resource labels to have them converted with resource_filter into metric labels later. An example custom resource label looks like configsync.sync.name.

We've verified that with the resource_to_telemetry_conversion config in the Prometheus exporter, these custom labels are converted correctly, so they do exist.

In testing so far, those labels seem to be missing when exporting to GCM, and I only get the labels specified in the resource mapping. IIUC this conversion only includes the specified labels? Are custom resource labels processed somewhere else, or are they dropped?

Thanks!

damemi commented 1 year ago

Hi @tiffanny29631, the attributes from resource_filter are parsed into extra labels during this call to resourceToMetricLabels.

So, any resource attributes that match the prefix you set in resource_filter should get converted. Can you share a sample of your collector config?
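The prefix matching described above can be pictured with a rough Python sketch. This is not the exporter's Go code: `resource_to_metric_labels` is a hypothetical stand-in for resourceToMetricLabels, the sample values are illustrative, and any key sanitization the exporter may perform is deliberately left out.

```python
def resource_to_metric_labels(resource_attrs, prefixes):
    """Keep only the resource attributes whose key starts with a configured prefix.

    Hypothetical helper for illustration; the real exporter logic may differ.
    """
    return {
        key: str(value)
        for key, value in resource_attrs.items()
        if any(key.startswith(prefix) for prefix in prefixes)
    }


# Illustrative resource attributes (values are made up for the example).
resource_attrs = {
    "k8s.pod.name": "example-pod",
    "configsync.sync.name": "repo-sync",
    "configsync.sync.namespace": "gamestore",
}

# A prefix of "config" selects both configsync.* attributes and skips k8s.pod.name.
extra_labels = resource_to_metric_labels(resource_attrs, ["config"])
print(extra_labels)
```

With a prefix of "config", only the configsync.* keys would be expected to show up as extra metric labels.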

tiffanny29631 commented 1 year ago

Below is our OTel configuration for GCM and Prometheus:

    receivers:
      opencensus:
    exporters:
      prometheus:
        endpoint: :8675
        namespace: config_sync
      googlecloud:
        metric:
          prefix: "custom.googleapis.com/opencensus/config_sync/"
          skip_create_descriptor: true
          resource_filters:
            prefix: "config"
          instrumentation_library_labels: false
        retry_on_failure:
          enabled: false
        sending_queue:
          enabled: false
    processors:
      batch:
      resourcedetection:
        detectors: [env, gcp]
      filter/cloudmonitoring:
        metrics:
          include:
            match_type: regexp
            metric_names:
              - reconciler_errors
              - pipeline_error_observed
              - declared_resources
              - apply_operations_total
              - resource_fights_total
              - internal_errors_total
              - kcc_resource_count
              - resource_count
              - ready_resource_count
              - cluster_scoped_resource_count
              - resource_ns_count
              - api_duration_seconds
    extensions:
      health_check:
    service:
      extensions: [health_check]
      pipelines:
        metrics/cloudmonitoring:
          receivers: [opencensus]
          processors: [resourcedetection, batch, filter/cloudmonitoring]
          exporters: [googlecloud]
        metrics/prometheus:
          receivers: [opencensus]
          processors: [batch]
          exporters: [prometheus]

damemi commented 1 year ago

Sorry, could you also share your resources using the file exporter?

tiffanny29631 commented 1 year ago

The labels are:

    k8s.pod.name=$(KUBE_POD_NAME),\
    k8s.pod.namespace=$(KUBE_POD_NAMESPACE),\
    k8s.pod.uid=$(KUBE_POD_UID),\
    k8s.pod.ip=$(KUBE_POD_IP),\
    k8s.node.name=$(KUBE_NODE_NAME),\
    k8s.deployment.name=$(KUBE_DEPLOYMENT_NAME),\
    configsync.sync.kind=$(CONFIGSYNC_SYNC_KIND),\
    configsync.sync.name=$(CONFIGSYNC_SYNC_NAME),\
    configsync.sync.namespace=$(CONFIGSYNC_SYNC_NAMESPACE)

The first two are recognized correctly by the resource mapping.

tiffanny29631 commented 1 year ago

Is there a good way to verify that the resource exists in m.resourceMetrics before / after the filtering?

damemi commented 1 year ago

Yes, can you try using the file exporter to write the raw metrics to a JSON file? It will include the resources which we can look at similarly to this comment

tiffanny29631 commented 1 year ago

The image we have doesn't seem to provide shell access, so I'm looking into extracting the file written by the file exporter.

tiffanny29631 commented 1 year ago

Ah I finally pulled it out

{
  "resourceMetrics": [
    {
      "resource": {
        "attributes": [
          {
            "key": "opencensus.starttime",
            "value": {
              "stringValue": "2022-11-22T21:29:22.810809666Z"
            }
          },
          {
            "key": "host.name",
            "value": {
              "stringValue": "ns-reconciler-gamestore-d9d977948-n62jw"
            }
          },
          {
            "key": "process.pid",
            "value": {
              "intValue": "1"
            }
          },
          {
            "key": "telemetry.sdk.version",
            "value": {
              "stringValue": "0.23.0"
            }
          },
          {
            "key": "opencensus.exporterversion",
            "value": {
              "stringValue": "0.0.1"
            }
          },
          {
            "key": "telemetry.sdk.language",
            "value": {
              "stringValue": "go"
            }
          },
          {
            "key": "cloud.provider",
            "value": {
              "stringValue": "gcp"
            }
          },
          {
            "key": "k8s.pod.name",
            "value": {
              "stringValue": "ns-reconciler-gamestore-d9d977948-n62jw"
            }
          },
          {
            "key": "k8s.node.name",
            "value": {
              "stringValue": "gke-peip-test-default-pool-33ce93fa-u2qd"
            }
          },
          {
            "key": "configsync.sync.namespace",
            "value": {
              "stringValue": "gamestore"
            }
          },
          {
            "key": "k8s.pod.namespace",
            "value": {
              "stringValue": "config-management-system"
            }
          },
          {
            "key": "cloud.availability_zone",
            "value": {
              "stringValue": "us-central1-c"
            }
          },
          {
            "key": "k8s.deployment.name",
            "value": {
              "stringValue": "ns-reconciler-gamestore"
            }
          },
          {
            "key": "k8s.namespace.name",
            "value": {
              "stringValue": "gamestore"
            }
          },
          {
            "key": "k8s.pod.ip",
            "value": {
              "stringValue": "10.104.0.199"
            }
          },
          {
            "key": "k8s.cluster.name",
            "value": {
              "stringValue": "peip-test"
            }
          },
          {
            "key": "host.id",
            "value": {
              "stringValue": "3475782092259600616"
            }
          },
          {
            "key": "configsync.sync.name",
            "value": {
              "stringValue": "repo-sync"
            }
          },
          {
            "key": "cloud.account.id",
            "value": {
              "stringValue": "haiyanmeng-gke-322517"
            }
          },
          {
            "key": "k8s.pod.uid",
            "value": {
              "stringValue": "a2427d08-a6b4-40b9-83e6-c7161aba8a62"
            }
          },
          {
            "key": "configsync.sync.kind",
            "value": {
              "stringValue": ""
            }
          },
          {
            "key": "cloud.platform",
            "value": {
              "stringValue": "gcp_kubernetes_engine"
            }
          },
          {
            "key": "opencensus.resourcetype",
            "value": {
              "stringValue": "k8s"
            }
          }
        ]
      },
      "scopeMetrics": [
        {
          "scope": {},
          "metrics": [
            {
              "name": "apply_operations_total",
              "description": "The total number of operations that have been performed to sync resources to source of truth",
              "unit": "1",
              "sum": {
                "dataPoints": [
                  {
                    "attributes": [
                      {
                        "key": "operation",
                        "value": {
                          "stringValue": "update"
                        }
                      },
                      {
                        "key": "status",
                        "value": {
                          "stringValue": "success"
                        }
                      }
                    ],
                    "startTimeUnixNano": "1669152562824383742",
                    "timeUnixNano": "1669859622810196763",
                    "asInt": "402"
                  }
                ],
                "aggregationTemporality": "AGGREGATION_TEMPORALITY_CUMULATIVE",
                "isMonotonic": true
              }
            },
            {
              "name": "pipeline_error_observed",
              "description": "A boolean value indicates if error happened from different stages when syncing a commit",
              "unit": "1",
              "gauge": {
                "dataPoints": [
                  {
                    "attributes": [
                      {
                        "key": "component",
                        "value": {
                          "stringValue": "sync"
                        }
                      },
                      {
                        "key": "name",
                        "value": {
                          "stringValue": "ns-reconciler-gamestore"
                        }
                      },
                      {
                        "key": "reconciler",
                        "value": {
                          "stringValue": "repo-sync"
                        }
                      }
                    ],
                    "startTimeUnixNano": "1669152562824418390",
                    "timeUnixNano": "1669859622810236046",
                    "asInt": "0"
                  },
                  {
                    "attributes": [
                      {
                        "key": "component",
                        "value": {
                          "stringValue": "rendering"
                        }
                      },
                      {
                        "key": "name",
                        "value": {
                          "stringValue": "ns-reconciler-gamestore"
                        }
                      },
                      {
                        "key": "reconciler",
                        "value": {
                          "stringValue": "repo-sync"
                        }
                      }
                    ],
                    "startTimeUnixNano": "1669152562824418390",
                    "timeUnixNano": "1669859622810236046",
                    "asInt": "0"
                  },
                  {
                    "attributes": [
                      {
                        "key": "component",
                        "value": {
                          "stringValue": "source"
                        }
                      },
                      {
                        "key": "name",
                        "value": {
                          "stringValue": "ns-reconciler-gamestore"
                        }
                      },
                      {
                        "key": "reconciler",
                        "value": {
                          "stringValue": "repo-sync"
                        }
                      }
                    ],
                    "startTimeUnixNano": "1669152562824418390",
                    "timeUnixNano": "1669859622810236046",
                    "asInt": "0"
                  }
                ]
              }
            },
            {
              "name": "declared_resources",
              "description": "The current number of declared resources parsed from Git",
              "unit": "1",
              "gauge": {
                "dataPoints": [
                  {
                    "startTimeUnixNano": "1669152562824381933",
                    "timeUnixNano": "1669859622810286550",
                    "asInt": "2"
                  }
                ]
              }
            },
            {
              "name": "resource_fights_total",
              "description": "The total number of resources that are being synced too frequently",
              "unit": "1",
              "sum": {
                "aggregationTemporality": "AGGREGATION_TEMPORALITY_CUMULATIVE",
                "isMonotonic": true
              }
            },
            {
              "name": "reconciler_errors",
              "description": "The current number of errors in the RootSync and RepoSync reconcilers",
              "unit": "1",
              "gauge": {
                "dataPoints": [
                  {
                    "attributes": [
                      {
                        "key": "component",
                        "value": {
                          "stringValue": "sync"
                        }
                      }
                    ],
                    "startTimeUnixNano": "1669152562824373714",
                    "timeUnixNano": "1669859622810299138",
                    "asInt": "0"
                  },
                  {
                    "attributes": [
                      {
                        "key": "component",
                        "value": {
                          "stringValue": "parsing"
                        }
                      }
                    ],
                    "startTimeUnixNano": "1669152562824373714",
                    "timeUnixNano": "1669859622810299138",
                    "asInt": "0"
                  },
                  {
                    "attributes": [
                      {
                        "key": "component",
                        "value": {
                          "stringValue": "source"
                        }
                      }
                    ],
                    "startTimeUnixNano": "1669152562824373714",
                    "timeUnixNano": "1669859622810299138",
                    "asInt": "0"
                  }
                ]
              }
            },
            {
              "name": "internal_errors_total",
              "description": "The total number of internal errors triggered by Config Sync",
              "unit": "1",
              "sum": {
                "aggregationTemporality": "AGGREGATION_TEMPORALITY_CUMULATIVE",
                "isMonotonic": true
              }
            },
            {
              "name": "api_duration_seconds",
              "description": "The latency distribution of API server calls",
              "unit": "s",
              "histogram": {
                "dataPoints": [
                  {
                    "attributes": [
                      {
                        "key": "operation",
                        "value": {
                          "stringValue": "update"
                        }
                      },
                      {
                        "key": "status",
                        "value": {
                          "stringValue": "success"
                        }
                      }
                    ],
                    "startTimeUnixNano": "1669152562824368286",
                    "timeUnixNano": "1669859622810308415",
                    "count": "396",
                    "sum": 0.003984054000000001,
                    "bucketCounts": [
                      "396",
                      "0",
                      "0",
                      "0",
                      "0",
                      "0",
                      "0",
                      "0",
                      "0",
                      "0",
                      "0",
                      "0"
                    ],
                    "explicitBounds": [
                      0.005,
                      0.01,
                      0.025,
                      0.05,
                      0.1,
                      0.25,
                      0.5,
                      1,
                      2.5,
                      5,
                      10
                    ]
                  }
                ],
                "aggregationTemporality": "AGGREGATION_TEMPORALITY_CUMULATIVE"
              }
            }
          ]
        }
      ],
      "schemaUrl": "https://opentelemetry.io/schemas/1.6.1"
    }
  ]
}

The target resource labels are configsync.sync.*, and they do exist, so now it's even more confusing why resource_filter won't convert them into metric labels.
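For anyone repeating this check, a small Python sketch along these lines could list the resource attributes in a file exporter dump and flag empty values. It assumes the dump has the resourceMetrics / resource / attributes shape shown above; the helper names `resource_attributes` and `matching` are hypothetical, and `SAMPLE` is a trimmed stand-in for the real file.

```python
import json

# Trimmed stand-in for a file exporter dump; keys and values are taken from the output above.
SAMPLE = """
{"resourceMetrics": [{"resource": {"attributes": [
  {"key": "k8s.pod.name", "value": {"stringValue": "ns-reconciler-gamestore-d9d977948-n62jw"}},
  {"key": "configsync.sync.name", "value": {"stringValue": "repo-sync"}},
  {"key": "configsync.sync.kind", "value": {"stringValue": ""}}
]}}]}
"""


def resource_attributes(otlp_json):
    """Return one {key: value} dict per resourceMetrics entry."""
    flattened = []
    for resource_metrics in json.loads(otlp_json).get("resourceMetrics", []):
        attrs = {}
        for attr in resource_metrics.get("resource", {}).get("attributes", []):
            # Each value is a one-key object such as {"stringValue": "..."}.
            attrs[attr["key"]] = next(iter(attr["value"].values()), None)
        flattened.append(attrs)
    return flattened


def matching(attrs, prefix):
    """Select the attributes whose key starts with the given prefix."""
    return {key: value for key, value in attrs.items() if key.startswith(prefix)}


for attrs in resource_attributes(SAMPLE):
    for key, value in sorted(matching(attrs, "config").items()):
        note = "  <-- empty value" if value == "" else ""
        print(f"{key} = {value!r}{note}")
```

Running the same functions over the full dump would confirm which attributes are present on the resource before the exporter sees them.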

damemi commented 1 year ago

Hm yeah they're present in that output. One thing I notice is configsync.sync.kind is empty:

          {
            "key": "configsync.sync.kind",
            "value": {
              "stringValue": ""
            }
          },

So maybe there is an issue with dropping attributes with empty values. But I don't see anything that would cause that to prevent the other configsync attributes from being parsed. You're not seeing any configsync attributes, right?

I'm going to try to reproduce this, but in the meantime you could try the new regex field for resource filters. That's in v0.34.2 of this repo and will be in the next upstream collector-contrib release (v0.67, I think?).
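As a rough illustration of what a regex-based filter could select compared with a prefix, here is a Python sketch. It assumes unanchored matching against the attribute key, which may not be how the exporter behaves; the pattern and the `filter_by_regex` helper are hypothetical, and Python's `re` syntax is not identical to Go's regexp syntax.

```python
import re


def filter_by_regex(resource_attrs, pattern):
    """Keep attributes whose key matches the pattern (unanchored search, by assumption)."""
    compiled = re.compile(pattern)
    return {key: value for key, value in resource_attrs.items() if compiled.search(key)}


# Attribute keys and values taken from the dump shared earlier in the thread.
resource_attrs = {
    "k8s.pod.name": "ns-reconciler-gamestore-d9d977948-n62jw",
    "configsync.sync.kind": "",
    "configsync.sync.name": "repo-sync",
    "configsync.sync.namespace": "gamestore",
}

# Hypothetical pattern: select only the name and namespace attributes, skipping the empty kind.
selected = filter_by_regex(resource_attrs, r"^configsync\.sync\.(name|namespace)$")
print(selected)
```

A regex makes it possible to pick individual attributes, whereas a prefix such as "config" would select every configsync.* key.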

damemi commented 1 year ago

@tiffanny29631 I tried creating a test using the json you shared as an input, and I can't reproduce the issue. See #541, the resource_filtered_attributes_expected.json file is the output from our exporter and the configsync labels are there (example)

This is using the same collector config exporter settings as you provided, so it looks like our exporter is working.

@dashpole had the idea that we could add a debugging feature to the exporter for it to save its JSON output, which would tell us for sure if your exact exporter is working. I'll look into what kind of effort that would involve

karlkfi commented 1 year ago

Maybe I don't understand how resource labels work in cloud monitoring, but if the extra fields are specified as resource attributes, why do they get emitted as metric labels and not resource labels?

Do they need to have a specific prefix or be on an allowlist to make it into the k8s_pod type resource labels?

Are there other resource types that would allow custom labels?

damemi commented 1 year ago

Ah, I think there was a miscommunication. The resource_filters setting only parses resource attributes to metric labels. I assumed that was the bug from @tiffanny29631's original description:

We are trying to add some custom resource labels to have them converted with resource_filter into metric labels later

Re-reading it, I see now that you're asking for custom resource labels.

We don't have a way to parse custom resource attributes to resource labels, but we could probably extend the resource_filters section to have a parse_to: <monitoredResource | labels> option, or something similar to the old monitored resource mappings (which were removed in https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/pull/252)

Thanks for chiming in @karlkfi, your comment helped clear up some confusion :)

tiffanny29631 commented 1 year ago

Can I read it as 'by using resource processor we might be able to convert the custom resource attributes into resource labels' based on this statement?

Apologies for not expressing myself clearly earlier; I may be using the terms incorrectly. To confirm: the ones that Config Sync emits are resource attributes, and the recognized monitored resources carry resource labels?

or something similar to the old monitored resource mappings (which were removed in https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/pull/252)

For this, does that mean we need the OTLP feature gate and revert to legacy mode?

dashpole commented 1 year ago

@tiffanny29631 What monitored resource are you trying to write to, and what are the labels on it?

tiffanny29631 commented 1 year ago

@dashpole Writing to custom.googleapis.com, the metrics are recognized as k8s_pod resource type, our custom labels are specified here, and the custom resource labels are specified in this comment. Hopefully this answers your question

dashpole commented 1 year ago

Followed up with @tiffanny29631 offline. The configsync labels will need to be added as metric labels, since resource labels are fixed to the k8s_pod labels.

karlkfi commented 1 year ago

On a related note, we’re also sending k8s.deployment.name and (for at least one component) k8s.container.name, which is important for knowing which component the metric came from.

Is there a resource type that would include those as labels as well as what’s already in the k8s_pod resource type? Like maybe a k8s_deployment_container?

If not, does it sound like something that could be added? Or should we convert those resource attributes into metric labels too?

dashpole commented 1 year ago

The k8s_container is the same as k8s_pod, but includes a container_name resource label. If you include k8s.container.name, it should be mapped to k8s_container instead of k8s_pod in the exporter.

For the deployment name, you would have to add it as a metric label.
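The choice between the two resource types can be sketched in Python as follows. This is a simplified illustration of the behavior described above, not the exporter's mapping code: `monitored_resource` is a hypothetical helper, and the real mapping considers more attributes than the single one checked here.

```python
# Label names of the Cloud Monitoring k8s_pod and k8s_container resource types.
K8S_POD_LABELS = ("project_id", "location", "cluster_name", "namespace_name", "pod_name")
K8S_CONTAINER_LABELS = K8S_POD_LABELS + ("container_name",)


def monitored_resource(resource_attrs):
    """Pick a resource type and its fixed label names (simplified sketch)."""
    if "k8s.container.name" in resource_attrs:
        return "k8s_container", K8S_CONTAINER_LABELS
    return "k8s_pod", K8S_POD_LABELS


# Illustrative attributes; the values are made up for the example.
attrs = {"k8s.pod.name": "example-pod", "k8s.deployment.name": "example-deployment"}
print(monitored_resource(attrs)[0])

attrs["k8s.container.name"] = "example-container"
print(monitored_resource(attrs)[0])
```

Neither label set has a slot for the deployment name, which is why it has to travel as a metric label.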

damemi commented 1 year ago

Thanks @dashpole, I didn't realize monitored resources had a strictly fixed set of labels.

Given that, can we close this issue? Seems like wontfix

dashpole commented 1 year ago

It is unclear to me if they were able to successfully send configsync.* labels as metric labels. Let's leave this open until they confirm.

tiffanny29631 commented 1 year ago

I will be testing this soon and keep you updated.

tiffanny29631 commented 1 year ago

I'm able to use the attributes processor to add and export the custom labels, but I want to confirm with @dashpole what attributes mean in this context: can they be considered metric labels? Thanks!

dashpole commented 1 year ago

@tiffanny29631 Yes, the attributes processor adds metric labels, not resource labels.

karlkfi commented 1 year ago

Can this be clarified in the readme? It’s really confusing with references to resources and resource_attributes there.

It would also be much clearer if it were named the “label processor”, but that’s probably non-trivial.

damemi commented 1 year ago

I think it's pretty clear in the contrib docs:

resource_filters (default = []): If provided, resource attributes matching any filter will be included in metric labels.

but we can update the readme here with a note about it too and a link to that config section

karlkfi commented 1 year ago

This is the doc I was thinking of: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/attributesprocessor

It uses “attributes” throughout, which felt ambiguous.

tiffanny29631 commented 1 year ago

I'd find it helpful if terms like attributes / labels / tags were clarified in the OTel Collector context.

dashpole commented 1 year ago

labels and tags are called attributes in OpenTelemetry: https://opentelemetry.io/docs/reference/specification/common/#attribute. "Metric labels" is a Google Cloud Monitoring concept, so it isn't mentioned outside of gcp-specific components.

The attributes processor docs seem relatively clear: "The attributes processor modifies attributes of a span, log, or metric."

damemi commented 1 year ago

Sorry, @dashpole did you mean the attributes processor adds labels, or our resource_filter adds them (in https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/issues/534#issuecomment-1349608006)? I took your comment to mean the latter

The attributes processor (and resourceprocessor) does add attributes to the incoming metrics. The problem here (if I understood it) was that setting resource_filter on our exporter converted those attributes to metric labels, which is working as intended.

dashpole commented 1 year ago

Sorry, I should've said: The attributes processor adds metric attributes, which are converted to metric labels in our exporter. The resource and resourcedetection processors add resource attributes, which can be converted to metric labels using resource_filter in our exporter.

I get that it can be confusing that both resource attributes and metric attributes are part of a batch of metrics, that both are "attributes", and then also have an "attributes" processor. Maybe a "this doesn't modify resource attributes. Use the resource processor instead" in the readme would be helpful for newcomers.
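To make the two kinds of attributes concrete, here is a toy Python model of the flow described above. It is only a sketch: `MetricBatch` and `exported_metric_labels` are hypothetical names, a batch is reduced to one resource and one data point, and label key sanitization and collision handling are not modeled faithfully.

```python
from dataclasses import dataclass, field


@dataclass
class MetricBatch:
    """Toy model: one resource plus one data point."""
    resource_attrs: dict = field(default_factory=dict)  # set by resource / resourcedetection processors
    point_attrs: dict = field(default_factory=dict)     # set by the attributes processor


def exported_metric_labels(batch, resource_filter_prefixes=()):
    """Metric attributes always become labels; resource attributes only if a filter matches."""
    labels = dict(batch.point_attrs)
    for key, value in batch.resource_attrs.items():
        if any(key.startswith(prefix) for prefix in resource_filter_prefixes):
            # Arbitrary choice for this sketch: an existing metric attribute wins on a key clash.
            labels.setdefault(key, value)
    return labels


batch = MetricBatch()
batch.resource_attrs["configsync.sync.name"] = "repo-sync"  # as a resource processor would add it
batch.point_attrs["component"] = "sync"                     # as the attributes processor would add it

print(exported_metric_labels(batch))              # no resource_filter configured
print(exported_metric_labels(batch, ["config"]))  # resource_filter with prefix "config"
```

In this model the resource attribute only reaches the metric labels once a matching filter is configured, while the metric attribute is exported either way.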

karlkfi commented 1 year ago

labels and tags are called attributes in OpenTelemetry

This was the information I was missing.

I’ve been having a hard time keeping terminology straight between the metrics components we’re using:

Is there some kind of disambiguation table that explains how the terms map across different products?

dashpole commented 1 year ago

Le table :)

| Technology    | Term       |
| ------------- | ---------- |
| OpenCensus    | tags       |
| OpenTelemetry | attributes |
| Prometheus    | labels     |
| Google Cloud  | labels     |

tiffanny29631 commented 1 year ago

The last one is the same as Google Cloud, just with a different prefix.

dashpole commented 1 year ago

Closing for now, let us know if you have any other questions we can answer.