airbytehq / airbyte

OpenTelemetry: WARNING: Instrument has recorded multiple values for the same attributes #15623

Closed marcosmarxm closed 1 year ago

marcosmarxm commented 2 years ago

This GitHub issue is synchronized with Zendesk:

  • Ticket ID: #1847
  • Priority: normal
  • Group: Community Assistance Engineer
  • Assignee: Sajarin

Original ticket description:

Context

Hello!

We are using Airbyte 0.39.42-alpha with Docker Compose, and are setting it up to send metrics using OpenTelemetry, using information from the following documentation and threads:

According to the documentation, we have updated the Docker Compose stack to:

  • set up the airbyte-metrics-reporter service for OpenTelemetry
  • set up the airbyte-worker service for OpenTelemetry
  • set up the opentelemetry-collector service to handle OTEL gRPC calls, and expose metrics using the Prometheus exporter

Additionally, we have set up:

  • Prometheus to scrape data from opentelemetry-collector
  • Grafana to display Prometheus metrics

Issue

When the airbyte-metrics-reporter service emits metrics using the OpenTelemetry SDK, the following warnings appear in its logs:

airbyte-metrics-reporter  | Aug 08, 2022 3:21:29 PM io.opentelemetry.sdk.internal.ThrottlingLogger doLog
airbyte-metrics-reporter  | WARNING: Instrument oldest_running_job_age_secs has recorded multiple values for the same attributes.
airbyte-metrics-reporter  | Aug 08, 2022 3:21:29 PM io.opentelemetry.sdk.internal.ThrottlingLogger doLog
airbyte-metrics-reporter  | WARNING: Instrument num_running_jobs has recorded multiple values for the same attributes.
airbyte-metrics-reporter  | Aug 08, 2022 3:21:29 PM io.opentelemetry.sdk.internal.ThrottlingLogger doLog
airbyte-metrics-reporter  | WARNING: Instrument oldest_pending_job_age_secs has recorded multiple values for the same attributes.
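
The warning text suggests that, within one collection cycle, the same instrument received more than one value for an identical attribute set. The sketch below is a toy model of that situation, not OpenTelemetry SDK code: `ToyAsyncGauge` and `report_with_new_callback` are hypothetical names, and it assumes (this is not confirmed anywhere in this thread) that a new callback is registered on every report and that only the first value per attribute set is kept during a collection.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Attributes = FrozenSet[Tuple[str, str]]
Callback = Callable[[], Tuple[Attributes, float]]


class ToyAsyncGauge:
    """Toy stand-in for an asynchronous gauge (hypothetical, not the real SDK)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.callbacks: List[Callback] = []
        self.warnings: List[str] = []

    def register_callback(self, callback: Callback) -> None:
        self.callbacks.append(callback)

    def collect(self) -> Dict[Attributes, float]:
        """Run every callback once and keep one value per attribute set."""
        points: Dict[Attributes, float] = {}
        for callback in self.callbacks:
            attributes, value = callback()
            if attributes in points:
                # Assumption: a later value for the same attributes is dropped
                # and a warning is logged instead.
                self.warnings.append(
                    f"Instrument {self.name} has recorded multiple values "
                    "for the same attributes."
                )
                continue
            points[attributes] = value
        return points


def report_with_new_callback(gauge: ToyAsyncGauge, value: float) -> None:
    """Assumed anti-pattern: every reported value registers another callback."""
    attributes: Attributes = frozenset()
    gauge.register_callback(lambda: (attributes, value))


gauge = ToyAsyncGauge("num_running_jobs")
report_with_new_callback(gauge, 0)  # first report, no jobs running yet
report_with_new_callback(gauge, 2)  # later report, two jobs running
points = gauge.collect()
print(points[frozenset()], len(gauge.warnings))
```

In this toy model the stale first value is what gets exported and the newer value only produces a warning. If the reporter behaves this way, registering a single callback per instrument and having it read the latest value would avoid the duplicate.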

When sync jobs are running, the gauges corresponding to the number of pending and running jobs do not seem to be updated accordingly, e.g. with two sync jobs running:

$ curl --silent http://localhost:8889/metrics | rg 'num_running'

# HELP airbyte_num_running_jobs number of running jobs
# TYPE airbyte_num_running_jobs gauge
airbyte_num_running_jobs{job="metrics-reporter"} 0
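
To repeat this check without eyeballing the `curl` output, the exposition text can be parsed in a few lines. This is a minimal sketch: `parse_gauge_samples` is a hypothetical helper, it handles only simple sample lines (optionally followed by a timestamp), and it is not a full parser for the Prometheus text format.

```python
from typing import Dict


def parse_gauge_samples(exposition: str, metric: str) -> Dict[str, float]:
    """Map each label block of `metric` to its sample value."""
    samples: Dict[str, float] = {}
    for raw_line in exposition.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line and "}" in line:
            name = line[: line.index("{")]
            labels = line[line.index("{") + 1 : line.rindex("}")]
            rest = line[line.rindex("}") + 1 :].split()
        else:
            parts = line.split()
            name, labels, rest = parts[0], "", parts[1:]
        if name != metric or not rest:
            continue
        # rest[0] is the value; an optional timestamp may follow it.
        samples[labels] = float(rest[0])
    return samples


# Sample text copied from the output above.
SAMPLE = """\
# HELP airbyte_num_running_jobs number of running jobs
# TYPE airbyte_num_running_jobs gauge
airbyte_num_running_jobs{job="metrics-reporter"} 0
"""

samples = parse_gauge_samples(SAMPLE, "airbyte_num_running_jobs")
print(samples)
```

With two sync jobs running, a value of 0 for the `job="metrics-reporter"` sample reproduces the problem described here.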

image

This issue seems to be limited to gauge values, as counters are correctly incremented:

image

Configuration details

Please find the (curated) configuration related to OpenTelemetry that we used for the different services:


.env

VERSION=0.39.42-alpha
PUBLISH_METRICS="true"
METRIC_CLIENT=otel
OTEL_COLLECTOR_ENDPOINT="http://otel-collector:4317"

docker-compose.yml

services:
  worker:
    environment:
      - PUBLISH_METRICS=${PUBLISH_METRICS}
      - METRIC_CLIENT=${METRIC_CLIENT}
      - OTEL_COLLECTOR_ENDPOINT=${OTEL_COLLECTOR_ENDPOINT}

  airbyte-metrics-reporter:
    image: airbyte/metrics-reporter:${VERSION}
    logging: *default-logging
    container_name: airbyte-metrics-reporter
    environment:
      - CONFIG_DATABASE_PASSWORD=${CONFIG_DATABASE_PASSWORD:-}
      - CONFIG_DATABASE_URL=${CONFIG_DATABASE_URL:-}
      - CONFIG_DATABASE_USER=${CONFIG_DATABASE_USER:-}
      - CONFIGS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION=${CONFIGS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION:-}
      - CONFIG_ROOT=${CONFIG_ROOT}
      - DATABASE_PASSWORD=${DATABASE_PASSWORD}
      - DATABASE_URL=jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB}
      - DATABASE_USER=${DATABASE_USER}
      - PUBLISH_METRICS=${PUBLISH_METRICS}
      - METRIC_CLIENT=${METRIC_CLIENT}
      - OTEL_COLLECTOR_ENDPOINT=${OTEL_COLLECTOR_ENDPOINT}

  otel-collector:
    image: otel/opentelemetry-collector:0.57.2
    command: ["--config=/etc/otel-collector-config.yaml"]
    ports:
      - "8888:8888"   # Prometheus metrics exposed by the collector
      - "8889:8889"   # Prometheus exporter metrics
    volumes:
      - ./otel-collector/otel-collector-config.yaml:/etc/otel-collector-config.yaml

otel-collector/otel-collector-config.yaml

---
receivers:
  otlp:
    protocols:
      grpc: {}

processors:
  batch: {}

exporters:
  logging: {}
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: airbyte
    send_timestamps: true
    metric_expiration: 60m

extensions:
  health_check:
  pprof:
  zpages:

service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, prometheus]

Attempts

After seeing the following issue being fixed on the OTEL SDK:

I tried bumping the version of the SDK to 1.16 using Airbyte’s deps.toml and rebuilding the Docker image for airbyte-metrics-reporter:

$ git clone https://github.com/airbytehq/airbyte
$ cd airbyte
$ vim deps.toml    # set OTEL SDK version to 1.16.0
$ cd airbyte-metrics/reporter
$ ../../gradlew build

but observed the same behaviour: the warning messages persist and the gauges stay stuck at 0.

The following discussion may provide better insights as to why the emission of the latest value fails for Airbyte gauges:

Please let me know if you need more information to reproduce the issue. I’ll also be happy to contribute fixes :slight_smile:

Thanks,

Aurélien

[Discourse post]
marcosmarxm commented 2 years ago

Comment made from Zendesk by Sajarin on 2022-08-12 at 19:39:

Hey @virtualtam, we really appreciate this post. I made an issue relating to your question on GitHub, please add your thoughts and follow the discussion over there!
marcosmarxm commented 2 years ago

Comment made from Zendesk by Marcos Marx on 2022-08-13 at 08:42:

Hi @sajarin , thanks for following up!

I’ll head over to GitHub to continue the discussion :+1:

For anyone facing similar behaviour with OpenTelemetry metrics collection, the corresponding issue is airbytehq/airbyte#15623 - OpenTelemetry: WARNING: Instrument has recorded multiple values for the same attributes

[Discourse post]
geneyen-chu commented 2 years ago

Do we have any update on this? It seems the number of metrics is not correct.

id13 commented 1 year ago

Any news on this issue? We are also experiencing it on Helm chart 0.42.2.