apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Statsd host does not update #19044

Closed praktiskt closed 3 years ago

praktiskt commented 3 years ago

Official Helm Chart version

1.2.0 (latest released)

Apache Airflow version

2.1.4 (latest released)

Kubernetes Version

v1.21.2

Helm Chart configuration

statsd:
  enabled: false

env:
  # Enable statsd here, disable chart default.
  - name: AIRFLOW__METRICS__STATSD_ON
    value: true
  - name: AIRFLOW__METRICS__HOST
    value: <service name for statsd>
  - name: AIRFLOW__METRICS__STATSD_PORT
    value: <port for statsd>

Docker Image customisations

None.

What happened

Setting AIRFLOW__METRICS__HOST seems to have no effect, even if we set statsd.enabled=false in the chart and enable statsd through the same config (AIRFLOW__METRICS__STATSD_ON). Airflow continues to try to resolve the default host, <name>-statsd-service, even when AIRFLOW__METRICS__HOST is set.

What you expected to happen

I expected Airflow to honor the AIRFLOW__METRICS__HOST variable and resolve the service as specified there.

How to reproduce

Step 1 - vanilla install

Install the helm-chart with the following values.yml

# airflow-values.yml

statsd:
  enabled: true # this is the default

helm install -f airflow-values.yml example apache-airflow/airflow --version 1.2.0 --timeout 1200s

Step 2 - Replace statsd with custom statsd

We want to disable the chart's built-in statsd and use our own instead. To do so, set up any statsd deployment in the same namespace. In my case, I use the Prometheus statsd exporter, but it can be absolutely anything.

# Install statsd. 
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install statsd prometheus-community/prometheus-statsd-exporter --timeout 1200s

# Chart-default is to accept udp/tcp on port 9125. Above config also creates a service for us called `statsd-prometheus-statsd-exporter`.

Then update Airflow to communicate with the new statsd:

# airflow-values.yml

# Update Airflow to use the new statsd
statsd:
  enabled: false

env:
  - name: AIRFLOW__METRICS__STATSD_ON
    value: true
  - name: AIRFLOW__METRICS__HOST
    value: statsd-prometheus-statsd-exporter
  - name: AIRFLOW__METRICS__STATSD_PORT
    value: 9125

helm upgrade -f airflow-values.yml example apache-airflow/airflow --version 1.2.0 --timeout 1200s

Step 3 - Airflow fails to resolve the new host

Run any DAG from within Airflow, or restart the webserver, to trigger an error showing that Airflow is unable to resolve the service:

[2021-10-18 14:22:35,668] {stats.py:359} ERROR - Could not configure StatsClient: [Errno -2] Name or service not known, using DummyStatsLogger instead.
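
The `[Errno -2] Name or service not known` text in this log line is what Python's `socket.gaierror` reports on Linux when a hostname cannot be resolved. The following is a minimal sketch of that failure mode and the fallback to a dummy logger; it is not Airflow's actual `stats.py` code, and the helper names `can_resolve` and `pick_stats_backend` are hypothetical.

```python
import socket
from typing import Callable


def can_resolve(host: str, port: int, resolver: Callable = socket.getaddrinfo) -> bool:
    """Return True if the resolver can look up host:port, False on a DNS failure."""
    try:
        resolver(host, port)
    except socket.gaierror:
        # e.g. [Errno -2] Name or service not known
        return False
    return True


def pick_stats_backend(host: str, port: int, resolver: Callable = socket.getaddrinfo) -> str:
    """Hypothetical helper: choose a real client only if the host resolves."""
    return "statsd" if can_resolve(host, port, resolver) else "dummy"
```

Calling `pick_stats_backend("example-statsd", 9125)` from inside a pod would indicate whether that service name is resolvable from there. The `resolver` parameter exists only so the lookup can be replaced in tests.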

Step 4 - Rename the new statsd-service to match what Airflow is looking for

We can fix this by overriding the name of the new service to match that which we know Airflow is looking for (<name>-statsd).

# statsd-values.yml

# Remember we set the name of our helm install to `example` in step 1.
# This toggle alters the name of the service for our custom statsd to be whatever we put below.
fullnameOverride: example-statsd

helm upgrade -f statsd-values.yml statsd prometheus-community/prometheus-statsd-exporter --timeout 1200s

Airflow is now able to log metrics to our custom statsd-client.

And just to showcase that these variables are not picked up, let's set them to something that we know does not exist.

# airflow-values.yml

statsd:
  enabled: false

env:
  - name: AIRFLOW__METRICS__STATSD_ON
    value: true
  - name: AIRFLOW__METRICS__HOST
    value: whatever-you-want-really
  - name: AIRFLOW__METRICS__STATSD_PORT
    value: 3123 # anything numeric

helm upgrade -f airflow-values.yml example apache-airflow/airflow --version 1.2.0 --timeout 1200s

Airflow is still able to log metrics to our custom statsd-client.

boring-cyborg[bot] commented 3 years ago

Thanks for opening your first issue here! Be sure to follow the issue template!

jedcunningham commented 3 years ago

Hey @magnusfurugard, you need to use AIRFLOW__METRICS__STATSD_HOST, not AIRFLOW__METRICS__HOST: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#statsd-host

I looked and didn't see any reference to the wrong env var name in our docs. Did you see it somewhere?

If you still have issues, let me know and I can reopen this. Thanks!
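
The reason the original variable had no effect follows from Airflow's documented naming convention, `AIRFLOW__{SECTION}__{KEY}`: `AIRFLOW__METRICS__HOST` maps to a `host` option in `[metrics]`, whereas the StatsD client reads `statsd_host`. The sketch below is a simplified model of that convention, not Airflow's real configuration code. The function names are hypothetical, and the file value `example-statsd` is assumed to be what the chart rendered for the release named `example`.

```python
from typing import Mapping, Optional, Tuple


def parse_airflow_env_var(name: str) -> Optional[Tuple[str, str]]:
    """Split AIRFLOW__{SECTION}__{KEY} into a lowercase (section, key) pair."""
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        return None
    section, sep, key = name[len(prefix):].partition("__")
    if not sep or not section or not key:
        return None
    return section.lower(), key.lower()


def lookup(
    section: str,
    key: str,
    env: Mapping[str, str],
    file_config: Mapping[Tuple[str, str], str],
) -> Optional[str]:
    """An environment variable wins; otherwise fall back to the config-file value."""
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in env:
        return env[env_name]
    return file_config.get((section, key))


# Assumed state from the report: the env var uses the wrong key name.
env = {"AIRFLOW__METRICS__HOST": "statsd-prometheus-statsd-exporter"}
file_config = {("metrics", "statsd_host"): "example-statsd"}

print(parse_airflow_env_var("AIRFLOW__METRICS__HOST"))      # ('metrics', 'host')
print(lookup("metrics", "statsd_host", env, file_config))   # example-statsd
```

With the variable renamed to `AIRFLOW__METRICS__STATSD_HOST`, the same `lookup` call returns the value from the environment instead of the file value.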

vaishali-cm commented 1 year ago

I used the correct variable key and followed everything here, but the statsd_host value still does not change to the one set in the env var.
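
One way to narrow down a report like this is to confirm which metrics-related variables the Airflow process actually sees. The sketch below only inspects the environment and does not import Airflow. The helper name `metrics_env_vars` is hypothetical, and it is assumed that the script is run inside the affected container (for example the scheduler or a worker).

```python
import os
from typing import Dict, Mapping


def metrics_env_vars(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return every AIRFLOW__METRICS__* variable visible in the given environment."""
    prefix = "AIRFLOW__METRICS__"
    return {name: value for name, value in sorted(env.items()) if name.startswith(prefix)}


if __name__ == "__main__":
    # Print what this process sees, one NAME=value pair per line.
    for name, value in metrics_env_vars().items():
        print(f"{name}={value}")
```

If `AIRFLOW__METRICS__STATSD_HOST` is missing from the output, the value never reached that container, which points at the deployment configuration and not at Airflow's config parsing.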