apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Status of testing of Apache Airflow Helm Chart 1.7.0rc1 #26971

Closed: jedcunningham closed this issue 2 years ago

jedcunningham commented 2 years ago

We have a kind request for everyone who contributed to the latest Apache Airflow Helm Chart release candidate, 1.7.0rc1.

Could you please help us test this RC?

Please let us know in a comment whether the issue you worked on is addressed in the RC.

Thanks to all who contributed to the release (probably not a complete list!): @vivek-zeta @MatthieuBlais @joshuaghezzi @ephraimbuddy @mabrikan @ihorlukianov @danielhoherd @Swalloow @BobDu @moshederri @dan-vaughan @V0lantis @potiuk @dstandish @rishkarajgi @SuperQ @csp98 @Aakcht @EliMor @raphaelauv @gmsantos @jedcunningham

danielhoherd commented 2 years ago

Verified that the executor=CeleryExecutor label shows up on the airflow-scheduler deployment by default:

$ helm list -n aftest
NAME    NAMESPACE   REVISION    UPDATED                                 STATUS      CHART           APP VERSION
airflow aftest      1           2022-10-10 11:51:10.148779 -0400 EDT    deployed    airflow-1.7.0   2.4.1
$ k -n aftest get deployment airflow-scheduler --show-labels
NAME                READY   UP-TO-DATE   AVAILABLE   AGE    LABELS
airflow-scheduler   1/1     1            1           4m2s   app.kubernetes.io/managed-by=Helm,chart=airflow-1.7.0,component=scheduler,executor=CeleryExecutor,heritage=Helm,release=airflow,tier=airflow

Also verified with KubernetesExecutor:

$ helm upgrade -n aftest airflow . --set executor=KubernetesExecutor
...lots of output...
$ k -n aftest get deployment airflow-scheduler --show-labels
NAME                READY   UP-TO-DATE   AVAILABLE   AGE     LABELS
airflow-scheduler   1/1     1            1           7m46s   app.kubernetes.io/managed-by=Helm,chart=airflow-1.7.0,component=scheduler,executor=KubernetesExecutor,heritage=Helm,release=airflow,tier=airflow
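
The same check can be scripted instead of read off the LABELS column by eye. Below is a minimal Python sketch, assuming the plain-text output of kubectl get deployment ... --show-labels shown above; parse_labels and executor_label are hypothetical helpers, not part of the chart or of kubectl. Asking kubectl for the label directly (for example with -o jsonpath='{.metadata.labels.executor}') would avoid text parsing altogether.

```python
from __future__ import annotations


def parse_labels(label_field: str) -> dict[str, str]:
    """Split a LABELS column such as "a=b,c=d" into {"a": "b", "c": "d"}."""
    labels = {}
    for pair in label_field.split(","):
        key, _, value = pair.partition("=")
        labels[key] = value
    return labels


def executor_label(show_labels_output: str, deployment: str) -> str | None:
    """Return the executor label of one deployment, or None if it is absent."""
    for line in show_labels_output.splitlines()[1:]:  # skip the header row
        columns = line.split()
        if columns and columns[0] == deployment:
            # LABELS is the last whitespace-separated column
            return parse_labels(columns[-1]).get("executor")
    return None


OUTPUT = """\
NAME                READY   UP-TO-DATE   AVAILABLE   AGE    LABELS
airflow-scheduler   1/1     1            1           4m2s   app.kubernetes.io/managed-by=Helm,chart=airflow-1.7.0,component=scheduler,executor=CeleryExecutor,heritage=Helm,release=airflow,tier=airflow
"""

if __name__ == "__main__":
    print(executor_label(OUTPUT, "airflow-scheduler"))  # CeleryExecutor
```

Running it against the KubernetesExecutor output instead should print KubernetesExecutor, since only the executor label value changes.
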
jedcunningham commented 2 years ago

I've verified the following:

26485 - Airflow 2.4.1 by default

25561 - Celery worker liveness probe

24395 - Postgres subchart is vendored

23876 - Executor docs change

joshuaghezzi commented 2 years ago

Verified #25732 - StatsD podAnnotations

BobDu commented 2 years ago

Verified #26415 - Add default flower_url_prefix in helm chart values

ingress:
  web:
    enabled: true
    hosts:
    - name: "airflow2-dev.example.com"
    ingressClassName: "nginx"
  flower:
    enabled: true
    path: "/flower"
    hosts:
      - name: "airflow2-dev.example.com"
    ingressClassName: "nginx"

and without defining a custom config.celery.flower_url_prefix:

# bobdudu @ BobDu in ~ 
$ kubectl -n airflow2 get cm airflow-airflow-config -o yaml              
apiVersion: v1
data:
  airflow.cfg: |-
    [celery]
    flower_url_prefix = /flower
    worker_concurrency = 16

    [celery_kubernetes_executor]
    kubernetes_queue = kubernetes

    [core]
    colored_console_log = False
    dags_folder = /opt/airflow/dags
    executor = CeleryExecutor
    load_examples = False
    remote_logging = False

    [elasticsearch]
    json_format = True
    log_id_template = {dag_id}_{task_id}_{execution_date}_{try_number}

    [elasticsearch_configs]
    max_retries = 3
    retry_timeout = True
    timeout = 30

    [kerberos]
    ccache = /var/kerberos-ccache/cache
    keytab = /etc/airflow.keytab
    principal = airflow@FOO.COM
    reinit_frequency = 3600

    [kubernetes]
    airflow_configmap = airflow-airflow-config
    airflow_local_settings_configmap = airflow-airflow-config
    multi_namespace_mode = False
    namespace = airflow2
    pod_template_file = /opt/airflow/pod_templates/pod_template_file.yaml
    worker_container_repository = 385382614844.dkr.ecr.ap-east-1.amazonaws.com/airflow2
    worker_container_tag = v0.3

    [logging]
    colored_console_log = False
    remote_logging = False

    [metrics]
    statsd_host = airflow-statsd
    statsd_on = True
    statsd_port = 9125
    statsd_prefix = airflow

    [scheduler]
    run_duration = 41460
    standalone_dag_processor = False
    statsd_host = airflow-statsd
    statsd_on = True
    statsd_port = 9125
    statsd_prefix = airflow

    [webserver]
    enable_proxy_fix = True
    rbac = True
  airflow_local_settings.py: ""
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: airflow
    meta.helm.sh/release-namespace: airflow2
  creationTimestamp: "2022-09-13T07:43:46Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    chart: airflow-1.7.0
    component: config
    heritage: Helm
    release: airflow
    tier: airflow
  name: airflow-airflow-config
  namespace: airflow2
  resourceVersion: "757685511"
  uid: 86c10cea-11c4-40d5-8ec9-4a7f8a822464

No problems found.
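
To repeat this check without reading the whole ConfigMap, the rendered airflow.cfg can be parsed with Python's configparser. A minimal sketch follows. It assumes the cfg text has already been pulled out of the ConfigMap (for example with kubectl get cm airflow-airflow-config -o jsonpath='{.data.airflow\.cfg}'), it uses only an excerpt of the output above, and flower_url_prefix is a hypothetical helper name. The expectation that the default prefix equals ingress.flower.path is inferred from this test, where both are /flower.

```python
import configparser

# Excerpt of the airflow.cfg rendered into the airflow-airflow-config ConfigMap above
AIRFLOW_CFG = """\
[celery]
flower_url_prefix = /flower
worker_concurrency = 16

[core]
executor = CeleryExecutor
"""

# ingress.flower.path from the values file used in this test
INGRESS_FLOWER_PATH = "/flower"


def flower_url_prefix(cfg_text: str) -> str:
    """Return [celery] flower_url_prefix, or "" when it is not set."""
    # interpolation=None reads values verbatim (airflow.cfg may contain '%' or '{...}')
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get("celery", "flower_url_prefix", fallback="")


if __name__ == "__main__":
    prefix = flower_url_prefix(AIRFLOW_CFG)
    print(prefix, prefix == INGRESS_FLOWER_PATH)  # /flower True
```

Disabling interpolation matters for the full file: lines such as log_id_template contain braces, and other Airflow settings can contain '%', which the default interpolation would try to expand.
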

Aakcht commented 2 years ago

Tested #25283 - works as expected

BobDu commented 2 years ago

Verified #26598 overrideMappings

statsd:
  overrideMappings:
    # === Counters ===
    - match: "(.+)\\.(.+)_start$"
      match_metric_type: counter
      name: "airflow_job_start"
      match_type: regex
      labels:
        airflow_id: "$1"
        job_name: "$2"
    - match: "(.+)\\.(.+)_end$"
      match_metric_type: counter
      name: "airflow_job_end"
      match_type: regex
      labels:
        airflow_id: "$1"
        job_name: "$2"
# bobdu @ BobDu in ~ [17:35:34] 
$ kubectl -n airflow2 get cm airflow-statsd -o yaml
apiVersion: v1
data:
  mappings.yml: |-
    mappings:
      - labels:
          airflow_id: $1
          job_name: $2
        match: (.+)\.(.+)_start$
        match_metric_type: counter
        match_type: regex
        name: airflow_job_start
      - labels:
          airflow_id: $1
          job_name: $2
        match: (.+)\.(.+)_end$
        match_metric_type: counter
        match_type: regex
        name: airflow_job_end
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: airflow
    meta.helm.sh/release-namespace: airflow2
  creationTimestamp: "2022-10-11T09:09:41Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    chart: airflow-1.7.0
    component: config
    heritage: Helm
    release: airflow
    tier: airflow
  name: airflow-statsd
  namespace: airflow2
  resourceVersion: "757709465"
  uid: 8438f60b-b7d0-46be-99dc-8415fcefb800
# bobdu @ BobDu in ~ 
$ kubectl get --raw '/api/v1/namespaces/airflow2/services/airflow-statsd:9102/proxy/metrics'
# HELP airflow_localtaskjob_end Metric autogenerated by statsd_exporter.
# TYPE airflow_localtaskjob_end counter
airflow_localtaskjob_end 65
# HELP airflow_localtaskjob_start Metric autogenerated by statsd_exporter.
# TYPE airflow_localtaskjob_start counter
airflow_localtaskjob_start 65
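
What the two rules are meant to do can be modeled in a few lines: the first rule whose regex matches the StatsD name renames the metric and fills its labels from capture groups ($1, $2). The Python sketch below only illustrates that idea. It is not statsd_exporter's implementation (which is written in Go and supports more options; match_metric_type is ignored here), and RegexMapping, expand and apply_mappings are hypothetical names. The YAML pattern "(.+)\\.(.+)_start$" becomes the raw string r"(.+)\.(.+)_start$" in Python, and the sample name airflow.localtaskjob_start is an assumption based on the metric names in the output above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class RegexMapping:
    """Simplified model of one mapping rule with match_type: regex."""
    match: str
    name: str
    labels: dict[str, str] = field(default_factory=dict)


def expand(template: str, m: re.Match) -> str:
    """Substitute $1, $2, ... in a template with the match's capture groups."""
    return re.sub(r"\$(\d+)", lambda ref: m.group(int(ref.group(1))), template)


def apply_mappings(statsd_name: str, mappings: list[RegexMapping]):
    """Return (metric_name, labels) for the first matching rule, else None."""
    for rule in mappings:
        m = re.search(rule.match, statsd_name)
        if m:
            labels = {key: expand(value, m) for key, value in rule.labels.items()}
            return expand(rule.name, m), labels
    return None


# The two overrideMappings from the values file above
MAPPINGS = [
    RegexMapping(r"(.+)\.(.+)_start$", "airflow_job_start",
                 {"airflow_id": "$1", "job_name": "$2"}),
    RegexMapping(r"(.+)\.(.+)_end$", "airflow_job_end",
                 {"airflow_id": "$1", "job_name": "$2"}),
]

if __name__ == "__main__":
    print(apply_mappings("airflow.localtaskjob_start", MAPPINGS))
    # ('airflow_job_start', {'airflow_id': 'airflow', 'job_name': 'localtaskjob'})
```

Under this model, a counter named airflow.localtaskjob_start would be exported as airflow_job_start with airflow_id and job_name labels, while names that match no rule (for example airflow.scheduler_heartbeat) are left to statsd_exporter's default handling.
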
BobDu commented 2 years ago

A note about #24496: it's not a bug, but I think this change may need to be highlighted in the changelog. If you use a custom image, you must make sure .Values.airflowVersion matches it. With its default values, Helm chart 1.7.0 no longer adds the AIRFLOW__CELERY__RESULT_BACKEND env var, so if the image runs Airflow <= 2.3, the worker fails to start:

airflow@airflow-worker-1:/opt/airflow$ airflow celery worker
psycopg2.OperationalError: could not translate host name "postgres" to address: Name or service not known

The error message is also not very helpful.

MatthieuBlais commented 2 years ago

Tested #26838, works as expected

jedcunningham commented 2 years ago

@BobDu, it is assumed that folks keep airflowVersion in sync with the Airflow version in their image. This has long been the case. This is just the latest failure of this type; another one, off the top of my head, is the scheduler liveness probe between 2.0 and 2.1.

It would be nice to detect when folks forget, though. I'm not sure how easy that would be.
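
One possible shape for such a check, as a minimal Python sketch: compare the airflowVersion value with a version parsed from the image tag, and flag the case from the report above (the chart treats the version as 2.4 or newer while the image runs 2.3 or older). Everything here is hypothetical: the function names, the 2.4.0 cutoff (inferred from the report, not taken from the chart templates), and above all the assumption that the image tag starts with the Airflow version, as in tags like 2.4.1-python3.8. Custom tags such as v0.3 in the ConfigMap earlier carry no Airflow version at all, which is part of why reliable detection is hard.

```python
from __future__ import annotations

import re

# Assumption from the report above: workers on Airflow <= 2.3 need
# AIRFLOW__CELERY__RESULT_BACKEND, which the chart omits for airflowVersion >= 2.4
RESULT_BACKEND_CUTOFF = (2, 4, 0)


def parse_version(text: str) -> tuple[int, int, int] | None:
    """Parse a leading X.Y[.Z] from a version or image tag, e.g. "2.3.4-python3.8"."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        return None
    major, minor, patch = m.groups(default="0")  # missing patch counts as 0
    return int(major), int(minor), int(patch)


def check_airflow_version(airflow_version: str, image_tag: str) -> list[str]:
    """Return human-readable warnings; an empty list means no mismatch was found."""
    declared = parse_version(airflow_version)
    actual = parse_version(image_tag)
    if declared is None or actual is None:
        return [f"cannot compare airflowVersion {airflow_version!r} "
                f"with image tag {image_tag!r}"]
    warnings = []
    if declared[:2] != actual[:2]:
        warnings.append(f"airflowVersion {airflow_version} does not match "
                        f"image tag {image_tag}")
    if declared >= RESULT_BACKEND_CUTOFF and actual < RESULT_BACKEND_CUTOFF:
        warnings.append("AIRFLOW__CELERY__RESULT_BACKEND will not be set, "
                        "but Celery workers on this image likely need it")
    return warnings


if __name__ == "__main__":
    for tag in ("2.4.1-python3.8", "2.3.4", "v0.3"):
        print(tag, check_airflow_version("2.4.1", tag))
```

With airflowVersion 2.4.1, the tag 2.4.1-python3.8 yields no warnings, 2.3.4 yields both warnings, and v0.3 can only be reported as not comparable.
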

mabrikan commented 2 years ago

Tested #26423 in the RC. Working as expected.

Default value of imagePullPolicy in pod_template_file.yaml:

$ helm list
NAME    NAMESPACE   REVISION    UPDATED                                 STATUS      CHART           APP VERSION
airflow airflow     1           2022-10-11 20:34:39.015555852 +0300 +03 deployed    airflow-1.7.0   2.4.1      
$ kubectl get cm airflow-airflow-config -oyaml | yq e '.data."pod_template_file.yaml"' - | yq e '.spec.containers[0].imagePullPolicy'
IfNotPresent

Changing it to Always:

$ helm upgrade airflow --reuse-values --set=images.pod_template.pullPolicy=Always .
$ kubectl get cm airflow-airflow-config -oyaml | yq e '.data."pod_template_file.yaml"' - | yq e '.spec.containers[0].imagePullPolicy'
Always
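
The --set flag addresses nested values with a dotted path, so images.pod_template.pullPolicy=Always overrides only that one key, while --reuse-values keeps the rest of the previous release's values. A simplified Python model of that merge is sketched below. set_path is a hypothetical helper, DEFAULTS is only the excerpt relevant here (with IfNotPresent as observed above), and real Helm --set parsing also handles commas, list indices, escaped dots and type coercion, none of which is modeled.

```python
from __future__ import annotations

import copy

# Excerpt of the values relevant here; IfNotPresent matches the default observed above
DEFAULTS = {"images": {"pod_template": {"pullPolicy": "IfNotPresent"}}}


def set_path(values: dict, assignment: str) -> dict:
    """Apply one simplified --set style "a.b.c=value" assignment to a copy of values."""
    path, _, raw_value = assignment.partition("=")
    result = copy.deepcopy(values)  # leave the previous values untouched
    *parents, leaf = path.split(".")
    node = result
    for key in parents:
        node = node.setdefault(key, {})  # create intermediate maps as needed
    node[leaf] = raw_value  # kept as a string; Helm would also coerce types
    return result


if __name__ == "__main__":
    upgraded = set_path(DEFAULTS, "images.pod_template.pullPolicy=Always")
    print(upgraded["images"]["pod_template"]["pullPolicy"])  # Always
    print(DEFAULTS["images"]["pod_template"]["pullPolicy"])  # IfNotPresent
```

Copying before writing mirrors the idea that an upgrade produces a new set of values for the new revision rather than editing the previous revision's values in place.
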
gmsantos commented 2 years ago

all good for #24647

Partial values file:

workers:
  resources:
    requests:
      cpu: 300m
      memory: 128Mi
    limits:
      cpu: 700m
      memory: 512Mi

Resulting pod template file:

> k exec airflow-scheduler-c95484f44-hbr2g -it -- cat /usr/local/airflow/pod_templates/pod_template_file.yaml
...
      resources:
        limits:
          cpu: 700m
          memory: 512Mi
        requests:
          cpu: 300m
          memory: 128Mi
Aakcht commented 2 years ago

Also tested #24496 - looks good.

gmsantos commented 2 years ago

Looks good for #23711 too

In the values file:

dagProcessor:
  enabled: true
> kgp -l component=dag-processor
NAME                                     READY   STATUS    RESTARTS   AGE
airflow-dag-processor-8655f894b4-fhgw6   1/1     Running   0          2m19s
jedcunningham commented 2 years ago

Quoting @gmsantos: "all good for #24647"

(@gmsantos that PR was about worker annotations, not resources)

gmsantos commented 2 years ago

Oops, sorry, my bad. For #24647:

workers:
  podAnnotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

pod template file:

> k exec airflow-cookbook-scheduler-6df58df678-b8mqz -it -- cat /usr/local/airflow/pod_templates/pod_template_file.yaml 
Defaulted container "scheduler" out of: scheduler, scheduler-log-groomer, wait-for-airflow-migrations (init)

...
---
apiVersion: v1
kind: Pod
metadata:
  name: dummy-name
...
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
csp33 commented 2 years ago

#25059 is working as expected.

Global setting: [screenshot]

Specific setting: [screenshot]

Global + specific setting: [screenshot]

jedcunningham commented 2 years ago

The helm chart is being released! Thanks everyone for testing the RC 🍺