apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

helm install of Airflow in a namespace fails with error: File "<string>", line 32, in <module> TimeoutError: There are still unapplied migrations after 60 seconds #15340

Open patsevanton opened 3 years ago

patsevanton commented 3 years ago

Apache Airflow version: master git

Kubernetes version (if you are using kubernetes) (use kubectl version):


Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.17", 

Environment:

What happened:

git clone https://github.com/apache/airflow.git
cd airflow/chart/
helm dependency update
kubectl create namespace xxxxx
werf helm install --wait --set webserver.defaultUser.password=password,ingress.enabled=true,ingress.hosts[0]=airflow.192.168.22.7.xip.io --namespace xxxxx airflow ./

Log

│ ┌ deploy/airflow-webserver po/airflow-webserver-86857b5969-sqkv6 container/wait-for-airflow-migrations logs
│ │ [2021-04-13 05:57:20,571] {<string>:35} INFO - Waiting for migrations... 60 second(s)
│ │ Traceback (most recent call last):
│ │   File "<string>", line 32, in <module>
│ │ TimeoutError: There are still unapplied migrations after 60 seconds.
│ └ deploy/airflow-webserver po/airflow-webserver-86857b5969-sqkv6 container/wait-for-airflow-migrations logs

Next log lines

│ deploy/airflow-scheduler ERROR: po/airflow-scheduler-658d5d4454-r2sgl container/wait-for-airflow-migrations: CrashLoopBackOff: back-off 10s restarting failed container=wait-for-airflow-migrations pod=airflow-scheduler-658d5d4454-r2sgl_sdpcc(40e85057-2aa5-4e9e-a47d-e91530038c0c)
│ 1/1 allowed errors occurred for deploy/airflow-scheduler: continue tracking

Full log https://gist.github.com/patsevanton/0edd5571cf69aa539edcdb803c288061

patsevanton commented 3 years ago

kubectl logs -n xxxxx airflow-webserver-86857b5969-sqkv6
Error from server (BadRequest): container "webserver" in pod "airflow-webserver-86857b5969-sqkv6" is waiting to start: PodInitializing

patsevanton commented 3 years ago

kubectl logs -n xxxxx airflow-postgresql-0

postgresql 05:56:01.18
postgresql 05:56:01.18 Welcome to the Bitnami postgresql container
postgresql 05:56:01.18 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
postgresql 05:56:01.18 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
postgresql 05:56:01.18 Send us your feedback at containers@bitnami.com
postgresql 05:56:01.19
postgresql 05:56:01.20 INFO  ==> ** Starting PostgreSQL setup **
postgresql 05:56:01.23 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 05:56:01.24 INFO  ==> Loading custom pre-init scripts...
postgresql 05:56:01.24 INFO  ==> Initializing PostgreSQL database...
postgresql 05:56:01.25 INFO  ==> postgresql.conf file not detected. Generating it...
postgresql 05:56:01.25 INFO  ==> pg_hba.conf file not detected. Generating it...
postgresql 05:56:02.32 INFO  ==> Starting PostgreSQL in background...
postgresql 05:56:02.44 INFO  ==> Changing password of postgres
postgresql 05:56:02.45 INFO  ==> Configuring replication parameters
postgresql 05:56:02.47 INFO  ==> Configuring fsync
postgresql 05:56:02.47 INFO  ==> Loading custom scripts...
postgresql 05:56:02.48 INFO  ==> Enabling remote connections
postgresql 05:56:02.48 INFO  ==> Stopping PostgreSQL...
postgresql 05:56:03.49 INFO  ==> ** PostgreSQL setup finished! **

postgresql 05:56:03.52 INFO  ==> ** Starting PostgreSQL **
2021-04-13 05:56:03.537 GMT [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-04-13 05:56:03.537 GMT [1] LOG:  listening on IPv6 address "::", port 5432
2021-04-13 05:56:03.556 GMT [1] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-04-13 05:56:03.586 GMT [178] LOG:  database system was shut down at 2021-04-13 05:56:02 GMT
2021-04-13 05:56:03.596 GMT [1] LOG:  database system is ready to accept connections
2021-04-13 05:56:10.476 GMT [193] LOG:  incomplete startup packet
2021-04-13 05:56:12.106 GMT [194] LOG:  incomplete startup packet
2021-04-13 05:57:20.415 GMT [284] LOG:  incomplete startup packet
2021-04-13 05:57:22.399 GMT [286] LOG:  incomplete startup packet
2021-04-13 05:58:44.731 GMT [397] LOG:  incomplete startup packet
2021-04-13 05:58:45.741 GMT [398] LOG:  incomplete startup packet
2021-04-13 06:00:17.733 GMT [533] LOG:  incomplete startup packet
2021-04-13 06:00:18.752 GMT [534] LOG:  incomplete startup packet
2021-04-13 06:02:18.723 GMT [703] LOG:  incomplete startup packet
2021-04-13 06:02:21.740 GMT [714] LOG:  incomplete startup packet
2021-04-13 06:04:51.723 GMT [917] LOG:  incomplete startup packet
2021-04-13 06:05:01.784 GMT [933] LOG:  incomplete startup packet
2021-04-13 06:08:53.728 GMT [1248] LOG:  incomplete startup packet
2021-04-13 06:08:56.783 GMT [1256] LOG:  incomplete startup packet
2021-04-13 06:15:15.739 GMT [1773] LOG:  incomplete startup packet
2021-04-13 06:15:16.759 GMT [1780] LOG:  incomplete startup packet

patsevanton commented 3 years ago

kubectl logs -n xxxxx airflow-scheduler-658d5d4454-r2sgl
error: a container name must be specified for pod airflow-scheduler-658d5d4454-r2sgl, choose one of: [scheduler scheduler-gc] or one of the init containers: [wait-for-airflow-migrations]

patsevanton commented 3 years ago

kubectl describe -n xxxxx pod airflow-scheduler-658d5d4454-r2sgl

Name:         airflow-scheduler-658d5d4454-r2sgl
Namespace:    xxxxx
Priority:     0
Node:         ubuntu1804/192.168.22.7
Start Time:   Tue, 13 Apr 2021 05:54:59 +0000
Labels:       component=scheduler
              pod-template-hash=658d5d4454
              release=airflow
              tier=airflow
Annotations:  checksum/airflow-config: d84f720b402097e58a879efc896869845ec8bae56455470bf241221b2a016f19
              checksum/extra-configmaps: 2e44e493035e2f6a255d08f8104087ff10d30aef6f63176f1b18f75f73295598
              checksum/extra-secrets: bb91ef06ddc31c0c5a29973832163d8b0b597812a793ef911d33b622bc9d1655
              checksum/metadata-secret: a954626eab69d09b0c9bfd44128c793948c18d943d9e97431903985654b350c5
              checksum/pgbouncer-config-secret: da52bd1edfe820f0ddfacdebb20a4cc6407d296ee45bcb500a6407e2261a5ba2
              checksum/result-backend-secret: af25d110685219c9219e6a4f9b268566118a4b732de33192387a111d1f241c89
              cluster-autoscaler.kubernetes.io/safe-to-evict: true
Status:       Pending
IP:           10.1.78.6
IPs:
  IP:           10.1.78.6
Controlled By:  ReplicaSet/airflow-scheduler-658d5d4454
Init Containers:
  wait-for-airflow-migrations:
    Container ID:  containerd://ac2a25e781647e59aa341e5e308ebbef60408d69b1a2f6b5f2d83df808718ec2
    Image:         apache/airflow:2.0.0
    Image ID:      docker.io/apache/airflow@sha256:e973fef20d3be5b6ea328d2707ac87b90f680382790d1eb027bd7766699b2409
    Port:          <none>
    Host Port:     <none>
    Args:
      python
      -c
      import airflow
      import logging
      import os
      import time

      from alembic.config import Config
      from alembic.runtime.migration import MigrationContext
      from alembic.script import ScriptDirectory

      from airflow import settings

      package_dir = os.path.abspath(os.path.dirname(airflow.__file__))
      directory = os.path.join(package_dir, 'migrations')
      config = Config(os.path.join(package_dir, 'alembic.ini'))
      config.set_main_option('script_location', directory)
      config.set_main_option('sqlalchemy.url', settings.SQL_ALCHEMY_CONN.replace('%', '%%'))
      script_ = ScriptDirectory.from_config(config)

      timeout=60

      with settings.engine.connect() as connection:
          context = MigrationContext.configure(connection)
          ticker = 0
          while True:
              source_heads = set(script_.get_heads())

              db_heads = set(context.get_current_heads())
              if source_heads == db_heads:
                  break

              if ticker >= timeout:
                  raise TimeoutError("There are still unapplied migrations after {} seconds.".format(ticker))
              ticker += 1
              time.sleep(1)
              logging.info('Waiting for migrations... %s second(s)', ticker)

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 13 Apr 2021 06:15:15 +0000
      Finished:     Tue, 13 Apr 2021 06:16:24 +0000
    Ready:          False
    Restart Count:  7
    Environment:
      AIRFLOW__CORE__FERNET_KEY:        <set to the key 'fernet-key' in secret 'airflow-fernet-key'>        Optional: false
      AIRFLOW__CORE__SQL_ALCHEMY_CONN:  <set to the key 'connection' in secret 'airflow-airflow-metadata'>  Optional: false
      AIRFLOW_CONN_AIRFLOW_DB:          <set to the key 'connection' in secret 'airflow-airflow-metadata'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from airflow-scheduler-token-q6zfr (ro)
Containers:
  scheduler:
    Container ID:
    Image:         apache/airflow:2.0.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      bash
      -c
      exec airflow scheduler
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Liveness:       exec [python -Wignore -c import os
os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'

from airflow.jobs.scheduler_job import SchedulerJob
from airflow.utils.db import create_session
from airflow.utils.net import get_hostname
import sys

with create_session() as session:
    job = session.query(SchedulerJob).filter_by(hostname=get_hostname()).order_by(
        SchedulerJob.latest_heartbeat.desc()).limit(1).first()

sys.exit(0 if job.is_alive() else 1)
] delay=10s timeout=5s period=30s #success=1 #failure=10
    Environment:
      AIRFLOW__CORE__FERNET_KEY:        <set to the key 'fernet-key' in secret 'airflow-fernet-key'>        Optional: false
      AIRFLOW__CORE__SQL_ALCHEMY_CONN:  <set to the key 'connection' in secret 'airflow-airflow-metadata'>  Optional: false
      AIRFLOW_CONN_AIRFLOW_DB:          <set to the key 'connection' in secret 'airflow-airflow-metadata'>  Optional: false
    Mounts:
      /opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
      /opt/airflow/logs from logs (rw)
      /opt/airflow/pod_templates/pod_template_file.yaml from config (ro,path="pod_template_file.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from airflow-scheduler-token-q6zfr (ro)
  scheduler-gc:
    Container ID:
    Image:         apache/airflow:2.0.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      bash
      /clean-logs
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /opt/airflow/logs from logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from airflow-scheduler-token-q6zfr (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      airflow-airflow-config
    Optional:  false
  logs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  airflow-scheduler-token-q6zfr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  airflow-scheduler-token-q6zfr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  25m                   default-scheduler  Successfully assigned xxxxx/airflow-scheduler-658d5d4454-r2sgl to ubuntu1804
  Normal   Pulling    24m                   kubelet            Pulling image "apache/airflow:2.0.0"
  Normal   Pulled     24m                   kubelet            Successfully pulled image "apache/airflow:2.0.0"
  Normal   Created    17m (x5 over 24m)     kubelet            Created container wait-for-airflow-migrations
  Normal   Started    17m (x5 over 24m)     kubelet            Started container wait-for-airflow-migrations
  Normal   Pulled     17m (x4 over 22m)     kubelet            Container image "apache/airflow:2.0.0" already present on machine
  Warning  BackOff    4m58s (x50 over 21m)  kubelet            Back-off restarting failed container

patsevanton commented 3 years ago

Same issue with the airflow namespace:

git clone https://github.com/apache/airflow.git
cd airflow/chart/
helm dependency update
kubectl create namespace airflow
werf helm install --wait --set webserver.defaultUser.password=password,ingress.enabled=true,ingress.hosts[0]=airflow.192.168.22.7.xip.io --namespace airflow airflow ./

patsevanton commented 3 years ago

@mik-laj @kaxil Please take a look at this issue. Big thanks!

patsevanton commented 3 years ago

How do I enable debug output in wait-for-airflow-migrations? Thanks!

kaxil commented 3 years ago

Check the logs of the run-airflow-migrations container in the {{ .Release.Name }}-run-airflow-migrations Job.

wait-for-airflow-migrations just waits for the migrations to be run; it does not run them itself.

https://github.com/apache/airflow/blob/6e31465a30dfd17e2e1409a81600b2e83c910036/chart/templates/migrate-database-job.yaml
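
To make the point above concrete, here is a minimal sketch of the polling logic, adapted from the init container script shown in the pod description earlier in this thread. The function name wait_for_migrations and the injected get_source_heads / get_db_heads callables are illustrative assumptions (the real script reads both sets of heads through alembic). The only point is that the loop compares heads and never applies a migration, so it times out if the migration Job never runs.

```python
import time
from typing import Callable, Iterable


def wait_for_migrations(
    get_source_heads: Callable[[], Iterable[str]],
    get_db_heads: Callable[[], Iterable[str]],
    timeout: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the database heads match the source heads.

    Returns the number of seconds waited. Raises TimeoutError if the heads
    still differ after `timeout` polls. Nothing here applies a migration.
    """
    ticker = 0
    while True:
        if set(get_source_heads()) == set(get_db_heads()):
            return ticker
        if ticker >= timeout:
            raise TimeoutError(
                "There are still unapplied migrations after {} seconds.".format(ticker)
            )
        ticker += 1
        sleep(1)


# Hypothetical stand-ins for alembic's script heads and database heads.
def source_heads():
    return ["abc123"]


def empty_db_heads():
    return []  # the migration Job never ran, so the database has no head


try:
    # A short timeout and a no-op sleep keep the demonstration instant.
    wait_for_migrations(source_heads, empty_db_heads, timeout=3, sleep=lambda s: None)
except TimeoutError as exc:
    print(exc)
```

If the database heads never change, no timeout value is long enough, which is why the logs of the migration Job are the useful ones here.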

patsevanton commented 3 years ago

kubectl logs -n airflow airflow-scheduler-78d9ffb5ff-5lw8f wait-for-airflow-migrations

DB_BACKEND=postgresql
DB_HOST=airflow-postgresql.airflow.svc.cluster.local
DB_PORT=5432

[2021-04-14 04:07:12,861] {migration.py:155} INFO - Context impl PostgresqlImpl.
[2021-04-14 04:07:12,862] {migration.py:162} INFO - Will assume transactional DDL.
[2021-04-14 04:07:18,838] {opentelemetry_tracing.py:29} INFO - This service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-04-14 04:07:20,086] {<string>:35} INFO - Waiting for migrations... 1 second(s)
[2021-04-14 04:07:21,090] {<string>:35} INFO - Waiting for migrations... 2 second(s)
[2021-04-14 04:07:22,093] {<string>:35} INFO - Waiting for migrations... 3 second(s)
[2021-04-14 04:07:23,095] {<string>:35} INFO - Waiting for migrations... 4 second(s)
[2021-04-14 04:07:24,097] {<string>:35} INFO - Waiting for migrations... 5 second(s)
[2021-04-14 04:07:25,100] {<string>:35} INFO - Waiting for migrations... 6 second(s)
[2021-04-14 04:07:26,102] {<string>:35} INFO - Waiting for migrations... 7 second(s)
[2021-04-14 04:07:27,104] {<string>:35} INFO - Waiting for migrations... 8 second(s)
[2021-04-14 04:07:28,107] {<string>:35} INFO - Waiting for migrations... 9 second(s)
[2021-04-14 04:07:29,109] {<string>:35} INFO - Waiting for migrations... 10 second(s)
[2021-04-14 04:07:30,111] {<string>:35} INFO - Waiting for migrations... 11 second(s)
[2021-04-14 04:07:31,114] {<string>:35} INFO - Waiting for migrations... 12 second(s)
[2021-04-14 04:07:32,116] {<string>:35} INFO - Waiting for migrations... 13 second(s)
[2021-04-14 04:07:33,118] {<string>:35} INFO - Waiting for migrations... 14 second(s)
[2021-04-14 04:07:34,121] {<string>:35} INFO - Waiting for migrations... 15 second(s)
[2021-04-14 04:07:35,124] {<string>:35} INFO - Waiting for migrations... 16 second(s)
[2021-04-14 04:07:36,126] {<string>:35} INFO - Waiting for migrations... 17 second(s)
[2021-04-14 04:07:37,129] {<string>:35} INFO - Waiting for migrations... 18 second(s)
[2021-04-14 04:07:38,131] {<string>:35} INFO - Waiting for migrations... 19 second(s)
[2021-04-14 04:07:39,134] {<string>:35} INFO - Waiting for migrations... 20 second(s)
[2021-04-14 04:07:40,136] {<string>:35} INFO - Waiting for migrations... 21 second(s)
[2021-04-14 04:07:41,139] {<string>:35} INFO - Waiting for migrations... 22 second(s)
[2021-04-14 04:07:42,141] {<string>:35} INFO - Waiting for migrations... 23 second(s)
[2021-04-14 04:07:43,143] {<string>:35} INFO - Waiting for migrations... 24 second(s)
[2021-04-14 04:07:44,145] {<string>:35} INFO - Waiting for migrations... 25 second(s)
[2021-04-14 04:07:45,148] {<string>:35} INFO - Waiting for migrations... 26 second(s)
[2021-04-14 04:07:46,150] {<string>:35} INFO - Waiting for migrations... 27 second(s)
[2021-04-14 04:07:47,152] {<string>:35} INFO - Waiting for migrations... 28 second(s)
[2021-04-14 04:07:48,154] {<string>:35} INFO - Waiting for migrations... 29 second(s)
[2021-04-14 04:07:49,157] {<string>:35} INFO - Waiting for migrations... 30 second(s)
[2021-04-14 04:07:50,159] {<string>:35} INFO - Waiting for migrations... 31 second(s)
[2021-04-14 04:07:51,161] {<string>:35} INFO - Waiting for migrations... 32 second(s)
[2021-04-14 04:07:52,162] {<string>:35} INFO - Waiting for migrations... 33 second(s)
[2021-04-14 04:07:53,164] {<string>:35} INFO - Waiting for migrations... 34 second(s)
[2021-04-14 04:07:54,166] {<string>:35} INFO - Waiting for migrations... 35 second(s)
[2021-04-14 04:07:55,168] {<string>:35} INFO - Waiting for migrations... 36 second(s)
[2021-04-14 04:07:56,170] {<string>:35} INFO - Waiting for migrations... 37 second(s)
[2021-04-14 04:07:57,172] {<string>:35} INFO - Waiting for migrations... 38 second(s)
[2021-04-14 04:07:58,175] {<string>:35} INFO - Waiting for migrations... 39 second(s)
[2021-04-14 04:07:59,177] {<string>:35} INFO - Waiting for migrations... 40 second(s)
[2021-04-14 04:08:00,180] {<string>:35} INFO - Waiting for migrations... 41 second(s)
[2021-04-14 04:08:01,182] {<string>:35} INFO - Waiting for migrations... 42 second(s)
[2021-04-14 04:08:02,185] {<string>:35} INFO - Waiting for migrations... 43 second(s)
[2021-04-14 04:08:03,187] {<string>:35} INFO - Waiting for migrations... 44 second(s)
[2021-04-14 04:08:04,189] {<string>:35} INFO - Waiting for migrations... 45 second(s)
[2021-04-14 04:08:05,192] {<string>:35} INFO - Waiting for migrations... 46 second(s)
[2021-04-14 04:08:06,194] {<string>:35} INFO - Waiting for migrations... 47 second(s)
[2021-04-14 04:08:07,196] {<string>:35} INFO - Waiting for migrations... 48 second(s)
[2021-04-14 04:08:08,199] {<string>:35} INFO - Waiting for migrations... 49 second(s)
[2021-04-14 04:08:09,201] {<string>:35} INFO - Waiting for migrations... 50 second(s)
[2021-04-14 04:08:10,203] {<string>:35} INFO - Waiting for migrations... 51 second(s)
[2021-04-14 04:08:11,206] {<string>:35} INFO - Waiting for migrations... 52 second(s)
[2021-04-14 04:08:12,208] {<string>:35} INFO - Waiting for migrations... 53 second(s)
[2021-04-14 04:08:13,211] {<string>:35} INFO - Waiting for migrations... 54 second(s)
[2021-04-14 04:08:14,212] {<string>:35} INFO - Waiting for migrations... 55 second(s)
[2021-04-14 04:08:15,215] {<string>:35} INFO - Waiting for migrations... 56 second(s)
[2021-04-14 04:08:16,217] {<string>:35} INFO - Waiting for migrations... 57 second(s)
[2021-04-14 04:08:17,219] {<string>:35} INFO - Waiting for migrations... 58 second(s)
[2021-04-14 04:08:18,222] {<string>:35} INFO - Waiting for migrations... 59 second(s)
[2021-04-14 04:08:19,224] {<string>:35} INFO - Waiting for migrations... 60 second(s)
Traceback (most recent call last):
  File "<string>", line 32, in <module>
TimeoutError: There are still unapplied migrations after 60 seconds.

gen16k commented 3 years ago

I faced this problem a few days ago. However, today I tried installing Airflow using the script shown below, and it seems to be working.

#!/bin/bash -x
rm -rf airflow
git clone https://github.com/apache/airflow.git
cd airflow/chart
helm dependency update
helm install airflow . -n airflow

kaxil commented 3 years ago

@patsevanton I asked for logs from a different container in https://github.com/apache/airflow/issues/15340#issuecomment-818861093 😄 (the names are a bit confusing)

Check the logs of the run-airflow-migrations (not wait-for-airflow-migrations) container in {{ .Release.Name }}-run-airflow-migrations

patsevanton commented 3 years ago

I cannot reproduce it right now. I will try again later.

patsevanton commented 3 years ago

@kaxil

kubectl logs -n xxxxx airflow-scheduler-0  run-airflow-migrations
error: container run-airflow-migrations is not valid for pod airflow-scheduler-0

kubectl logs -n xxxxx run-airflow-migrations
Error from server (NotFound): pods "run-airflow-migrations" not found

patsevanton commented 3 years ago

@kaxil

git clone https://github.com/apache/airflow.git
cd airflow/chart/
helm dependency update
kubectl create namespace apatsev
werf helm install --wait --set webserver.defaultUser.password=password,ingress.enabled=true,ingress.hosts[0]=airflow.192.168.22.8.sslip.io --namespace apatsev airflow ./

pod not found

kubectl logs -n apatsev airflow-run-airflow-migrations
Error from server (NotFound): pods "airflow-run-airflow-migrations" not found
kubectl logs -n apatsev airflow-run-airflow-migrations run-airflow-migrations
Error from server (NotFound): pods "airflow-run-airflow-migrations" not found

https://github.com/apache/airflow/blob/6e31465a30dfd17e2e1409a81600b2e83c910036/chart/templates/migrate-database-job.yaml#L27 is of kind Job.

I don't have any Job:

kubectl get all -A | grep Job
kubectl get all -A | grep job

LiboShen commented 3 years ago

FYI, have you tried setting "wait" to false? I found that this works for me: https://forum.astronomer.io/t/run-airflow-migration-and-wait-for-airflow-migrations/1189/10

patsevanton commented 3 years ago

@LiboShen How do I add wait=false to the install? This is how I install Airflow:

git clone https://github.com/apache/airflow.git
cd airflow/chart/
helm dependency update
kubectl create namespace apatsev
werf helm install --wait --set webserver.defaultUser.password=password,ingress.enabled=true,ingress.hosts[0]=airflow.192.168.22.8.sslip.io --namespace apatsev airflow ./

Should I create a file or add an option?

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

github-actions[bot] commented 3 years ago

This issue has been closed because it has not received response from the issue author.

autarchprinceps commented 3 years ago

I'm facing the same issue. I never get any pod or job containing run-airflow-migrations, and consequently the wait never ends. Is there a solution for this? I'm not running terraform and neither is patsevanton. He is using werf, I'm using flux.

roestzwiee commented 3 years ago

tldr;

@autarchprinceps I'm having the same trouble here. When deploying the chart to our local machines, everything runs fine. But deploying the chart to our cluster using helm does not create the run-airflow-migrations and create-airflow-user jobs.

You have to set the --wait flag with helm!

dsykes16 commented 2 years ago

I also ran into this issue. In the interest of saving time for anyone else that stumbles upon this issue, the fix seems to be setting --wait=false on the Helm command, per @LiboShen's advice.

On Rancher you can un-check "Wait" on the final page before deploying. I'm sure OpenShift and other solutions have similar options for exposing the underlying Helm --wait flag.

I can confirm that this worked on Rancher v2.6.1 installing Airflow to a downstream cluster provisioned by RKE running Kubernetes v1.21.5.

yehoshuadimarsky commented 2 years ago

I'm using ArgoCD to deploy the Helm chart and tearing out my hair trying every possible variation, but I'm also not seeing the run-airflow-migrations pod. It doesn't run or show up, so my webserver and scheduler wait forever for migrations that never started.

Not sure how to set the --wait=false param using Argo. I tried argocd app set [my-app] --helm-set-string wait=false, but it doesn't seem to do anything.

So I'm stuck.

JyotiSnK commented 2 years ago

@yehoshuadimarsky Did you find the way to do this? I am also trying to implement the exact same thing and have been struggling to get the issue fixed.

yehoshuadimarsky commented 2 years ago

Yes! I finally got this to work: put this in your values.yaml override:

# per https://github.com/apache/airflow/pull/16291
# and https://github.com/apache/airflow/pull/16331
createUserJob:
  jobAnnotations:
    "argocd.argoproj.io/hook": Sync
    "argocd.argoproj.io/sync-wave": "0"
    "argocd.argoproj.io/hook-delete-policy": BeforeHookCreation,HookSucceeded
migrateDatabaseJob:
  jobAnnotations:
    "argocd.argoproj.io/hook": Sync
    "argocd.argoproj.io/sync-wave": "0"
    "argocd.argoproj.io/hook-delete-policy": BeforeHookCreation,HookSucceeded

JyotiSnK commented 2 years ago

Thanks! I tried adding this annotation to my values.yaml file, but for some reason it does not seem to work. When I add this annotation, my application fails with a validation error and the values.yaml file does not even get loaded in the ArgoCD UI. The error points at the line where I am adding the annotation. Maybe I am missing something, not sure. Is there a specific version of charts/argo-cd I am supposed to use to get this working?

[Screenshot: 2021-11-23_17-35-15]

paul-bormans commented 2 years ago

I'm seeing the same issue when deploying using helm to k8s.

Using versions:

  • chart: 1.3.0
  • app: 2.2.1

It seems that these hooks were recently made configurable: https://github.com/apache/airflow/blob/main/chart/values.yaml#L632

I just ran a quick test, and the job is indeed now scheduled and run; after it completes, the scheduler and webserver pods start up.

I don't fully understand why it was built this way.

Paul

yehoshuadimarsky commented 2 years ago

Cool, I didn't know this was added recently. This should solve the problem really nicely.

yehoshuadimarsky commented 2 years ago

https://github.com/apache/airflow/blob/main/chart/values.yaml#L632

Make sure you put it in the correct parts of the YAML file. I was referring to the jobAnnotations of each of the migrate Jobs, such as here

https://github.com/apache/airflow/blob/cab6d96a463e227961b8e487dd000199f4864978/chart/values.yaml#L614

and here

https://github.com/apache/airflow/blob/cab6d96a463e227961b8e487dd000199f4864978/chart/values.yaml#L647

kreuzert commented 2 years ago

Workaround I've used:

  1. set airflow.dbMigrations.runAsJob: True in your values.yaml file ( https://github.com/airflow-helm/charts/blob/e1d49498426add959350ab8efacead8f96400759/charts/airflow/templates/db-migrations/db-migrations-job.yaml#L1 )
  2. disable the Helm wait option ( https://github.com/airflow-helm/charts/blob/e1d49498426add959350ab8efacead8f96400759/charts/airflow/values.yaml#L340 )

bitsofinfo commented 2 years ago

seeing same issue

bitsofinfo commented 2 years ago

Confirmed that this workaround works (I'm using terraform to run the chart):

  wait = "false"
  set {
    name  = "airflow.dbMigrations.runAsJob"
    value = "true"
  }

bartcode commented 2 years ago

Anyone using ArgoCD to deploy the Airflow Helm chart who reaches this issue: read this piece of documentation.

When installing the chart using ArgoCD, you MUST set the two following values, or your application will not start as the migrations will not be run:

createUserJob.useHelmHooks: false
migrateDatabaseJob.useHelmHooks: false
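
For a plain helm install, the same two values can be passed with Helm's --set flag. The sketch below only builds the command line and does not run it. The helper name helm_set_args, the release name airflow, the chart path . and the namespace airflow are assumptions chosen for illustration, not something the chart prescribes.

```python
import shlex
from typing import Any, Dict, List


def helm_set_args(values: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten a nested dict into repeated `--set key.path=value` arguments."""
    args: List[str] = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            args.extend(helm_set_args(value, path))
        else:
            # Render Python booleans as lowercase true/false.
            rendered = str(value).lower() if isinstance(value, bool) else str(value)
            args.extend(["--set", f"{path}={rendered}"])
    return args


values = {
    "createUserJob": {"useHelmHooks": False},
    "migrateDatabaseJob": {"useHelmHooks": False},
}

# Release name, chart path and namespace are placeholders.
command = ["helm", "install", "airflow", ".", "--namespace", "airflow"]
command += helm_set_args(values)
print(shlex.join(command))
```

Keeping the values in a dict makes it easy to write the same settings to a values file instead, which is usually the more maintainable option.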

villasv commented 2 years ago

In my case, just removing the --wait flag was enough; I didn't have to fiddle with airflow.dbMigrations.runAsJob.

martimors commented 2 years ago

I came to this issue while upgrading to airflow 2.3.0 via helm chart 1.6.0. The issue turned out to be that the migration job could not be scheduled because the CPU request limit on the k8s namespace was too low. Just another thing to check if you end up here like I did.

WongyuChoi commented 2 years ago

@dingobar Thank you for the input. I am using the same versions and having (maybe similar) migration issue. May I ask you to elaborate on the solution?

martimors commented 2 years ago

Run kubectl -n <namespace> get events and see if there are events showing that something is not being scheduled. If not, also try kubectl -n <namespace> describe replicaset <migration job replicaset> and see if the events there give you a clue. In my case, the scheduler could not start the migration pod because of a limit on how much CPU the namespace could request in total. You can see the limits by describing the namespace: kubectl describe namespace <namespace>. Hope that helps.
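
As a rough illustration of the check described above, the sketch below compares a pod's CPU request with what is left of a namespace quota. The function names and the example numbers are hypothetical. The only Kubernetes-specific assumption is the CPU quantity notation, in which a value such as 500m means half a core. In practice the hard and used figures would be read from the output of kubectl describe namespace <namespace>.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("500m", "2", "0.5") to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def fits_cpu_quota(hard: str, used: str, request: str) -> bool:
    """Return True if `request` still fits under the quota `hard` given `used`."""
    remaining = parse_cpu_millicores(hard) - parse_cpu_millicores(used)
    return parse_cpu_millicores(request) <= remaining


# Hypothetical numbers: a quota of 2 cores with 1900m already requested.
print(fits_cpu_quota(hard="2", used="1900m", request="250m"))  # False: only 100m left
print(fits_cpu_quota(hard="2", used="1900m", request="100m"))  # True
```

If the migration pod's request does not fit, either the quota has to be raised or the resources requested by the chart's pods have to be lowered.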

Jithsaavvy commented 1 year ago

@dingobar - I'm facing the same resource limit issue during my Airflow Helm deployment in a K8s cluster. It looks like the resources used are higher than the resource limit in my namespace. May I ask what your solution was, if you have fixed it?