apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Quickstart Helm Chart fails post-install #16176

Closed: kasteph closed this issue 3 years ago

kasteph commented 3 years ago

Apache Airflow version: 2.0.2

Kubernetes version (if you are using kubernetes) (use kubectl version): 1.19

Environment:

What happened:

The Helm chart does not deploy successfully to a kind cluster despite following the Quick Start. I tried multiple times: the flower, postgres, redis and statsd services run fine, but the run-airflow-migrations job fails with a CrashLoopBackOff:

  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  5m19s                  default-scheduler  Successfully assigned airflow/airflow-run-airflow-migrations-c9pph to kind-control-plane
  Normal   Pulled     2m43s (x5 over 5m17s)  kubelet            Container image "apache/airflow:2.0.2" already present on machine
  Normal   Created    2m43s (x5 over 5m17s)  kubelet            Created container run-airflow-migrations
  Normal   Started    2m43s (x5 over 5m17s)  kubelet            Started container run-airflow-migrations
  Warning  BackOff    9s (x18 over 4m25s)    kubelet            Back-off restarting failed container

What you expected to happen:

Successful Helm deployment.

How to reproduce it:

  1. Created a kind cluster: kind create cluster --image kindest/node:v1.18.15
  2. Added Helm chart repo: helm repo add apache-airflow https://airflow.apache.org
  3. Created kube namespace: kubectl create namespace airflow
  4. Installed chart: helm install airflow apache-airflow/airflow --namespace airflow --debug
install.go:173: [debug] Original chart version: ""
install.go:190: [debug] CHART PATH: /Users/stephaniesamson/Library/Caches/helm/repository/airflow-1.0.0.tgz

client.go:282: [debug] Starting delete for "airflow-broker-url" Secret
client.go:122: [debug] creating 1 resource(s)
client.go:282: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:122: [debug] creating 1 resource(s)
client.go:282: [debug] Starting delete for "airflow-redis-password" Secret
client.go:122: [debug] creating 1 resource(s)
client.go:122: [debug] creating 30 resource(s)
client.go:282: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:122: [debug] creating 1 resource(s)
client.go:491: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 5m0s
client.go:519: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:558: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:519: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:558: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: failed post-install: timed out waiting for the condition
helm.go:81: [debug] failed post-install: timed out waiting for the condition

boring-cyborg[bot] commented 3 years ago

Thanks for opening your first issue here! Be sure to follow the issue template!

kaxil commented 3 years ago

Can you provide the logs of the airflow-run-airflow-migrations job, please?
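As a minimal sketch of how the requested logs could be collected: the job name and the airflow namespace are taken from the report above, while the helper name `migration_log_commands` is hypothetical. The commands are only built and printed here, not executed.

```python
import shlex


def migration_log_commands(namespace="airflow", job="airflow-run-airflow-migrations"):
    """Build kubectl commands that help show why the migration job keeps restarting."""
    return [
        # Pods created by a Job carry a job-name=<job> label.
        ["kubectl", "get", "pods", "-n", namespace, "-l", f"job-name={job}"],
        # Job status plus the events recorded for the Job object.
        ["kubectl", "describe", "job", job, "-n", namespace],
        # Logs from one of the Job's pods.
        ["kubectl", "logs", f"job/{job}", "-n", namespace],
    ]


if __name__ == "__main__":
    for cmd in migration_log_commands():
        print(shlex.join(cmd))
```

For a container that has already crashed and restarted, adding `--previous` to `kubectl logs` against the specific pod shows the output of the last failed run.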

Dr-Denzy commented 3 years ago

I will try to see if I can reproduce this issue.

Dr-Denzy commented 3 years ago

I could not reproduce this issue. Consider closing it @kaxil

...

helm install $RELEASE_NAME apache-airflow/airflow --namespace $NAMESPACE --debug
...

NOTES:
Thank you for installing Apache Airflow 2.0.2!

Your release is named airflow-release.
You can now access your dashboard(s) by executing the following command(s) and visiting the corresponding port at localhost in your browser:

Airflow Webserver:     kubectl port-forward svc/airflow-release-webserver 8080:8080 --namespace airflow-namespace
Flower dashboard:      kubectl port-forward svc/airflow-release-flower 5555:5555 --namespace airflow-namespace
Default Webserver (Airflow UI) Login credentials:
    username: admin
    password: admin
Default Postgres connection credentials:
    username: postgres
    password: postgres
    port: 5432

You can get Fernet Key value by running the following:

    echo Fernet Key: $(kubectl get secret --namespace airflow-namespace airflow-release-fernet-key -o jsonpath="{.data.fernet-key}" | base64 --decode)

kasteph commented 3 years ago

Can you provide logs of airflow-run-airflow-migrations job please

❯ kubectl logs -n airflow airflow-run-airflow-migrations-hw9lz
BACKEND=postgresql
DB_HOST=airflow-postgresql.airflow
DB_PORT=5432

DB: postgresql://postgres:***@airflow-postgresql.airflow:5432/postgres?sslmode=disable
[2021-05-31 19:39:05,756] {db.py:684} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
WARNI [airflow.providers_manager] Exception when importing 'airflow.providers.google.common.hooks.leveldb.LevelDBHook' from 'apache-airflow-providers-google' package: No module named 'airflow.providers.google.common.hooks.leveldb'
WARNI [airflow.providers_manager] Exception when importing 'airflow.providers.google.common.hooks.leveldb.LevelDBHook' from 'apache-airflow-providers-google' package: No module named 'airflow.providers.google.common.hooks.leveldb'
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/base.py", line 171, in _catch_revision_errors
    yield
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/base.py", line 365, in _upgrade_revs
    revs = list(revs)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/revision.py", line 904, in _iterate_revisions
    requested_lowers = self.get_revisions(lower)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/revision.py", line 455, in get_revisions
    return sum([self.get_revisions(id_elem) for id_elem in id_], ())
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/revision.py", line 455, in <listcomp>
    return sum([self.get_revisions(id_elem) for id_elem in id_], ())
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/revision.py", line 460, in get_revisions
    for rev_id in resolved_id
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/revision.py", line 460, in <genexpr>
    for rev_id in resolved_id
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/revision.py", line 536, in _revision_for_ident
    resolved_id,
alembic.script.revision.ResolutionError: No such revision or branch 'a13f7613ad25'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/cli.py", line 89, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/commands/db_command.py", line 48, in upgradedb
    db.upgradedb()
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/db.py", line 694, in upgradedb
    command.upgrade(config, 'heads')
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/command.py", line 294, in upgrade
    script.run_env()
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/base.py", line 490, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/util/pyfiles.py", line 97, in load_python_file
    module = load_module_py(module_id, path)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/util/compat.py", line 182, in load_module_py
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/migrations/env.py", line 108, in <module>
    run_migrations_online()
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/migrations/env.py", line 102, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/runtime/environment.py", line 813, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/runtime/migration.py", line 548, in run_migrations
    for step in self._migrations_fn(heads, self):
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/command.py", line 283, in upgrade
    return script._upgrade_revs(revision, rev)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/base.py", line 370, in _upgrade_revs
    for script in reversed(list(revs))
  File "/usr/local/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/script/base.py", line 203, in _catch_revision_errors
    compat.raise_(util.CommandError(resolution), from_=re)
  File "/home/airflow/.local/lib/python3.6/site-packages/alembic/util/compat.py", line 294, in raise_
    raise exception
alembic.util.exc.CommandError: Can't locate revision identified by 'a13f7613ad25'
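The last line of the traceback is the part that matters: Alembic cannot find revision a13f7613ad25 among the migration scripts shipped in the image. A plausible cause (an assumption, though consistent with the fix reported further down) is a database left over from an earlier install that used a different Airflow version, so its recorded revision is unknown to this image. A small sketch for pulling the revision id out of such a log; `missing_revision` is a hypothetical helper:

```python
import re

# The two Alembic messages that appear in the log above.
_PATTERNS = [
    re.compile(r"Can't locate revision identified by '([0-9a-f]+)'"),
    re.compile(r"No such revision or branch '([0-9a-f]+)'"),
]


def missing_revision(log_text):
    """Return the first revision id Alembic failed to resolve, or None."""
    for line in log_text.splitlines():
        for pattern in _PATTERNS:
            match = pattern.search(line)
            if match:
                return match.group(1)
    return None


if __name__ == "__main__":
    sample = "alembic.util.exc.CommandError: Can't locate revision identified by 'a13f7613ad25'"
    print(missing_revision(sample))
```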

ephraimbuddy commented 3 years ago

@stephsamson can you delete the namespace and recreate it, then run helm repo update before installing?
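The suggested reset can be sketched as an ordered list of commands. The release name, namespace and chart reference are the ones used in the original report; `clean_reinstall_commands` and `run_all` are hypothetical helpers, and nothing is executed unless `dry_run` is set to False.

```python
import subprocess


def clean_reinstall_commands(release="airflow", namespace="airflow",
                             chart="apache-airflow/airflow"):
    """Commands for the suggested reset, in the order they should run."""
    return [
        # Removes every namespaced resource, including the PostgreSQL
        # PersistentVolumeClaim; whether the underlying volume data goes
        # away too depends on the storage class reclaim policy.
        ["kubectl", "delete", "namespace", namespace],
        ["kubectl", "create", "namespace", namespace],
        # Refresh the local chart index before installing.
        ["helm", "repo", "update"],
        ["helm", "install", release, chart, "--namespace", namespace, "--debug"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(clean_reinstall_commands())
```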

kasteph commented 3 years ago

@ephraimbuddy thanks that worked!

niklasden commented 3 years ago

Hi everyone,

I have run into the same issue on a fresh microk8s cluster. After running the following command:

microk8s.helm3 install airflow apache-airflow/airflow --namespace airflow --wait=false

Error: failed post-install: timed out waiting for the condition

I have tried deleting the namespace and updating the repo several times.

Anyone running into the same issues?

ralleman-quasarsat commented 3 years ago

I have not been able to get Airflow installed. I've tried several times, deleting the cluster on each attempt. I'm following these instructions: https://marclamberti.com/blog/airflow-on-kubernetes-get-started-in-10-mins/. I'm working on a 2021 M1 MacBook Air running Big Sur.

% helm install airflow apache-airflow/airflow --namespace airflow --debug
install.go:178: [debug] Original chart version: ""
install.go:199: [debug] CHART PATH: .../Library/Caches/helm/repository/airflow-1.2.0.tgz

client.go:299: [debug] Starting delete for "airflow-broker-url" Secret
client.go:328: [debug] secrets "airflow-broker-url" not found
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:328: [debug] secrets "airflow-fernet-key" not found
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-redis-password" Secret
client.go:328: [debug] secrets "airflow-redis-password" not found
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 31 resource(s)
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:328: [debug] jobs.batch "airflow-run-airflow-migrations" not found
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 5m0s
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
helm.go:88: [debug] failed post-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
    helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
    github.com/spf13/cobra@v1.2.1/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
    github.com/spf13/cobra@v1.2.1/command.go:974
github.com/spf13/cobra.(*Command).Execute
    github.com/spf13/cobra@v1.2.1/command.go:902
main.main
    helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
    runtime/proc.go:225
runtime.goexit
    runtime/asm_arm64.s:1130

ralleman-quasarsat commented 3 years ago

On another attempt, this gets added to the output:

client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
W1101 15:44:58.592580   29841 reflector.go:441] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: watch of *unstructured.Unstructured ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
I1101 15:45:09.716362   29841 trace.go:205] Trace[2052545262]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (01-Nov-2021 15:44:59.714) (total time: 10001ms):
Trace[2052545262]: [10.001633833s] [10.001633833s] END
E1101 15:45:09.716411   29841 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:62689/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-run-airflow-migrations&resourceVersion=930": net/http: TLS handshake timeout
I1101 15:45:22.725536   29841 trace.go:205] Trace[904910366]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (01-Nov-2021 15:45:12.724) (total time: 10001ms):
Trace[904910366]: [10.001626666s] [10.001626666s] END
E1101 15:45:22.725569   29841 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:62689/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-run-airflow-migrations&resourceVersion=930": net/http: TLS handshake timeout
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
helm.go:88: [debug] failed post-install: timed out waiting for the condition
INSTALLATION FAILED
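Both outputs above end the same way: the migration job becomes active but is never reported as succeeded before Helm's wait expires, and Helm itself only says "timed out waiting for the condition". The reflector warnings in the second output also suggest the client lost its connection to the API server while waiting, although the thread does not confirm the cause. As a rough sketch, the job status lines can be summarized like this; `last_job_status` and `stuck_jobs` are hypothetical helpers, and the regular expression assumes the line format shown in these logs.

```python
import re

_STATUS = re.compile(
    r"\[debug\] (?P<job>[\w-]+): Jobs active: (?P<active>\d+), "
    r"jobs failed: (?P<failed>\d+), jobs succeeded: (?P<succeeded>\d+)"
)


def last_job_status(helm_debug_output):
    """Return {job: (active, failed, succeeded)} from the last status line per job."""
    status = {}
    for match in _STATUS.finditer(helm_debug_output):
        status[match["job"]] = (
            int(match["active"]), int(match["failed"]), int(match["succeeded"])
        )
    return status


def stuck_jobs(helm_debug_output):
    """Jobs whose last reported status shows no successful completion."""
    return [
        job for job, (_, _, succeeded) in last_job_status(helm_debug_output).items()
        if succeeded == 0
    ]
```

A job listed by `stuck_jobs` is the one whose pod logs are worth reading next, since the Helm output alone does not say why it never completed.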

zambien commented 3 years ago

@kaxil @Dr-Denzy please re-open this issue as multiple people are reporting it. I am able to recreate it intermittently myself. You can follow the notes here:

https://github.com/zambien/tf-eks-airflow/blob/tf_eks_extended/notes.md

deploy airflow on k8s using helm without packaged db

To keep everything simple we use the default namespace.

Create the cluster and set it in kubectl

kind create cluster --name airflow --config terraform/kind/kind-config.yaml
kubectl cluster-info --context kind-airflow

Get your charts

helm repo add apache-airflow https://airflow.apache.org
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

Run postgres

helm install db \
  --set postgresqlPassword=secretpassword,postgresqlDatabase=airflow \
    bitnami/postgresql

Run airflow without the included db:

helm install airflow apache-airflow/airflow --debug \
  -f terraform/kind/airflow-values.yaml \
  --set 'env[0].name=AIRFLOW__CORE__LOAD_EXAMPLES,env[0].value=True'

Sometimes this works, other times it does not. It seems that the catalyst may be the separate database.

Here is the issue I see:

helm install --debug airflow apache-airflow/airflow \
  -f terraform/kind/airflow-values.yaml \
  --set 'env[0].name=AIRFLOW__CORE__LOAD_EXAMPLES,env[0].value=True'
install.go:178: [debug] Original chart version: ""
install.go:199: [debug] CHART PATH: /home/adam/.cache/helm/repository/airflow-1.2.0.tgz

client.go:299: [debug] Starting delete for "airflow-broker-url" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-redis-password" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 27 resource(s)
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 5m0s
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
helm.go:88: [debug] failed post-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
    helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
    github.com/spf13/cobra@v1.2.1/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
    github.com/spf13/cobra@v1.2.1/command.go:974
github.com/spf13/cobra.(*Command).Execute
    github.com/spf13/cobra@v1.2.1/command.go:902
main.main
    helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
    runtime/proc.go:225
runtime.goexit
    runtime/asm_amd64.s:1371

kaxil commented 3 years ago

@zambien https://github.com/apache/airflow/pull/18776 should allow disabling the Helm hooks, which might fix the issue for you.

Can you try it out on your local machine or dev cluster by running the following commands:

helm repo add apache-airflow-dev https://dist.apache.org/repos/dist/dev/airflow/helm-chart/1.3.0rc1/
helm repo update
helm install airflow apache-airflow-dev/airflow

1.3.0rc1 is the release candidate for the 1.3.0 release.
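For context, the logs in this thread show the migration job running as a post-install hook, with helm install blocking until the job finishes or the timeout expires. The sketch below builds an install command with the hooks turned off. The value names `migrateDatabaseJob.useHelmHooks` and `createUserJob.useHelmHooks` are an assumption about what the chart exposes from 1.3.0 on and are not confirmed in this thread; check them with `helm show values` for the chart before relying on them. `install_without_hooks` is a hypothetical helper.

```python
import shlex


def install_without_hooks(release="airflow", chart="apache-airflow/airflow",
                          namespace="airflow", timeout=None):
    """Build a `helm install` command with the chart's Helm hooks disabled."""
    cmd = ["helm", "install", release, chart, "--namespace", namespace]
    # Assumed value names; verify against the chart's values before use.
    for key in ("migrateDatabaseJob.useHelmHooks", "createUserJob.useHelmHooks"):
        cmd += ["--set", f"{key}=false"]
    if timeout:
        # For example "10m0s"; the debug output above shows a 5m0s default wait.
        cmd += ["--timeout", timeout]
    return cmd


if __name__ == "__main__":
    print(shlex.join(install_without_hooks(chart="apache-airflow-dev/airflow")))
```

With the hooks disabled, the jobs are created as ordinary resources alongside the rest of the release, so a slow or failing migration no longer surfaces as a post-install hook timeout; it still has to be diagnosed from the job's own logs.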

matasejem commented 3 years ago

@kaxil, I am receiving a similar error (see below). I wonder whether I should also try the above commands for 1.3.0rc1, or whether it requires a different kind of fix. Thanks.

PS C:\Windows\System32> helm install airflow apache-airflow/airflow --namespace airflow --debug --timeout 10m0s
install.go:178: [debug] Original chart version: ""
install.go:199: [debug] CHART PATH: C:\Users\MARTIN~1\AppData\Local\Temp\helm\repository\airflow-1.3.0.tgz

client.go:299: [debug] Starting delete for "airflow-broker-url" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-redis-password" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 33 resource(s)
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 10m0s
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:299: [debug] Starting delete for "airflow-create-user" Job
client.go:328: [debug] jobs.batch "airflow-create-user" not found
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job airflow-create-user with timeout of 10m0s
client.go:556: [debug] Add/Modify event for airflow-create-user: ADDED
client.go:595: [debug] airflow-create-user: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-create-user: MODIFIED
client.go:595: [debug] airflow-create-user: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
W1115 12:55:22.480047    9876 reflector.go:441] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: watch of *unstructured.Unstructured ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
I1115 12:55:33.664716    9876 trace.go:205] Trace[300875778]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:55:23.654) (total time: 10009ms):
Trace[300875778]: [10.0095411s] [10.0095411s] END
E1115 12:55:33.665699    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:55:45.927127    9876 trace.go:205] Trace[922864028]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:55:35.908) (total time: 10018ms):
Trace[922864028]: [10.0183913s] [10.0183913s] END
E1115 12:55:45.927382    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:56:00.518389    9876 trace.go:205] Trace[1282502707]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:55:50.511) (total time: 10007ms):
Trace[1282502707]: [10.007077s] [10.007077s] END
E1115 12:56:00.521483    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:56:23.254867    9876 trace.go:205] Trace[336697707]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:56:13.237) (total time: 10017ms):
Trace[336697707]: [10.0173028s] [10.0173028s] END
E1115 12:56:23.255431    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:56:50.279920    9876 trace.go:205] Trace[1113683026]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:56:40.266) (total time: 10013ms):
Trace[1113683026]: [10.013341s] [10.013341s] END
E1115 12:56:50.281131    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:57:28.631461    9876 trace.go:205] Trace[2006327411]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:57:18.629) (total time: 10002ms):
Trace[2006327411]: [10.002029s] [10.002029s] END
E1115 12:57:28.631461    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:58:22.706208    9876 trace.go:205] Trace[365191476]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:58:12.695) (total time: 10010ms):
Trace[365191476]: [10.0109451s] [10.0109451s] END
E1115 12:58:22.706762    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:59:16.350704    9876 trace.go:205] Trace[611706561]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:59:06.341) (total time: 10009ms):
Trace[611706561]: [10.0090197s] [10.0090197s] END
E1115 12:59:16.351089    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 12:59:59.275048    9876 trace.go:205] Trace[1510419015]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 12:59:49.268) (total time: 10006ms):
Trace[1510419015]: [10.0061847s] [10.0061847s] END
E1115 12:59:59.275556    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 13:00:49.912921    9876 trace.go:205] Trace[434755033]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 13:00:39.902) (total time: 10010ms):
Trace[434755033]: [10.0104631s] [10.0104631s] END
E1115 13:00:49.913463    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 13:01:49.592632    9876 trace.go:205] Trace[958360155]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 13:01:39.576) (total time: 10016ms):
Trace[958360155]: [10.0162521s] [10.0162521s] END
E1115 13:01:49.593159    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
I1115 13:02:30.179570    9876 trace.go:205] Trace[412353598]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167 (15-Nov-2021 13:02:20.167) (total time: 10012ms):
Trace[412353598]: [10.0121795s] [10.0121795s] END
E1115 13:02:30.180122    9876 reflector.go:138] k8s.io/client-go@v0.22.1/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://127.0.0.1:56936/apis/batch/v1/namespaces/airflow/jobs?fieldSelector=metadata.name%3Dairflow-create-user&resourceVersion=5376": net/http: TLS handshake timeout
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
helm.go:88: [debug] failed post-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
        helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra@v1.2.1/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra@v1.2.1/command.go:974
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra@v1.2.1/command.go:902
main.main
        helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
        runtime/proc.go:225
runtime.goexit
        runtime/asm_amd64.s:1371
PS C:\Windows\System32>

pplanel commented 2 years ago

Up, facing the same problem.

rdeteix commented 2 years ago

up, I have the same issue

kyp0717 commented 2 years ago

I have the same issue.

chyumin commented 2 years ago

I got the same issue, but installing an older version worked for me:

helm install airflow apache-airflow/airflow --namespace airflow --version 1.0.0

Probably something is wrong with the current latest chart version.

fm-falken commented 2 years ago

Confirming this.

helm upgrade --install airflow apache-airflow/airflow --namespace airflow -f .\airflow\values.yaml --debug

Returns:

history.go:56: [debug] getting history for release airflow
Release "airflow" does not exist. Installing it now.
install.go:178: [debug] Original chart version: ""
install.go:199: [debug] CHART PATH: C:\Users\admin\AppData\Local\Temp\helm\repository\airflow-1.3.0.tgz

client.go:299: [debug] Starting delete for "airflow-broker-url" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-redis-password" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 37 resource(s)
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:328: [debug] jobs.batch "airflow-run-airflow-migrations" not found
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 5m0s
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: failed post-install: timed out waiting for the condition
helm.go:88: [debug] failed post-install: timed out waiting for the condition

potiuk commented 2 years ago

Can someone please open a new issue with all the details? This is a closed issue, and the cause is likely different. Commenting on a closed issue from May will not resurrect it. Even if the symptoms look similar, it is likely a different issue.

potiuk commented 2 years ago

We really need more details: values, configurations, detailed logs from the "wait-for-migrations" jobs, etc.

rdeteix commented 2 years ago

Hi, I tried it with a larger server configuration and it worked. It may be a memory/CPU issue.
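One way to test the memory hypothesis, sketched under assumptions: Kubernetes records why a container last terminated in the pod status, and a container killed for exceeding its memory limit is reported with reason OOMKilled. The helpers `termination_reasons` and `was_oom_killed` are hypothetical; the input is the JSON printed by `kubectl get pod <name> -n airflow -o json`.

```python
import json
import sys


def termination_reasons(pod):
    """Collect (container, reason, exit_code) for containers that have terminated.

    `pod` is the dict parsed from `kubectl get pod <name> -n <ns> -o json`.
    """
    reasons = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # A crash-looping container reports its previous run under lastState.
        for state_key in ("lastState", "state"):
            terminated = cs.get(state_key, {}).get("terminated")
            if terminated:
                reasons.append(
                    (cs["name"], terminated.get("reason"), terminated.get("exitCode"))
                )
                break
    return reasons


def was_oom_killed(pod):
    """True if any container's recorded termination reason is OOMKilled."""
    return any(reason == "OOMKilled" for _, reason, _ in termination_reasons(pod))


if __name__ == "__main__":
    # Usage: kubectl get pod <name> -n airflow -o json | python check_oom.py
    print(termination_reasons(json.load(sys.stdin)))
```

If nothing is reported as OOMKilled, slow startup on a small machine remains possible, but the job's logs are then the better source of evidence.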

kyp0717 commented 2 years ago

Perhaps someone can try installing with the "--timeout 10m0s" option. It worked for me when I used the official Apache Helm chart.

veromos commented 2 years ago

I'm facing the same issue, even with timeout option it's not working.

potiuk commented 2 years ago

I'm facing the same issue, even with timeout option it's not working.

Please open a detailed issue about this with more details (logs and describing what you experience). It might likely be a different issue.

The comment "I have the same issue" on a closed issue does not help in any meaningful way in diagnosing the issue.

HesamKorki commented 2 years ago

Up, I am facing the same issue. Could not find a workaround. I used 10m timeout but it did not work

potiuk commented 2 years ago

Up, I am facing the same issue. Could not find a workaround. I used 10m timeout but it did not work

Please open a detailed issue about this with more details (logs and describing what you experience). It might likely be a different issue.

The comment "I have the same issue" on a closed issue does not help in any meaningful way in diagnosing the issue.

alexperry-shifu commented 2 years ago

Did anyone ever find more details to provide? I followed the instructions here https://airflow.apache.org/docs/helm-chart/stable/index.html#installing-the-chart and I am having the same issue. My details are the same as what has already been shared. I would add that I am using the "--kubelet-insecure-tls" flag in Kubernetes if that helps. (?)

potiuk commented 2 years ago

Exactly the same issue

I think you missed the note above @Alejandro13Rob @alexperry-shifu . So let me just repeat it again.

Please open a detailed issue about this with more details (logs and describing what you experience). It might likely be a different issue. The comment "I have the same issue" on a closed issue does not help in any meaningful way in diagnosing the issue.

Honestly, I am not sure what the goal of commenting "I have the same issue" is without providing this extra information.

This is, well, completely useless. If your goal is to get some help, you must provide more information. I am not sure if you have other goals, but it certainly does not help anyone (including you). It's quite mind-boggling why anyone would do it, especially when an earlier comment asked to do it differently, in a way that can be helpful for both maintainers and yourself, rather than in a way that has zero chance of helping anyone.

alexperry-shifu commented 2 years ago

LOL,

I have never gotten a polite (or useful) response from this project. I guess it's part of your charm. 😁

I simply used bash scripts and that worked like a charm (though obviously less than ideal for sustainment and scaling). It is really disappointing that following the instructions did not work. I am willing to accept user error as the cause, but at least I know I am not alone.

Regards,

Alex


"The Only Way To Get The Best Of An Argument Is To Avoid It. An argument is 90% emotion and 10% nonsense. A mature professional avoids arguments." -- Dale Carnegie "How to Win Friends and Influence People"

potiuk commented 2 years ago

LOL, I have never gotten a polite (or useful) response from this project. I guess it's part of your charm. 😁

I really love it when people make such precise statements. I looked for your questions and, well, I could not find any. There has never been a polite or useful answer because you ... never asked any question. Unless, of course, my search is wrong and you can point to those impolite and unhelpful answers to questions I could not find.

I think if you want polite answers, it's good to be factual and to read the discussion before you ask a question, especially when a previous post explicitly asked you not to do it that way and to ask your question differently: in a way that might help both yourself and those who try to provide help to people in their free time, for software they paid 0 USD for. Asking questions like that when you have been asked not to is, well, impolite at the very least.

alexperry-shifu commented 2 years ago

LOL,

Politeness is not necessary. We just want useful answers. This thread is a testament to the lack of the latter by far more people than just myself. So, feel free to blame me as the user/customer.

Best of luck to you and this project.

Regards,

Alex


"The Only Way To Get The Best Of An Argument Is To Avoid It. An argument is 90% emotion and 10% nonsense. A mature professional avoids arguments." -- Dale Carnegie "How to Win Friends and Influence People"

potiuk commented 2 years ago

In order to give you useful answers (for free, in our free time), we need you to provide useful input. We did not get it for quite some time in this thread despite asking for it several times.

Thank you for your wishes. We have more than 2000 contributors to the project, and the number is increasing by the day. Often those are people who actually did collaborate when asked, and they were so happy with the answers that they started contributing and helping others. I don't know. Maybe those 2000 people are wrong.

potiuk commented 2 years ago

So yes. The project is doing quite fine and we have plenty of happy and collaborating users.

javad87 commented 2 years ago

I guess this issue has not been resolved yet, since I get the same error when executing this command:

helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace --debug --timeout 10m0s

client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
upgrade.go:430: [debug] warning: Upgrade "airflow" failed: post-upgrade hooks failed: timed out waiting for the condition
Error: UPGRADE FAILED: post-upgrade hooks failed: timed out waiting for the condition
helm.go:88: [debug] post-upgrade hooks failed: timed out waiting for the condition
UPGRADE FAILED

I deployed the Kubernetes cluster via microk8s, and my version is:

[root@localhost ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.0-2+59bbb3530b6769", GitCommit:"59bbb3530b6769e4935a05ac0e13c9910c79253e", GitTreeState:"clean", BuildDate:"2022-05-13T06:41:13Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}

I am deploying the latest version of the Helm chart, which right now is:

Airflow Helm Chart 1.6.0 (2022-05-20)

Improvements
Ensure the messages from migration job show up early (#23479)
Allow migration jobs and init containers to be optional (#22195)

My deployment status:

As you can see, job.batch is not completed and the airflow-run-airflow-migrations-nl6pm pod is in the Running state; I don't know why it is not completing...

[root@localhost ~]# kubectl get all -n airflow

NAME                                       READY   STATUS     RESTARTS   AGE
pod/airflow-redis-0                        0/1     Pending    0          116m
pod/airflow-worker-0                       0/2     Pending    0          116m
pod/airflow-postgresql-0                   0/1     Pending    0          116m
pod/airflow-triggerer-5ddcdcdfd9-kddxd     0/1     Init:0/1   0          116m
pod/airflow-scheduler-75b785f69c-xw484     0/2     Init:0/1   0          116m
pod/airflow-statsd-5bcb9dd76-6dx92         1/1     Running    0          116m
pod/airflow-webserver-7f7c588f8c-gxszs     0/1     Init:0/1   0          96m
pod/airflow-webserver-556bc74d4f-c4pws     0/1     Init:0/1   0          24m
pod/airflow-run-airflow-migrations-nl6pm   1/1     Running    0          24m

NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/airflow-worker                ClusterIP   None             <none>        8793/TCP            116m
service/airflow-postgresql-headless   ClusterIP   None             <none>        5432/TCP            116m
service/airflow-statsd                ClusterIP   10.152.183.56    <none>        9125/UDP,9102/TCP   116m
service/airflow-webserver             ClusterIP   10.152.183.232   <none>        8080/TCP            116m
service/airflow-redis                 ClusterIP   10.152.183.65    <none>        6379/TCP            116m
service/airflow-postgresql            ClusterIP   10.152.183.224   <none>        5432/TCP            116m

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/airflow-scheduler   0/1     1            0           116m
deployment.apps/airflow-triggerer   0/1     1            0           116m
deployment.apps/airflow-statsd      1/1     1            1           116m
deployment.apps/airflow-webserver   0/1     1            0           116m

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/airflow-scheduler-75b785f69c   1         1         0       116m
replicaset.apps/airflow-triggerer-5ddcdcdfd9   1         1         0       116m
replicaset.apps/airflow-webserver-7f7c588f8c   1         1         0       96m
replicaset.apps/airflow-statsd-5bcb9dd76       1         1         1       116m
replicaset.apps/airflow-webserver-85544dd454   0         0         0       116m
replicaset.apps/airflow-webserver-556bc74d4f   1         1         0       24m

NAME                                  READY   AGE
statefulset.apps/airflow-redis        0/1     116m
statefulset.apps/airflow-postgresql   0/1     116m
statefulset.apps/airflow-worker       0/1     116m

NAME                                       COMPLETIONS   DURATION   AGE
job.batch/airflow-run-airflow-migrations   0/1           24m        24m

Can anyone help me with what to do next?

Thanks in advance. I know it's free and I appreciate your time. Feel free to contact me for more info and testing :-)
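In the status listing in this comment, the postgresql, redis, and worker pods are all Pending, which is easy to miss in a long table. As a rough sketch, a small filter can pull out every pod that is neither Running nor Completed. The function name `unhealthy_pods` is hypothetical; it only reads `kubectl get pods` text from standard input and assumes the default column order (NAME, READY, STATUS, ...).

```shell
# Reads `kubectl get pods` output on stdin and prints "NAME STATUS" for each pod
# whose STATUS column is neither Running nor Completed. The header row and
# lines with fewer than three columns are skipped.
unhealthy_pods() {
  awk 'NR > 1 && NF >= 3 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

It could be used as `kubectl get pods -n airflow | unhealthy_pods`, followed by `kubectl describe pod <name> -n airflow` on each pod it prints, since the Events section of `describe` often states why a pod is still Pending.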

alexperry-shifu commented 2 years ago

So yes. The project is doing quite fine and we have plenty of happy and collaborating users.

HAHAHAHAHA!!!

Dude, it wasn't me this time!

While I am enjoying the humor, feel free to take me off this thread.

Regards,

Alex


"The Only Way To Get The Best Of An Argument Is To Avoid It. An argument is 90% emotion and 10% nonsense. A mature professional avoids arguments." -- Dale Carnegie "How to Win Friends and Influence People"

javad87 commented 2 years ago

Even when I delete the Helm release with this command:

helm delete airflow --namespace airflow

all replicasets, deployments, and pods are deleted, but the migration pod and job.batch are still running!

[root@localhost ~]# kubectl get all -n airflow
NAME                                       READY   STATUS    RESTARTS      AGE
pod/airflow-run-airflow-migrations-djx7d   1/1     Running   3 (16m ago)   3h12m

NAME                                       COMPLETIONS   DURATION   AGE
job.batch/airflow-run-airflow-migrations   0/1           3h12m      3h12m

I decided to delete them manually:

kubectl delete pod airflow-run-airflow-migrations-djx7d -n airflow

kubectl delete job.batch airflow-run-airflow-migrations -n airflow


BTW, before installing Airflow on Kubernetes I installed it via Docker and it worked perfectly, but since that is not recommended for production I decided to move to K8s. I used the following command to clean up the Docker containers and the volume that was created for Postgres:

docker-compose down --volume docker rm rm -f

There is now no trace of the Docker installation. Anyway, Kubernetes uses containerd and creates its own namespaces, so there should not be any conflict with Docker...

After doing all of this, I am still getting the same error!

@alexperry-shifu thanks for following up and asking the contributors to solve this issue, but with all due respect, consider that all of us are busy in this world, especially during the Covid era and with this go, go, go mentality. For someone like me who is living in Iran, life is much more difficult, living under a despotic, brutal, corrupt Islamic regime and the unprecedented sanctions of the US and western countries, which only affect people like me. I do not want to speak about politics, and this place is not for that purpose; I just want to remind you that we have to be patient and consider that everyone is busy with their own life. Besides that, an open-source project means we all have access to the code, and if we want to and have time, we can work on it. I hope the contributors to this project solve it ASAP. Wishing peace and prosperity for all. Best regards,

potiuk commented 2 years ago

What do the logs of your migration pods show? https://www.digitalocean.com/community/questions/how-to-check-the-logs-of-running-and-crashed-pods-in-kubernetes
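For readers unsure how to gather what is being asked for here, the following is a sketch of one way to collect it, assuming the job name and namespace that appear in the Helm debug output earlier in this thread. The function name `migration_diagnostics` and the `KUBECTL` override variable are illustrative, not part of kubectl or the chart.

```shell
# Override KUBECTL (for example KUBECTL=echo) to preview the commands without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

migration_diagnostics() {
  ns="${1:-airflow}"
  job="${2:-airflow-run-airflow-migrations}"
  # Job status, conditions, and recent events for the job object
  "$KUBECTL" describe job "$job" -n "$ns"
  # Logs from the pod created by the job, across all of its containers
  "$KUBECTL" logs "job/$job" -n "$ns" --all-containers --tail=200
  # Namespace events in time order (scheduling, image pull, and volume problems show up here)
  "$KUBECTL" get events -n "$ns" --sort-by=.lastTimestamp
}
```

Running `migration_diagnostics > diagnostics.txt 2>&1` would produce a single file that can be attached to a new issue.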

javad87 commented 2 years ago

What the logs of your migration pods show ? https://www.digitalocean.com/community/questions/how-to-check-the-logs-of-running-and-crashed-pods-in-kubernetes

I ran this command:

kubectl logs airflow-run-airflow-migrations-h7b72 -n airflow

It shows nothing. Even with -f to follow the log it still shows nothing, and in another terminal the helm install command is running with this --debug output in the console:

[root@localhost ~]# helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace --debug --timeout 10m0s
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
history.go:56: [debug] getting history for release airflow
Release "airflow" does not exist. Installing it now.
install.go:178: [debug] Original chart version: ""
install.go:199: [debug] CHART PATH: /root/.cache/helm/repository/airflow-1.6.0.tgz

client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-broker-url" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:299: [debug] Starting delete for "airflow-redis-password" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 30 resource(s)
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 10m0s
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:595: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0

potiuk commented 2 years ago

Can you please use the k9s tool to connect to, monitor, and extract the migration logs? I have found that it is much better at getting to the right logs.

potiuk commented 2 years ago

K9s will allow you to monitor more logs in your deployment and likely find the right problem - just explore your installation with it.

javad87 commented 2 years ago

  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 142, in _do_get
    return self._create_connection()
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 247, in _create_connection
    return _ConnectionRecord(self)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 362, in __init__
    self.__connect(first_connect_check=True)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 605, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 599, in __connect
    connection = pool._invoke_creator(self)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/create.py", line 578, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 583, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/home/airflow/.local/lib/python3.7/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "airflow-postgresql.airflow" (10.152.183.179), po
    Is the server running on that host and accepting TCP/IP connections?

(Background on this error at: http://sqlalche.me/e/14/e3q8)

potiuk commented 2 years ago

So you have a problem with connecting to postgres then

javad87 commented 2 years ago

So you have a problem with connecting to postgres then

How can I resolve the connection issue?

potiuk commented 2 years ago

No idea. You have to debug it.
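As a possible starting point for that debugging: the traceback shows the migration job failing to reach the `airflow-postgresql` service, and the `kubectl get all` listing posted earlier shows `airflow-postgresql-0` as Pending, so finding out why that pod is not running seems a reasonable first step. This is a sketch under those assumptions, not a confirmed diagnosis. The function name `postgres_diagnostics` and the `KUBECTL` override are illustrative; the resource names are taken from the listing in this thread.

```shell
# Override KUBECTL (for example KUBECTL=echo) to preview the commands without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

postgres_diagnostics() {
  ns="${1:-airflow}"
  # The Events section at the end of `describe` often explains a Pending pod
  "$KUBECTL" describe pod airflow-postgresql-0 -n "$ns"
  # A Pending claim here would point at storage provisioning
  "$KUBECTL" get pvc -n "$ns"
  # An empty address list means the service has no ready pod behind it
  "$KUBECTL" get endpoints airflow-postgresql -n "$ns"
}
```

If the database pod never becomes Ready, the migration job has nothing to connect to, which would match the "Is the server running on that host" message in the traceback.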

Abhinav1598 commented 2 years ago

install.go:173: [debug] Original chart version: ""
install.go:190: [debug] CHART PATH: /home/e4338/.cache/helm/repository/airflow-1.6.0.tgz

client.go:290: [debug] Starting delete for "airflow-broker-url" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:290: [debug] Starting delete for "airflow-fernet-key" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:290: [debug] Starting delete for "airflow-redis-password" Secret
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 30 resource(s)
client.go:290: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:128: [debug] creating 1 resource(s)
client.go:519: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 20m0s
client.go:547: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:586: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0

Abhinav1598 commented 2 years ago

I am facing the exact same error. As per the official documentation, the Postgres DB is itself created in a container, so there should not be a connection issue; it just gets stuck at airflow-run-airflow-migrations.

Any resolution will be highly appreciated.

potiuk commented 2 years ago

I am facing the exact same error, as per official documentation the postgres db is itself being created in a container, so connection issue should not be there, it just gets stuck at airflow-run-airflow-migrations.

Any resolution will be highly appreciated.

As mentioned, more details are the only way any help can be given to you (or rather, by looking at the logs of the migration job yourself, you will likely find the reason). Without those details we are not able to help you.

Stating "I have the same problem" without providing any additional details helps no one find the root cause. If you state "I have the same problem", you need to provide more detailed logs to bring any value to the discussion here, @Abhinav1598.

lordvcs commented 2 years ago

K9s will allow you to monitor more logs in your deployment and likely find the right problem - just explore your installation with it.

I tried using K9s and still don't see any log output; most of the time it just says "stream logs failed container ..." for each pod/container that I check. Any other ideas for debugging?

potiuk commented 2 years ago

kubectl? How else are you debugging other charts? Just do the same.

Abhinav1598 commented 2 years ago

It’s solved. I was inside my company's VPN, so I was unable to pull the images from Docker. I pulled and pushed the images to my remote repo, and it started working. :)
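For anyone in a similar restricted network, the pull-and-push step described above could look roughly like the sketch below. It only prints the docker commands for one image so they can be reviewed first. The function name `mirror_commands` and the registry prefix `registry.example.com/mirror` are hypothetical; `apache/airflow:2.0.2` is the image named in the original report, and the same step would apply to every other image the chart uses.

```shell
# Prints the docker commands needed to copy one image into a private registry.
# $1: target registry prefix (hypothetical, e.g. registry.example.com/mirror)
# $2: source image reference (e.g. apache/airflow:2.0.2)
mirror_commands() {
  target="$1/$2"
  printf 'docker pull %s\n' "$2"
  printf 'docker tag %s %s\n' "$2" "$target"
  printf 'docker push %s\n' "$target"
}
```

From a machine that can reach both registries, `mirror_commands registry.example.com/mirror apache/airflow:2.0.2 | sh` would run the three steps. The chart's image settings in its values file then need to point at the mirrored references.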