airflow-helm / charts

The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
https://github.com/airflow-helm/charts/tree/main/charts/airflow
Apache License 2.0

AWS / EKS – "cannot use sqlite with the LocalExecutor" error when pods start #126

Closed macwro closed 3 years ago

macwro commented 3 years ago

What is the bug? On the newest version 8.0.2 of the Airflow Helm chart, running on top of an AWS EKS cluster with a local Postgres DB and an S3 bucket as the logs location, I get the error "cannot use sqlite with the LocalExecutor" when a pod starts to execute a DAG task. The pod ends up with status "Error". Log from the failed pod:

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 5, in <module>
    from airflow.__main__ import main
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/__init__.py", line 34, in <module>
    from airflow import settings
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/settings.py", line 37, in <module>
    from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf  # NOQA F401
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/configuration.py", line 1007, in <module>
    conf.validate()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/configuration.py", line 209, in validate
    self._validate_config_dependencies()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/configuration.py", line 239, in _validate_config_dependencies
    raise AirflowConfigException(f"error: cannot use sqlite with the {self.get('core', 'executor')}")
airflow.exceptions.AirflowConfigException: error: cannot use sqlite with the LocalExecutor

The problem exists for both custom and example DAGs. I see that "LocalExecutor" is set by charts/airflow/files/pod_template.kubernetes-helm-yaml. As a test, I updated the "airflow2-pod-template" ConfigMap with the value "KubernetesExecutor" for "AIRFLOW__CORE__EXECUTOR". Unfortunately the result was negative: the error still occurs, but now reports "cannot use sqlite with the KubernetesExecutor".

I also had to implement the workaround from https://github.com/airflow-helm/charts/issues/119 to allow the pods to start at all.
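
For reference, one way to check which executor the worker pods are actually handed is to inspect the rendered pod-template ConfigMap (the ConfigMap name below is the one from this deployment; adjust for your release and namespace):

kubectl get configmap airflow2-pod-template -n <namespace-name> -o yaml | grep -A1 AIRFLOW__CORE__EXECUTOR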

What are your Helm values?

airflow:
  legacyCommands: false
  image:
    repository: apache/airflow
    tag: 2.0.1-python3.8
    ## values: Always or IfNotPresent
    pullPolicy: IfNotPresent
    pullSecret: ""
    uid: 50000
    gid: 50000

  executor: KubernetesExecutor
  fernetKey: "<key>"

  config:
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: "apache/airflow"
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: "2.0.1-python3.8"
    AIRFLOW__KUBERNETES__NAMESPACE: "<namespace-name>"
    AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "False"
    AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE: "False"
    AIRFLOW__CORE__LOAD_EXAMPLES: "False"
    AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS: "False"
    AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "s3_conn"
    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://<s3-bucket>/logs-airflow2"
    AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: "60"
    AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH: "repo/"
    AIRFLOW__KUBERNETES__RUN_AS_USER: "50000"
    AIRFLOW__KUBERNETES__DAGS_IN_IMAGE: "False"

  usersUpdate: true
  connections: []
  connectionsUpdate: true
  variables: []
  variablesUpdate: false
  pools: []
  poolsUpdate: false
  podAnnotations: {}
  extraPipPackages: []
  extraEnv: []
  extraContainers: []
  extraVolumeMounts: []
  extraVolumes: []

  kubernetesPodTemplate:
    stringOverride: ""
    nodeSelector: {}
    affinity: {}
    tolerations: []
    podAnnotations: {}
    securityContext: {}
    extraPipPackages: []
    extraVolumeMounts: []
    extraVolumes: []

scheduler:
  replicas: 1
  resources: {}
  nodeSelector: {}
  affinity: {}
  tolerations: []
  securityContext: {}
  labels: {}
  podLabels: {}
  annotations: {}
  podAnnotations: {}
  safeToEvict: true
  podDisruptionBudget:
    enabled: false
    maxUnavailable: ""
    minAvailable: ""
  numRuns: -1
  livenessProbe:
    enabled: true
    initialDelaySeconds: 10
    periodSeconds: 30
    timeoutSeconds: 10
    failureThreshold: 5
  extraPipPackages: []
  extraVolumeMounts: []
  extraVolumes: []
  extraInitContainers: []

web:
  webserverConfig:
    stringOverride: |
      AUTH_ROLE_PUBLIC = "Admin"
    existingSecret: ""
  replicas: 1
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: '1'
      memory: 2Gi
  nodeSelector: {}
  affinity: {}
  tolerations: []
  securityContext: {}
  labels: {}
  podLabels: {}
  annotations: {}
  podAnnotations: {}
  safeToEvict: true
  podDisruptionBudget:
    enabled: false
    maxUnavailable: ""
    minAvailable: ""
  service:
    annotations: {}
    sessionAffinity: "None"
    sessionAffinityConfig: {}
    type: NodePort
    externalPort: 8080
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    nodePort:
      http: ""
  readinessProbe:
    enabled: false
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 6
  livenessProbe:
    enabled: true
    initialDelaySeconds: 300
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 6
  extraPipPackages: []
  extraVolumeMounts: []
  extraVolumes: []

workers:
  enabled: false

flower:
  enabled: false

logs:
  path: /opt/airflow/logs
  persistence:
    enabled: false
    existingClaim: ""
    subPath: ""
    storageClass: ""
    accessMode: ReadWriteMany
    size: 1Gi

dags:
  path: /opt/airflow/dags
  persistence:
    enabled: false

  gitSync:
    enabled: true
    image:
      repository: k8s.gcr.io/git-sync/git-sync
      tag: v3.2.2
      ## values: Always or IfNotPresent
      pullPolicy: IfNotPresent
      uid: 65533
      gid: 65533
    resources: {}
    repo: "https://<user>:<token>@<gitlab-repo-url>/airflow2.git"
    repoSubPath: "dags"
    branch: "master"
    revision: "HEAD"
    depth: 1
    syncWait: 60
    syncTimeout: 120
    httpSecret: ""
    httpSecretUsernameKey: username
    httpSecretPasswordKey: password
    sshSecret: ""
    sshSecretKey: id_rsa
    sshKnownHosts: ""

ingress:
  enabled: true
  web:
    annotations: {
      kubernetes.io/ingress.class: alb,
      alb.ingress.kubernetes.io/group.name: <group-name>,
      alb.ingress.kubernetes.io/auth-type: oidc,
      alb.ingress.kubernetes.io/auth-idp-oidc: '<oidc-settings>', 
      alb.ingress.kubernetes.io/certificate-arn: '<cert-arn>',
      alb.ingress.kubernetes.io/scheme: internet-facing,
      alb.ingress.kubernetes.io/target-type: ip,
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80 },{"HTTPS": 443}]',
      alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}',
      alb.ingress.kubernetes.io/wafv2-acl-arn: '<waf-rn>',
      alb.ingress.kubernetes.io/healthcheck-path: '/health'
    }
    labels: {}
    path: ""
    host: "<airflow2-fqdn>"
    tls:
      enabled: false
      secretName: ""
    precedingPaths: []
    succeedingPaths: []
  flower:
    annotations: {}
    labels: {}
    path: ""
    host: ""
    tls:
      enabled: false
      secretName: ""
    precedingPaths: []
    succeedingPaths: []

rbac:
  create: true
  events: true

serviceAccount:
  create: false
  name: "<k8s-namespace-sa>"
  annotations:
    eks.amazonaws.com/role-arn: "<role-arn>"

extraManifests: []

postgresql:
  enabled: true
  postgresqlDatabase: airflow2
  postgresqlUsername: postgres
  postgresqlPassword: airflow2
  existingSecret: ""
  existingSecretKey: "postgresql-password"
  persistence:
    enabled: true
    storageClass: "efs-sc"
    accessModes:
      - ReadWriteOnce
    size: 8Gi
  master:
    podAnnotations:
      cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

externalDatabase:
  type: postgres
  host: localhost
  port: 5432
  database: airflow
  user: airflow
  passwordSecret: ""
  passwordSecretKey: "postgresql-password"
  properties: ""

redis:
  enabled: false

externalRedis:
  host: localhost
  port: 6379
  databaseNumber: 1
  passwordSecret: ""
  passwordSecretKey: "redis-password"

serviceMonitor:
  enabled: false

prometheusRule:
  enabled: false

What is your Kubernetes Version?:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.12-eks-7684af", GitCommit:"7684af4ac41370dd109ac13817023cb8063e3d45", GitTreeState:"clean", BuildDate:"2020-10-20T22:57:40Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

What is your Helm version?:

$ helm version
version.BuildInfo{Version:"v3.4.1", GitCommit:"c4e74854886b2efe3321e185578e6db9be0a6e29", GitTreeState:"clean", GoVersion:"go1.14.1"}

karakanb commented 3 years ago

I suspect this has something to do with the workaround with the empty config. As of 8.0.2, the configuration is still needed and the secret is not used. A workaround that fixes both of these issues is to decode the secret and create a ConfigMap with those values, named <release-name>-env.

You can use the following for decoding the secret:

kubectl get secret name-of-secret -o go-template='
{{range $k,$v := .data}}{{printf "%s: " $k}}{{if not $v}}{{$v}}{{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}'

Creating a ConfigMap with these contents seems to fix the issue. If https://github.com/airflow-helm/charts/pull/122 is merged, I believe it should also fix the issue.
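
A rough sketch of that workaround, assuming a release named airflow2 (so the chart's generated secret is airflow2-config and the expected ConfigMap is airflow2-env); adjust the names for your release:

# dump the decoded secret as KEY=VALUE pairs
kubectl get secret airflow2-config -o go-template='{{range $k,$v := .data}}{{printf "%s=%s\n" $k ($v | base64decode)}}{{end}}' > airflow-env.properties

# create the ConfigMap the pod template expects from those pairs
kubectl create configmap airflow2-env --from-env-file=airflow-env.properties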

thesuperzapper commented 3 years ago

@macwro can you confirm if this issue is fixed after version 8.0.3 of the chart?

rolanddb commented 3 years ago

I encountered this issue with 8.0.5 of the chart. The default pod template refers to a ConfigMap named airflow-env. This doesn't exist, so the default settings of the chart are broken. Creating an empty airflow-env ConfigMap won't work either, because the worker pods need the same database config as the scheduler/web pods to communicate with the backend DB. So the pod template should refer to the secret airflow-config in the same way as the scheduler/web deployments do.

However, that leads me to a BACKEND: unbound variable error in the entrypoint script.
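
For reference, the kind of secret reference described above would look roughly like this in the worker pod template (a sketch only; airflow-config is the generated secret name for a release called airflow):

spec:
  containers:
    - name: base
      envFrom:
        # pull the same env the scheduler/web pods get, instead of the missing <release>-env ConfigMap
        - secretRef:
            name: airflow-config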

macwro commented 3 years ago

@macwro can you confirm if this issue is fixed after version 8.0.3 of the chart?

I can confirm that on version 8.0.5 the problem no longer exists.

Thanks for fixing it!

thesuperzapper commented 3 years ago

@rolanddb can you please check again, as 8.0.5 seems to correctly reference airflow-config https://github.com/airflow-helm/charts/blob/airflow-8.0.5/charts/airflow/files/pod_template.kubernetes-helm-yaml#L44-L46

rolanddb commented 3 years ago

@thesuperzapper I double-checked, but I'm still getting that BACKEND: unbound variable error in the entrypoint script (using the Dockerfile from the apache/airflow repo). The script tries to be a little too smart for my taste; e.g. it uses a regex to parse the SQLAlchemy connection string to get the host/port and run connectivity tests against them. I actually tried the regex and it does capture the various elements, but somewhere beyond that it fails. I've modified the entrypoint to exclude some of the checks, and Airflow is now up and running on my cluster. Thanks for the help.
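
For anyone curious, a loose bash illustration of the kind of host/port extraction and connectivity check described above (not the actual apache/airflow entrypoint; the connection string is a made-up example):

# parse a SQLAlchemy-style URL with a regex, then wait briefly for the port to answer
CONN="postgresql+psycopg2://airflow:password@airflow2-postgresql:5432/airflow2"
HOST=$(echo "$CONN" | sed -E 's|^.*@([^:/]+):?([0-9]*)/.*$|\1|')
PORT=$(echo "$CONN" | sed -E 's|^.*@([^:/]+):?([0-9]*)/.*$|\2|')
timeout 2 bash -c "</dev/tcp/${HOST}/${PORT}" && echo "database reachable at ${HOST}:${PORT}"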

thesuperzapper commented 3 years ago

@rolanddb we override the entrypoint for all the pods.

What are you doing that is running the airflow/airflow Dockerfile's entrypoint?

rolanddb commented 3 years ago

@thesuperzapper Are we looking at the same thing? I don't see anything specifying the entrypoint (I checked the scheduler deployment, e.g. https://github.com/airflow-helm/charts/blob/main/charts/airflow/templates/scheduler/scheduler-deployment.yaml#L98-L104, and the pod template). All that is specified is the command and args.

thesuperzapper commented 3 years ago

@rolanddb command overrides the Dockerfile ENTRYPOINT; see the Kubernetes API docs: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#container-v1-core
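
For anyone following along, a minimal generic illustration of that mapping (not taken from the chart; the pod below is purely for demonstration): a container's command replaces the image ENTRYPOINT, and args replaces the image CMD:

apiVersion: v1
kind: Pod
metadata:
  name: entrypoint-demo            # hypothetical pod, for illustration only
spec:
  restartPolicy: Never
  containers:
    - name: demo
      image: apache/airflow:2.0.1-python3.8
      command: ["airflow"]         # replaces the image ENTRYPOINT, so the image's entrypoint script never runs
      args: ["version"]            # replaces the image CMD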

rolanddb commented 3 years ago

@thesuperzapper Thanks for pointing that out! I wasn't aware that Kubernetes uses the same terminology (command/entrypoint) for overlapping functionality but in a different way. Super confusing but now I am aware of it.

Back to the issue: you are right that the entrypoint is overridden for the scheduler/web deployments. But as far as I can tell, the default pod template does not override the command. See https://github.com/airflow-helm/charts/blob/main/charts/airflow/files/pod_template.kubernetes-helm-yaml#L56-L57

So my example (the scheduler deployment I linked above) was wrong, but I think the issue still stands. I added an echo statement to the entrypoint script and it is printed when a worker pod runs. So if the container image contains an entrypoint that depends on env vars that are not available, things will not work. I think this is the case for the default settings of this Helm chart right now.
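
One way to verify this from outside the container is to check whether a worker pod's spec sets a command at all; if the output is empty, the image ENTRYPOINT is what runs (the pod name below is a placeholder):

kubectl get pod <worker-pod-name> -o jsonpath='{.spec.containers[0].command}'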

thesuperzapper commented 3 years ago

@rolanddb can you clarify whether 8.0.6 still has the issue you were raising here?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.