airflow-helm / charts

The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
https://github.com/airflow-helm/charts/tree/main/charts/airflow
Apache License 2.0

support multiple git-sync repos (for dags & plugins) #166

Open thesuperzapper opened 3 years ago

thesuperzapper commented 3 years ago

Currently, we only support a single git repo with dags.gitSync. We should consider extending this to allow multiple dags git repos, if this can be done with minimal impact to the values structure.

yehlo commented 3 years ago

I thought about this a bit and would propose something along the lines of:

dags.gitSync.repolist: 
- repo: ""
  repoSubPath: ""
  branch: ""
  revision: ""
  depth: 1
  ...

Currently I am considering adding default options like secretKey, branch, etc. so that you don't have to define them in each repo entry, but can still override them for each repo.
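
For illustration, a hypothetical sketch of how such defaults could sit next to the per-repo list (none of these keys exist in the chart today; the names and repo URLs are made up for this discussion):

dags:
  gitSync:
    # hypothetical shared defaults, applied to every repoList entry unless overridden
    defaults:
      branch: "main"
      depth: 1
      secretKey: "id_rsa"
    repoList:
      - repo: "ssh://git@example.com/team/dags-a.git"
        # inherits branch/depth/secretKey from the defaults above
      - repo: "ssh://git@example.com/team/dags-b.git"
        branch: "release"  # per-repo override of the default branch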

Sadly, the git-sync sidecar container is specifically designed to clone only one repo. This was discussed a bit here: https://github.com/kubernetes/git-sync/issues/261

We could always just range over the repoList and add an additional sidecar for each repo, although this would probably lead to a lot of sidecars in bigger environments. Would you rather have multiple sidecars or add a custom git-sync mechanism?

thesuperzapper commented 3 years ago

I've been thinking about an alternative: some kind of python library that would be used as the only main.py file in your gitSync repo (and which would itself be configured by some kind of config.yaml in the gitSync repo).

The purpose of this library would be to target specific tags/revisions of other repos, allowing a more git-ops approach to airflow in general, and specifically enabling one git-repo per dag (while letting you pin version tags for these repos in the config.yaml, so you can change-manage your dags).

The library would handle being called at each AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL, and sync any updates from the remote repos if their targeted tags/branches have new/changed commits.

NOTE: it might not even need to be a library, and might be fine as a basic script
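
For the sake of discussion, a hypothetical config.yaml along those lines might look like this (the file name, keys, and repo URLs are all made up; nothing like this exists in the chart yet):

# config.yaml at the root of the single gitSync repo (hypothetical)
repos:
  - url: https://example.com/team/dag-repo-one.git
    ref: v1.4.0          # pinned tag, so dag changes are change-managed in git
    path: dag_repo_one   # folder created under the dags directory
  - url: https://example.com/team/dag-repo-two.git
    ref: main            # or track a branch to pick up new commits on each sync
    path: dag_repo_two

The script (or library) would read this file at each AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL and fetch any repos whose targeted refs have moved.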

yehlo commented 3 years ago

So you would basically still have only one gitSync repo defined in k8s, but that repo would add the possibility of a config.yaml or whatever else. This config.yaml then includes other repos with their tag/rev.

I like the idea of managing this outside of the helm chart, so that I don't have to update my k8s resources for a new dag / project repo. So hard agree on that one.

I think we would then need to use something like the old setup, with a basic alpine linux image and a script mounted via a ConfigMap. I kind of dislike the idea of a main.py within my gitSync repo, since it introduces additional complexity in handling this mechanism. Just adding the possibility of a config.yaml with a given list of repos should do the job (although this means the script that handles the config.yaml will be more complex). Obviously it should still be possible to just use the git-sync repo as an "all-in-one" dag bucket.

The config.yaml would then be picked up by a basic script, which would place the "sub-repos" within airflow_home/dags/.

Giving the user the ability to define a main.py sounds more like a third option for this mechanism. That way a "power-user" can do whatever they please, while less experienced users can default to the "all-in-one" repo or to a config.yaml with a given structure.

will-m-buchanan commented 3 years ago

Just curious, if you were to go with @yehlo's original repoList proposal with a range of sidecars over repos in the list, how would you configure airflow to read the dags from each repoSubPath? DagBags?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

thesuperzapper commented 3 years ago

Bumping for the bot, I will work on this after the next minor release (which should add PGBouncer).

thesuperzapper commented 3 years ago

~For those watching, it is already possible to use multiple git repos. Git has a feature called .gitmodules which allows you to "recursively" reference git repositories from a single repo.~

~There is an issue to document this here: https://github.com/airflow-helm/charts/issues/434~


~Given this, I am not sure that we should support multiple git repos directly in the chart, as the .gitmodules approach is better for a few reasons:~

~1. It allows you to define (in git) which versions of other repos are synced, meaning you don't have to update your helm values to change dag versions.~
~2. It reduces the overhead by not requiring a sidecar container per repo (per deployment).~

EDIT: as discussed in https://github.com/airflow-helm/charts/issues/434#issuecomment-938216538, there are significant limitations to the .gitmodules approach, so we should aim to support multiple sidecars if the user wants.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

thesuperzapper commented 2 years ago

Reopening

thesuperzapper commented 2 years ago

> Just curious, if you were to go with @yehlo's original repoList proposal with a range of sidecars over repos in the list, how would you configure airflow to read the dags from each repoSubPath? DagBags?

@gillbuchanan The trick for getting multiple git-repos to work is to use symbolic links that are auto-created as the main airflow container starts (see the sketch after the steps below).

  1. There would be an emptyDir volume for each git-repo:
    • populated by its own init-container
    • kept updated by its own sidecar container
  2. These emptyDir volumes would be mounted at /mnt/git-sync/my-first-repo/, /mnt/git-sync/my-second-repo/
    • NOTE: the kubernetes/git-sync container creates its own symbolic links, so our symbolic links need to point to the repo symbolic link created under /mnt/git-sync/my-first-repo
  3. As the main airflow container starts, it would create symbolic links under /opt/airflow/dags that point to /mnt/git-sync/my-first-repo/repo and /mnt/git-sync/my-second-repo/repo.
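
A minimal sketch of what steps 1-3 could look like for a single extra repo, using the scheduler.extraVolumes / extraVolumeMounts / extraContainers keys seen in the values file later in this thread (the repo URL, volume name, and the startup symlink command are assumptions, not existing chart features):

scheduler:
  extraVolumes:
    - name: git-sync-my-first-repo
      emptyDir: {}
  extraVolumeMounts:
    - name: git-sync-my-first-repo
      mountPath: /mnt/git-sync/my-first-repo
      readOnly: true
  extraContainers:
    # an extraInitContainers entry with the same env (plus GIT_SYNC_ONE_TIME) could
    # pre-populate the volume before the scheduler starts; omitted here for brevity
    - name: git-sync-my-first-repo
      image: k8s.gcr.io/git-sync/git-sync:v3.6.1
      securityContext:
        runAsUser: 65533
      env:
        - name: GIT_SYNC_REPO
          value: https://example.com/team/my-first-repo.git   # assumption
        - name: GIT_SYNC_BRANCH
          value: main
        - name: GIT_SYNC_ROOT
          value: /git
        - name: GIT_SYNC_DEST
          value: repo
        - name: GIT_SYNC_WAIT
          value: "60"
      volumeMounts:
        - name: git-sync-my-first-repo
          mountPath: /git
  # step 3 (assumption): the main airflow container creates the symlink as it starts, e.g.
  #   ln -sfn /mnt/git-sync/my-first-repo/repo /opt/airflow/dags/my-first-repo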

juliandm commented 2 years ago

Would it make sense to just use git submodules? https://git-scm.com/book/en/v2/Git-Tools-Submodules

This way there is still one git repository that contains the DAGs, and additional code from other repos can be fetched by setting GIT_SYNC_SUBMODULES to "recursive" (this is the default value anyway).

This would be a solution with minimal effort and probably according to what the developers of git-sync had in mind.
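
In that setup, the only change on the existing git-sync sidecar would be something like the following env entry (a sketch; as noted above, "recursive" is already the default, so usually nothing needs to be set at all):

- name: GIT_SYNC_SUBMODULES
  value: "recursive"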

thesuperzapper commented 2 years ago

@juliandm I previously had the idea to use submodules and talked about it in https://github.com/airflow-helm/charts/issues/166#issuecomment-929671884, but I think it has some serious limitations for this use-case: namely, submodules are not well understood by most users, and things like authentication become a problem.


Using the method described in https://github.com/airflow-helm/charts/issues/166#issuecomment-993123687, I think the best approach is to allow a list of repos to be used in dags.gitSync, with the two remaining issues being:

hkvia commented 2 years ago

@thesuperzapper Is there any progress on this? Could we have a workaround to pass an env var for the git branch, so we can have separate repos per deployment?

thesuperzapper commented 2 years ago

> @thesuperzapper Is there any progress on this? Could we have a workaround to pass an env var for the git branch, so we can have separate repos per deployment?

@hkvia what do you mean by "separate repos per deployment"? All Airflow workers need to have the same dags.

hkvia commented 2 years ago

@thesuperzapper ignore the last bit. How can I set the branch name using an env var? Or is there a way to override GIT_SYNC_BRANCH?

thesuperzapper commented 2 years ago

Just wanted to note that git-sync v3.4.0+ supports having GIT_SYNC_DEST not under GIT_SYNC_ROOT.

This makes the solution slightly cleaner than what was proposed in https://github.com/airflow-helm/charts/issues/166#issuecomment-993123687, as we now don't have to modify the startup script of the container to create the symbolic links (as git-sync will create them for us if we specify GIT_SYNC_DEST = /opt/airflow/dags/REPO_1/).
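
Based on that, the per-repo sidecar env could be reduced to something like the following sketch (assuming git-sync v3.4.0+ behaves as described above; REPO_1 is just an example name, and the dags volume would also need to be mounted into the sidecar at /opt/airflow/dags):

- name: GIT_SYNC_ROOT
  value: /git
- name: GIT_SYNC_DEST
  value: /opt/airflow/dags/REPO_1   # git-sync creates this symlink directly in the dags folder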

kstavropoulos-fl commented 2 years ago

@thesuperzapper have you accomplished what you have been describing in this thread? I am trying to make this work:

For this 3rd asterisk, is there something I am missing? Can it be done in a more streamlined way? The only thing I have thought of so far is to add support in the helm chart for editing the command of the airflow containers - would you be open to that?

Edit: Additional findings/info: I do not think the extra initContainers are required - with just the extra git-sync sidecar container for each repository, it works fine for me. The permissions are a bit of a mess with all of the containers and pods, but it works. I use FluxCD for the Helm release, and I have patched the command and args of the main containers via the PostRender functionality, but it would be far nicer in my opinion to be able to parameterize these directly via this Helm chart.

MISSEY commented 2 years ago

@thesuperzapper I created two extra containers in the scheduler and triggerer for two repositories, mounted at /opt/airflow/dags/git1 and /opt/airflow/dags/git2.

I am able to see the dags from both repositories, but when I trigger them they do not get executed, because there are always two instances of every dag under git1 and git2: one in the git-sync work-tree and another under repo. For example:

drwxrwsrwx 4 root     root 4096 Sep 28 16:22 .
drwxr-xr-x 1 root     root 4096 Sep 28 16:22 ..
drwxr-sr-x 9 git-sync root 4096 Sep 28 16:22 .git
drwxr-sr-x 3 git-sync root 4096 Sep 28 16:22 1a869978271ddde92e9dd487ee7a64c22d0f430e
lrwxrwxrwx 1 git-sync root   40 Sep 28 16:22 repo -> 1a869978271ddde92e9dd487ee7a64c22d0f430e

This is the content of the git-sync volume for one repository: the dags are present in the work-tree (1a869978271ddde92e9dd487ee7a64c22d0f430e) and also under repo, so the dags cannot run because they are duplicated.
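
One possible way around the duplication (a sketch, not something verified here) is to mount only the repo symlink into the dags folder via subPath, which is what the commented-out "subPath: repo" lines in the values below hint at. Note that a subPath mount resolves its target once at container start, so later syncs may not be picked up without a restart:

extraVolumeMounts:
  - name: dags
    readOnly: true
    mountPath: /opt/airflow/dags/git1
    subPath: repo   # expose only the symlinked checkout, not the git-sync work-tree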

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
---
# Default values for airflow.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# Provide a name to substitute for the full names of resources
fullnameOverride: ""

# Provide a name to substitute for the name of the chart
nameOverride: ""

# Provide a Kubernetes version (used for API Version selection) to override the auto-detected version
kubeVersionOverride: ""

# User and group of airflow user
uid: 50000
gid: 0

# Default security context for airflow
securityContext: {}
#  runAsUser: 50000
#  fsGroup: 0
#  runAsGroup: 0

# Airflow home directory
# Used for mount paths
airflowHome: /opt/airflow

# Default airflow repository -- overrides all the specific images below
defaultAirflowRepository: apache/airflow
#git.ni.dfki.de:5050/ml_infrastructure/airflow-pbr-dags/airflow2.2.3 #apache/airflow

# Default airflow tag to deploy
defaultAirflowTag: "2.2.3" 
#"git_sync" #"2.2.3"

# Airflow version (Used to make some decisions based on Airflow Version being deployed)
airflowVersion: "2.2.3"

# Images
images:
  airflow:
    repository: ~
    tag: ~
    pullPolicy: IfNotPresent
  # To avoid images with user code, you can turn this to 'true' and
  # all the 'run-airflow-migrations' and 'wait-for-airflow-migrations' containers/jobs
  # will use the images from 'defaultAirflowRepository:defaultAirflowTag' values
  # to run and wait for DB migrations .
  useDefaultImageForMigration: false
  # timeout (in seconds) for airflow-migrations to complete
  migrationsWaitTimeout: 60
  pod_template:
    repository: ~
    tag: ~
    pullPolicy: IfNotPresent
  flower:
    repository: ~
    tag: ~
    pullPolicy: IfNotPresent
  statsd:
    repository: apache/airflow
    tag: airflow-statsd-exporter-2021.04.28-v0.17.0
    pullPolicy: IfNotPresent
  redis:
    repository: redis
    tag: 6-buster
    pullPolicy: IfNotPresent
  pgbouncer:
    repository: apache/airflow
    tag: airflow-pgbouncer-2021.04.28-1.14.0
    pullPolicy: IfNotPresent
  pgbouncerExporter:
    repository: apache/airflow
    tag: airflow-pgbouncer-exporter-2021.09.22-0.12.0
    pullPolicy: IfNotPresent
  gitSync:
    repository: k8s.gcr.io/git-sync/git-sync
    tag: v3.3.0
    pullPolicy: IfNotPresent

# Select certain nodes for airflow pods.
nodeSelector: {}
affinity: {}
tolerations: []

# Add common labels to all objects and pods defined in this chart.
labels: {}

# Ingress configuration
ingress:
  # Enable ingress resource
  enabled: false

  # Configs for the Ingress of the web Service
  web:
    # Annotations for the web Ingress
    annotations: {}

    # The path for the web Ingress
    path: "/"

    # The pathType for the above path (used only with Kubernetes v1.19 and above)
    pathType: "ImplementationSpecific"

    # The hostname for the web Ingress (Deprecated - renamed to `ingress.web.hosts`)
    host: ""

    # The hostnames or hosts configuration for the web Ingress
    hosts: []
    # - name: ""
    #   # configs for web Ingress TLS
    #   tls:
    #     # Enable TLS termination for the web Ingress
    #     enabled: false
    #     # the name of a pre-created Secret containing a TLS private key and certificate
    #     secretName: ""

    # The Ingress Class for the web Ingress (used only with Kubernetes v1.19 and above)
    ingressClassName: ""

    # configs for web Ingress TLS (Deprecated - renamed to `ingress.web.hosts[*].tls`)
    tls:
      # Enable TLS termination for the web Ingress
      enabled: false
      # the name of a pre-created Secret containing a TLS private key and certificate
      secretName: ""

    # HTTP paths to add to the web Ingress before the default path
    precedingPaths: []

    # Http paths to add to the web Ingress after the default path
    succeedingPaths: []

  # Configs for the Ingress of the flower Service
  flower:
    # Annotations for the flower Ingress
    annotations: {}

    # The path for the flower Ingress
    path: "/"

    # The pathType for the above path (used only with Kubernetes v1.19 and above)
    pathType: "ImplementationSpecific"

    # The hostname for the flower Ingress (Deprecated - renamed to `ingress.flower.hosts`)
    host: ""

    # The hostnames or hosts configuration for the flower Ingress
    hosts: []
    # - name: ""
    #   tls:
    #     # Enable TLS termination for the flower Ingress
    #     enabled: false
    #     # the name of a pre-created Secret containing a TLS private key and certificate
    #     secretName: ""

    # The Ingress Class for the flower Ingress (used only with Kubernetes v1.19 and above)
    ingressClassName: ""

    # configs for flower Ingress TLS (Deprecated - renamed to `ingress.flower.hosts[*].tls`)
    tls:
      # Enable TLS termination for the flower Ingress
      enabled: false
      # the name of a pre-created Secret containing a TLS private key and certificate
      secretName: ""

# Network policy configuration
networkPolicies:
  # Enabled network policies
  enabled: false

# Extra annotations to apply to all
# Airflow pods
airflowPodAnnotations: {}

# Extra annotations to apply to
# main Airflow configmap
airflowConfigAnnotations: {}

# `airflow_local_settings` file as a string (can be templated).
airflowLocalSettings: |-
  {{- if semverCompare ">=2.2.0" .Values.airflowVersion }}
  {{- if not (or .Values.webserverSecretKey .Values.webserverSecretKeySecretName) }}
  from airflow.www.utils import UIAlert

  DASHBOARD_UIALERTS = [
    UIAlert(
      'Usage of a dynamic webserver secret key detected. We recommend a static webserver secret key instead.'
      ' See the <a href='
      '"https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#webserver-secret-key">'
      'Helm Chart Production Guide</a> for more details.',
      category="warning",
      roles=["Admin"],
      html=True,
    )
  ]
  {{- end }}
  {{- end }}

# Enable RBAC (default on most clusters these days)
rbac:
  # Specifies whether RBAC resources should be created
  create: true
  createSCCRoleBinding: false

# Airflow executor
# Options: LocalExecutor, CeleryExecutor, KubernetesExecutor, CeleryKubernetesExecutor
executor: "KubernetesExecutor"

# If this is true and using LocalExecutor/KubernetesExecutor/CeleryKubernetesExecutor, the scheduler's
# service account will have access to communicate with the api-server and launch pods.
# If this is true and using CeleryExecutor/KubernetesExecutor/CeleryKubernetesExecutor, the workers
# will be able to launch pods.
allowPodLaunching: true

# Environment variables for all airflow containers
env: []
# - name: ""
#   value: ""

# Secrets for all airflow containers
secret: []
# - envName: ""
#   secretName: ""
#   secretKey: ""

# Enables selected built-in secrets that are set via environment variables by default.
# Those secrets are provided by the Helm Chart secrets by default but in some cases you
# might want to provide some of those variables with _CMD or _SECRET variable, and you should
# in this case disable setting of those variables by setting the relevant configuration to false.
enableBuiltInSecretEnvVars:
  AIRFLOW__CORE__FERNET_KEY: true
  AIRFLOW__CORE__SQL_ALCHEMY_CONN: true
  AIRFLOW_CONN_AIRFLOW_DB: true
  AIRFLOW__WEBSERVER__SECRET_KEY: true
  AIRFLOW__CELERY__CELERY_RESULT_BACKEND: true
  AIRFLOW__CELERY__RESULT_BACKEND: true
  AIRFLOW__CELERY__BROKER_URL: true
  AIRFLOW__ELASTICSEARCH__HOST: true
  AIRFLOW__ELASTICSEARCH__ELASTICSEARCH_HOST: true

# Extra secrets that will be managed by the chart
# (You can use them with extraEnv or extraEnvFrom or some of the extraVolumes values).
# The format is "key/value" where
#    * key (can be templated) is the name of the secret that will be created
#    * value: an object with the standard 'data' or 'stringData' key (or both).
#          The value associated with those keys must be a string (can be templated)
extraSecrets: {}
# eg:
# extraSecrets:
#   '{{ .Release.Name }}-airflow-connections':
#     type: 'Opaque'
#     data: |
#       AIRFLOW_CONN_GCP: 'base64_encoded_gcp_conn_string'
#       AIRFLOW_CONN_AWS: 'base64_encoded_aws_conn_string'
#     stringData: |
#       AIRFLOW_CONN_OTHER: 'other_conn'
#   '{{ .Release.Name }}-other-secret-name-suffix':
#     data: |
#        ...

# Extra ConfigMaps that will be managed by the chart
# (You can use them with extraEnv or extraEnvFrom or some of the extraVolumes values).
# The format is "key/value" where
#    * key (can be templated) is the name of the configmap that will be created
#    * value: an object with the standard 'data' key.
#          The value associated with this keys must be a string (can be templated)
extraConfigMaps: {}
# eg:
# extraConfigMaps:
#   '{{ .Release.Name }}-airflow-variables':
#     data: |
#       AIRFLOW_VAR_HELLO_MESSAGE: "Hi!"
#       AIRFLOW_VAR_KUBERNETES_NAMESPACE: "{{ .Release.Namespace }}"

# Extra env 'items' that will be added to the definition of airflow containers
# a string is expected (can be templated).
# TODO: difference from `env`? This is a templated string. Probably should template `env` and remove this.
extraEnv: ~
# eg:
# extraEnv: |
#   - name: AIRFLOW__CORE__LOAD_EXAMPLES
#     value: 'True'

# Extra envFrom 'items' that will be added to the definition of airflow containers
# A string is expected (can be templated).
extraEnvFrom: ~
# eg:
# extraEnvFrom: |
#   - secretRef:
#       name: '{{ .Release.Name }}-airflow-connections'
#   - configMapRef:
#       name: '{{ .Release.Name }}-airflow-variables'

# Airflow database & redis config
data:
  # If secret names are provided, use those secrets
  metadataSecretName: ~
  resultBackendSecretName: ~
  brokerUrlSecretName: ~

  # Otherwise pass connection values in
  metadataConnection:
    user: postgres
    pass: postgres
    protocol: postgresql
    host: ~
    port: 5432
    db: postgres
    sslmode: disable
  # resultBackendConnection defaults to the same database as metadataConnection
  resultBackendConnection: ~
  # or, you can use a different database
  # resultBackendConnection:
  #   user: postgres
  #   pass: postgres
  #   protocol: postgresql
  #   host: ~
  #   port: 5432
  #   db: postgres
  #   sslmode: disable
  # Note: brokerUrl can only be set during install, not upgrade
  brokerUrl: ~

# Fernet key settings
# Note: fernetKey can only be set during install, not upgrade
fernetKey: ~
fernetKeySecretName: ~

# Flask secret key for Airflow Webserver: `[webserver] secret_key` in airflow.cfg
webserverSecretKey: ~
webserverSecretKeySecretName: ~

# In order to use kerberos you need to create secret containing the keytab file
# The secret name should follow naming convention of the application where resources are
# name {{ .Release-name }}-<POSTFIX>. In case of the keytab file, the postfix is "kerberos-keytab"
# So if your release is named "my-release" the name of the secret should be "my-release-kerberos-keytab"
#
# The Keytab content should be available in the "kerberos.keytab" key of the secret.
#
#  apiVersion: v1
#  kind: Secret
#  data:
#    kerberos.keytab: <base64_encoded keytab file content>
#  type: Opaque
#
#
#  If you have such keytab file you can do it with similar
#
#  kubectl create secret generic {{ .Release.name }}-kerberos-keytab --from-file=kerberos.keytab
#
#
#  Alternatively, instead of manually creating the secret, it is possible to specify
#  kerberos.keytabBase64Content parameter. This parameter should contain base64 encoded keytab.
#

kerberos:
  enabled: false
  ccacheMountPath: /var/kerberos-ccache
  ccacheFileName: cache
  configPath: /etc/krb5.conf
  keytabBase64Content: ~
  keytabPath: /etc/airflow.keytab
  principal: airflow@FOO.COM
  reinitFrequency: 3600
  config: |
    # This is an example config showing how you can use templating and how "example" config
    # might look like. It works with the test kerberos server that we are using during integration
    # testing at Apache Airflow (see `scripts/ci/docker-compose/integration-kerberos.yml` but in
    # order to make it production-ready you must replace it with your own configuration that
    # Matches your kerberos deployment. Administrators of your Kerberos instance should
    # provide the right configuration.

    [logging]
    default = "FILE:{{ template "airflow_logs_no_quote" . }}/kerberos_libs.log"
    kdc = "FILE:{{ template "airflow_logs_no_quote" . }}/kerberos_kdc.log"
    admin_server = "FILE:{{ template "airflow_logs_no_quote" . }}/kadmind.log"

    [libdefaults]
    default_realm = FOO.COM
    ticket_lifetime = 10h
    renew_lifetime = 7d
    forwardable = true

    [realms]
    FOO.COM = {
      kdc = kdc-server.foo.com
      admin_server = admin_server.foo.com
    }

# Airflow Worker Config
workers:
  # Number of airflow celery workers in StatefulSet
  replicas: 1

  # Command to use when running Airflow workers (templated).
  command: ~
  # Args to use when running Airflow workers (templated).
  args:
    - "bash"
    - "-c"
    # The format below is necessary to get `helm lint` happy
    - |-
      exec \
      airflow {{ semverCompare ">=2.0.0" .Values.airflowVersion | ternary "celery worker" "worker" }}

  # Update Strategy when worker is deployed as a StatefulSet
  updateStrategy: ~
  # Update Strategy when worker is deployed as a Deployment
  strategy:
    rollingUpdate:
      maxSurge: "100%"
      maxUnavailable: "50%"

  # When not set, the values defined in the global securityContext will be used
  securityContext: {}
  #  runAsUser: 50000
  #  fsGroup: 0
  #  runAsGroup: 0

  # Create ServiceAccount
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the release name
    name: ~

    # Annotations to add to worker kubernetes service account.
    annotations: {}

  # Allow KEDA autoscaling.
  # Persistence.enabled must be set to false to use KEDA.
  keda:
    enabled: false
    namespaceLabels: {}

    # How often KEDA polls the airflow DB to report new scale requests to the HPA
    pollingInterval: 5

    # How many seconds KEDA will wait before scaling to zero.
    # Note that HPA has a separate cooldown period for scale-downs
    cooldownPeriod: 30

    # Minimum number of workers created by keda
    minReplicaCount: 0

    # Maximum number of workers created by keda
    maxReplicaCount: 10

  persistence:
    # Enable persistent volumes
    enabled: true
    # Volume size for worker StatefulSet
    size: 100Gi
    # If using a custom storageClass, pass name ref to all statefulSets here
    storageClassName:
    # Execute init container to chown log directory.
    # This is currently only needed in kind, due to usage
    # of local-path provisioner.
    fixPermissions: false

  kerberosSidecar:
    # Enable kerberos sidecar
    enabled: false
    resources: {}
    #  limits:
    #   cpu: 100m
    #   memory: 128Mi
    #  requests:
    #   cpu: 100m
    #   memory: 128Mi

  resources: {}
  #  limits:
  #   cpu: 100m
  #   memory: 128Mi
  #  requests:
  #   cpu: 100m
  #   memory: 128Mi

  # Grace period for tasks to finish after SIGTERM is sent from kubernetes
  terminationGracePeriodSeconds: 600

  # This setting tells kubernetes that its ok to evict
  # when it wants to scale a node down.
  safeToEvict: true

  # Launch additional containers into worker.
  # Note: If used with KubernetesExecutor, you are responsible for signaling sidecars to exit when the main
  # container finishes so Airflow can continue the worker shutdown process!
  extraContainers: []
  # Add additional init containers into workers.
  extraInitContainers: []

  # Mount additional volumes into worker.
  extraVolumes: []
  extraVolumeMounts: []

  # Select certain nodes for airflow worker pods.
  nodeSelector: {}
  affinity: {}
  # default worker affinity is:
  #  podAntiAffinity:
  #    preferredDuringSchedulingIgnoredDuringExecution:
  #    - podAffinityTerm:
  #        labelSelector:
  #          matchLabels:
  #            component: worker
  #        topologyKey: kubernetes.io/hostname
  #      weight: 100
  tolerations: []
  # hostAliases to use in worker pods.
  # See:
  # https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/
  hostAliases: []
  # - ip: "127.0.0.2"
  #   hostnames:
  #   - "test.hostname.one"
  # - ip: "127.0.0.3"
  #   hostnames:
  #   - "test.hostname.two"

  podAnnotations: {}

  logGroomerSidecar:
    # Command to use when running the Airflow worker log groomer sidecar (templated).
    command: ~
    # Args to use when running the Airflow worker log groomer sidecar (templated).
    args: ["bash", "/clean-logs"]
    # Number of days to retain logs
    retentionDays: 15
    resources: {}
    #  limits:
    #   cpu: 100m
    #   memory: 128Mi
    #  requests:
    #   cpu: 100m
    #   memory: 128Mi

# Airflow scheduler settings
scheduler:
  # If the scheduler stops heartbeating for 5 minutes (5*60s) kill the
  # scheduler and let Kubernetes restart it
  livenessProbe:
    initialDelaySeconds: 10
    timeoutSeconds: 20
    failureThreshold: 5
    periodSeconds: 60
    # exec:
    #   command: ["/bin/bash","-c","ls -la /home/airflow"]
  # Airflow 2.0 allows users to run multiple schedulers,
  # However this feature is only recommended for MySQL 8+ and Postgres
  replicas: 1

  # Command to use when running the Airflow scheduler (templated).
  command: ~

  # Args to use when running the Airflow scheduler (templated).
  args: ["bash", "-c", "exec airflow scheduler"]

  # Update Strategy when scheduler is deployed as a StatefulSet
  # (when using LocalExecutor and workers.persistence)
  updateStrategy: ~
  # Update Strategy when scheduler is deployed as a Deployment
  # (when not using LocalExecutor and workers.persistence)
  strategy: ~

  # When not set, the values defined in the global securityContext will be used
  securityContext: {}
  #  runAsUser: 50000
  #  fsGroup: 0
  #  runAsGroup: 0

  # Create ServiceAccount
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the release name
    name: ~

    # Annotations to add to scheduler kubernetes service account.
    annotations: {}

  # Scheduler pod disruption budget
  podDisruptionBudget:
    enabled: false

    # PDB configuration
    config:
      maxUnavailable: 1

  resources: {}
  #  limits:
  #   cpu: 100m
  #   memory: 128Mi
  #  requests:
  #   cpu: 100m
  #   memory: 128Mi

  # This setting tells kubernetes that its ok to evict
  # when it wants to scale a node down.
  safeToEvict: true

  # Launch additional containers into scheduler.
  extraContainers:
    - name: git-sync
      image: 'k8s.gcr.io/git-sync/git-sync:v3.6.1'
      env:
        - name: GIT_SYNC_USERNAME
          valueFrom:
            secretKeyRef:
              name: git-credentials-2
              key: GIT_SYNC_USERNAME
        - name: GIT_SYNC_PASSWORD
          valueFrom:
            secretKeyRef:
              name: git-credentials-2
              key: GIT_SYNC_PASSWORD
        - name: GIT_SYNC_REV
          value: HEAD
        - name: GIT_SYNC_BRANCH
          value: main
        - name: GIT_SYNC_REPO
          value: 'https://git.ni.dfki.de/ml_infrastructure/airflow-pbr-dags.git'
        - name: GIT_SYNC_DEPTH
          value: '1'
        - name: GIT_SYNC_ROOT
          value: /git
        - name: GIT_SYNC_DEST
          value: repo
        - name: GIT_SYNC_ADD_USER
          value: 'true'
        - name: GIT_SYNC_WAIT
          value: '60'
        - name: GIT_SYNC_MAX_SYNC_FAILURES
          value: '0'
        - name: GIT_SYNC_LINK
          value: /tmp/git
      resources: { }
      volumeMounts:
        - name: dags
          mountPath: /git
        # - name: kube-api-access-9j29g
        #   readOnly: true
        #   mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      imagePullPolicy: IfNotPresent
      securityContext:
        runAsUser: 65533
    - name: git-sync-2
      image: 'k8s.gcr.io/git-sync/git-sync:v3.6.1'
      env:
        - name: GIT_SYNC_USERNAME
          valueFrom:
            secretKeyRef:
              name: git-credentials-2
              key: GIT_SYNC_USERNAME
        - name: GIT_SYNC_PASSWORD
          valueFrom:
            secretKeyRef:
              name: git-credentials-2
              key: GIT_SYNC_PASSWORD
        - name: GIT_SYNC_REV
          value: HEAD
        - name: GIT_SYNC_BRANCH
          value: main
        - name: GIT_SYNC_REPO
          value: 'https://git.ni.dfki.de/ml_infrastructure/airflow-dags-2.git' 
        - name: GIT_SYNC_DEPTH
          value: '1'
        - name: GIT_SYNC_ROOT
          value: /git
        - name: GIT_SYNC_DEST
          value: repo
        - name: GIT_SYNC_ADD_USER
          value: 'true'
        - name: GIT_SYNC_WAIT
          value: '60'
        - name: GIT_SYNC_MAX_SYNC_FAILURES
          value: '0'
        # - name: GIT_SYNC_EXECHOOK_COMMAND
        #   value: 'ln -s /git/repo/airflow /repo'
      resources: {}
      volumeMounts:
        - name: dags2
          mountPath: /git
          # readOnly: true
        # - name: kube-api-access-9j29g
        #   readOnly: true
        #   mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      imagePullPolicy: IfNotPresent
      securityContext:
        runAsUser: 65533
  # Add additional init containers into scheduler.
  # extraInitContainers:
  #   - name: git-sync-init
  #     image: 'k8s.gcr.io/git-sync/git-sync:v3.3.0'
  #     env:
  #       - name: GIT_SYNC_USERNAME
  #         valueFrom:
  #           secretKeyRef:
  #             name: git-credentials-2
  #             key: GIT_SYNC_USERNAME
  #       - name: GIT_SYNC_PASSWORD
  #         valueFrom:
  #           secretKeyRef:
  #             name: git-credentials-2
  #             key: GIT_SYNC_PASSWORD
  #       - name: GIT_SYNC_REV
  #         value: HEAD
  #       - name: GIT_SYNC_BRANCH
  #         value: main
  #       - name: GIT_SYNC_REPO
  #         value: 'https://git.ni.dfki.de/ml_infrastructure/airflow-pbr-dags.git'
  #       - name: GIT_SYNC_DEPTH
  #         value: '1'
  #       - name: GIT_SYNC_ROOT
  #         value: /git
  #       - name: GIT_SYNC_DEST
  #         value: repo
  #       - name: GIT_SYNC_ADD_USER
  #         value: 'true'
  #       - name: GIT_SYNC_WAIT
  #         value: '60'
  #       - name: GIT_SYNC_MAX_SYNC_FAILURES
  #         value: '0'
  #     resources: { }
  #     volumeMounts:
  #       - name: dags
  #         mountPath: /git
  #     imagePullPolicy: IfNotPresent
  #     securityContext:
  #       runAsUser: 65533

  #   - name: git-sync-2-init
  #     image: 'k8s.gcr.io/git-sync/git-sync:v3.3.0'
  #     env:
  #       - name: GIT_SYNC_USERNAME
  #         valueFrom:
  #           secretKeyRef:
  #             name: git-credentials-2
  #             key: GIT_SYNC_USERNAME
  #       - name: GIT_SYNC_PASSWORD
  #         valueFrom:
  #           secretKeyRef:
  #             name: git-credentials-2
  #             key: GIT_SYNC_PASSWORD
  #       - name: GIT_SYNC_REV
  #         value: HEAD
  #       - name: GIT_SYNC_BRANCH
  #         value: main
  #       - name: GIT_SYNC_REPO
  #         value: 'https://git.ni.dfki.de/ml_infrastructure/airflow-dags-2.git' 
  #       - name: GIT_SYNC_DEPTH
  #         value: '1'
  #       - name: GIT_SYNC_ROOT
  #         value: /git
  #       - name: GIT_SYNC_DEST
  #         value: repo
  #       - name: GIT_SYNC_ADD_USER
  #         value: 'true'
  #       - name: GIT_SYNC_WAIT
  #         value: '60'
  #       - name: GIT_SYNC_MAX_SYNC_FAILURES
  #         value: '0'
  #     resources: {}
  #     volumeMounts:
  #       - name: dags2
  #         mountPath: /git2
  #       # - name: kube-api-access-9j29g
  #       #   readOnly: true
  #       #   mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  #     imagePullPolicy: IfNotPresent
  #     securityContext:
  #       runAsUser: 65533

  # Mount additional volumes into scheduler.
  extraVolumes:
    - name: dags
      emptyDir: {}
    - name: dags2
      emptyDir: {}

  extraVolumeMounts:
    - name: dags2
      readOnly: true
      mountPath: /opt/airflow/dags/git2
    - name: dags
      readOnly: true
      mountPath: /opt/airflow/dags/git1
  # Select certain nodes for airflow scheduler pods.
  nodeSelector: {}
  affinity: {}
  # default scheduler affinity is:
  #  podAntiAffinity:
  #    preferredDuringSchedulingIgnoredDuringExecution:
  #    - podAffinityTerm:
  #        labelSelector:
  #          matchLabels:
  #            component: scheduler
  #        topologyKey: kubernetes.io/hostname
  #      weight: 100
  tolerations: []

  podAnnotations: {}

  logGroomerSidecar:
    # Whether to deploy the Airflow scheduler log groomer sidecar.
    enabled: true
    # Command to use when running the Airflow scheduler log groomer sidecar (templated).
    command: ~
    # Args to use when running the Airflow scheduler log groomer sidecar (templated).
    args: ["bash", "/clean-logs"]
    # Number of days to retain logs
    retentionDays: 15
    resources: {}
    #  limits:
    #   cpu: 100m
    #   memory: 128Mi
    #  requests:
    #   cpu: 100m
    #   memory: 128Mi

# Airflow create user job settings
createUserJob:
  # Annotations on the create user job pod
  annotations: {}
  # jobAnnotations are annotations on the create user job
  jobAnnotations: {}

  # When not set, the values defined in the global securityContext will be used
  securityContext: {}
  #  runAsUser: 50000
  #  fsGroup: 0
  #  runAsGroup: 0

  # Create ServiceAccount
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the release name
    name: ~

    # Annotations to add to create user kubernetes service account.
    annotations: {}

  # Launch additional containers into user creation job
  extraContainers: []

  # Mount additional volumes into user creation job
  extraVolumes: []
  extraVolumeMounts: []

  nodeSelector: {}
  affinity: {}
  tolerations: []
  # In case you need to disable the helm hooks that create the jobs after install.
  # Disable this if you are using ArgoCD for example
  useHelmHooks: true

  resources: {}
  #  limits:
  #   cpu: 100m
  #   memory: 128Mi
  #  requests:
  #   cpu: 100m
  #   memory: 128Mi

# Airflow database migration job settings
migrateDatabaseJob:
  # Annotations on the database migration pod
  annotations: {}
  # jobAnnotations are annotations on the database migration job
  jobAnnotations: {}

  # When not set, the values defined in the global securityContext will be used
  securityContext: {}
  #  runAsUser: 50000
  #  fsGroup: 0
  #  runAsGroup: 0

  # Create ServiceAccount
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the release name
    name: ~

    # Annotations to add to migrate database job kubernetes service account.
    annotations: {}

  resources: {}
  #  limits:
  #   cpu: 100m
  #   memory: 128Mi
  #  requests:
  #   cpu: 100m
  #   memory: 128Mi

  # Launch additional containers into database migration job
  extraContainers: []

  # Mount additional volumes into database migration job
  extraVolumes: []
  extraVolumeMounts: []

  nodeSelector: {}
  affinity: {}
  tolerations: []
  # In case you need to disable the helm hooks that create the jobs after install.
  # Disable this if you are using ArgoCD for example
  useHelmHooks: true

# Airflow webserver settings
webserver:
  allowPodLogReading: true
  livenessProbe:
    initialDelaySeconds: 15
    timeoutSeconds: 30
    failureThreshold: 20
    periodSeconds: 5

  readinessProbe:
    initialDelaySeconds: 15
    timeoutSeconds: 30
    failureThreshold: 20
    periodSeconds: 5

  # Number of webservers
  replicas: 1

  # Command to use when running the Airflow webserver (templated).
  command: ~
  # Args to use when running the Airflow webserver (templated).
  args: ["bash", "-c", "exec airflow webserver"]

  # Create ServiceAccount
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the release name
    name: ~

    # Annotations to add to webserver kubernetes service account.
    annotations: {}

  # Allow overriding Update Strategy for Webserver
  strategy: ~

  # When not set, the values defined in the global securityContext will be used
  securityContext: {}
  #  runAsUser: 50000
  #  fsGroup: 0
  #  runAsGroup: 0

  # Additional network policies as needed (Deprecated - renamed to `webserver.networkPolicy.ingress.from`)
  extraNetworkPolicies: []
  networkPolicy:
    ingress:
      # Peers for webserver NetworkPolicy ingress
      from: []
      # Ports for webserver NetworkPolicy ingress (if `from` is set)
      ports:
        - port: "{{ .Values.ports.airflowUI }}"

  resources: {}
  #   limits:
  #     cpu: 100m
  #     memory: 128Mi
  #   requests:
  #     cpu: 100m
  #     memory: 128Mi

  # Create initial user.
  defaultUser:
    enabled: true
    role: Admin
    username: admin
    email: admin@example.com
    firstName: admin
    lastName: user
    password: admin

  # Launch additional containers into webserver.
  extraContainers: []
  # Add additional init containers into webserver.
  extraInitContainers: []

  # Mount additional volumes into webserver.
  extraVolumes: []
  extraVolumeMounts: []

  # This string (can be templated) will be mounted into the Airflow Webserver as a custom
  # webserver_config.py. You can bake a webserver_config.py in to your image instead.
  webserverConfig: ~
  # webserverConfig: |
  #   from airflow import configuration as conf

  #   # The SQLAlchemy connection string.
  #   SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

  #   # Flask-WTF flag for CSRF
  #   CSRF_ENABLED = True

  service:
    type: NodePort
    ## service annotations
    annotations: {}
    ports:
      - name: airflow-ui
        port: "{{ .Values.ports.airflowUI }}"
    # To change the port used to access the webserver:
    # ports:
    #   - name: airflow-ui
    #     port: 80
    #     targetPort: airflow-ui
    # To only expose a sidecar, not the webserver directly:
    # ports:
    #   - name: only_sidecar
    #     port: 80
    #     targetPort: 8888
    loadBalancerIP: ~
    ## Limit load balancer source ips to list of CIDRs
    # loadBalancerSourceRanges:
    #   - "10.123.0.0/16"
    loadBalancerSourceRanges: []

  # Select certain nodes for airflow webserver pods.
  nodeSelector: {}
  affinity: {}
  # default webserver affinity is:
  #  podAntiAffinity:
  #    preferredDuringSchedulingIgnoredDuringExecution:
  #    - podAffinityTerm:
  #        labelSelector:
  #          matchLabels:
  #            component: webserver
  #        topologyKey: kubernetes.io/hostname
  #      weight: 100
  tolerations: []

  podAnnotations: {}

# Airflow Triggerer Config
triggerer:
  enabled: true
  # Number of airflow triggerers in the deployment
  replicas: 1

  # Command to use when running Airflow triggerers (templated).
  command: ~
  # Args to use when running Airflow triggerer (templated).
  args: ["bash", "-c", "exec airflow triggerer"]

  # Update Strategy for triggerers
  strategy:
    rollingUpdate:
      maxSurge: "100%"
      maxUnavailable: "50%"

  # If the triggerer stops heartbeating for 5 minutes (5*60s) kill the
  # triggerer and let Kubernetes restart it
  livenessProbe:
    initialDelaySeconds: 10
    timeoutSeconds: 20
    failureThreshold: 5
    periodSeconds: 60

  # Create ServiceAccount
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the release name
    name: ~

    # Annotations to add to triggerer kubernetes service account.
    annotations: {}

  # When not set, the values defined in the global securityContext will be used
  securityContext: {}
  #  runAsUser: 50000
  #  fsGroup: 0
  #  runAsGroup: 0

  resources: {}
  #  limits:
  #   cpu: 100m
  #   memory: 128Mi
  #  requests:
  #   cpu: 100m
  #   memory: 128Mi

  # Grace period for triggerer to finish after SIGTERM is sent from kubernetes
  terminationGracePeriodSeconds: 60

  # This setting tells kubernetes that its ok to evict
  # when it wants to scale a node down.
  safeToEvict: true

  extraContainers:
    - name: git-sync
      image: 'k8s.gcr.io/git-sync/git-sync:v3.6.1'
      env:
        - name: GIT_SYNC_USERNAME
          valueFrom:
            secretKeyRef:
              name: git-credentials-2
              key: GIT_SYNC_USERNAME
        - name: GIT_SYNC_PASSWORD
          valueFrom:
            secretKeyRef:
              name: git-credentials-2
              key: GIT_SYNC_PASSWORD
        - name: GIT_SYNC_REV
          value: HEAD
        - name: GIT_SYNC_BRANCH
          value: main
        - name: GIT_SYNC_REPO
          value: 'https://git.ni.dfki.de/ml_infrastructure/airflow-pbr-dags.git'
        - name: GIT_SYNC_DEPTH
          value: '1'
        - name: GIT_SYNC_ROOT
          value: /git
        - name: GIT_SYNC_DEST
          value: repo
        - name: GIT_SYNC_ADD_USER
          value: 'true'
        - name: GIT_SYNC_WAIT
          value: '60'
        - name: GIT_SYNC_MAX_SYNC_FAILURES
          value: '0'
      resources: { }
      volumeMounts:
        - name: dags
          mountPath: /git
        # - name: kube-api-access-9j29g
        #   readOnly: true
        #   mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      imagePullPolicy: IfNotPresent
      securityContext:
        runAsUser: 65533
    - name: git-sync-2
      image: 'k8s.gcr.io/git-sync/git-sync:v3.3.0'
      env:
        - name: GIT_SYNC_USERNAME
          valueFrom:
            secretKeyRef:
              name: git-credentials-2
              key: GIT_SYNC_USERNAME
        - name: GIT_SYNC_PASSWORD
          valueFrom:
            secretKeyRef:
              name: git-credentials-2
              key: GIT_SYNC_PASSWORD
        - name: GIT_SYNC_REV
          value: HEAD
        - name: GIT_SYNC_BRANCH
          value: main
        - name: GIT_SYNC_REPO
          value: 'https://git.ni.dfki.de/ml_infrastructure/airflow-dags-2.git' 
        - name: GIT_SYNC_DEPTH
          value: '1'
        - name: GIT_SYNC_ROOT
          value: /git
        - name: GIT_SYNC_DEST
          value: repo
        - name: GIT_SYNC_ADD_USER
          value: 'true'
        - name: GIT_SYNC_WAIT
          value: '60'
        - name: GIT_SYNC_MAX_SYNC_FAILURES
          value: '0'
      resources: {}
      volumeMounts:
        - name: dags2
          mountPath: /git
      imagePullPolicy: IfNotPresent
      securityContext:
        runAsUser: 65533
  # Add additional init containers into triggerer.

  # extraInitContainers:
  #   - name: git-sync-init
  #     image: 'k8s.gcr.io/git-sync/git-sync:v3.3.0'
  #     env:
  #       - name: GIT_SYNC_USERNAME
  #         valueFrom:
  #           secretKeyRef:
  #             name: git-credentials-2
  #             key: GIT_SYNC_USERNAME
  #       - name: GIT_SYNC_PASSWORD
  #         valueFrom:
  #           secretKeyRef:
  #             name: git-credentials-2
  #             key: GIT_SYNC_PASSWORD
  #       - name: GIT_SYNC_REV
  #         value: HEAD
  #       - name: GIT_SYNC_BRANCH
  #         value: main
  #       - name: GIT_SYNC_REPO
  #         value: 'https://git.ni.dfki.de/ml_infrastructure/airflow-pbr-dags.git'
  #       - name: GIT_SYNC_DEPTH
  #         value: '1'
  #       - name: GIT_SYNC_ROOT
  #         value: /git
  #       - name: GIT_SYNC_DEST
  #         value: repo
  #       - name: GIT_SYNC_ADD_USER
  #         value: 'true'
  #       - name: GIT_SYNC_WAIT
  #         value: '60'
  #       - name: GIT_SYNC_MAX_SYNC_FAILURES
  #         value: '0'
  #     resources: { }
  #     volumeMounts:
  #       - name: dags
  #         mountPath: /git
  #     imagePullPolicy: IfNotPresent
  #     securityContext:
  #       runAsUser: 65533

  #   - name: git-sync-2-init
  #     image: 'k8s.gcr.io/git-sync/git-sync:v3.3.0'
  #     env:
  #       - name: GIT_SYNC_USERNAME
  #         valueFrom:
  #           secretKeyRef:
  #             name: git-credentials-2
  #             key: GIT_SYNC_USERNAME
  #       - name: GIT_SYNC_PASSWORD
  #         valueFrom:
  #           secretKeyRef:
  #             name: git-credentials-2
  #             key: GIT_SYNC_PASSWORD
  #       - name: GIT_SYNC_REV
  #         value: HEAD
  #       - name: GIT_SYNC_BRANCH
  #         value: main
  #       - name: GIT_SYNC_REPO
  #         value: 'https://git.ni.dfki.de/ml_infrastructure/airflow-dags-2.git' 
  #       - name: GIT_SYNC_DEPTH
  #         value: '1'
  #       - name: GIT_SYNC_ROOT
  #         value: /git
  #       - name: GIT_SYNC_DEST
  #         value: repo
  #       - name: GIT_SYNC_ADD_USER
  #         value: 'true'
  #       - name: GIT_SYNC_WAIT
  #         value: '60'
  #       - name: GIT_SYNC_MAX_SYNC_FAILURES
  #         value: '0'
  #     resources: {}
  #     volumeMounts:
  #       - name: dags2
  #         mountPath: /git2
  #       # - name: kube-api-access-9j29g
  #       #   readOnly: true
  #       #   mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  #     imagePullPolicy: IfNotPresent
  #     securityContext:
  #       runAsUser: 65533

  # Mount additional volumes into triggerer.
  extraVolumes:
    - name: dags
      emptyDir: {}
    - name: dags2
      emptyDir: {}
  extraVolumeMounts:
    - name: dags2
      readOnly: true
      mountPath: /opt/airflow/dags/git2
      # subPath: repo
    - name: dags
      readOnly: true
      mountPath: /opt/airflow/dags/git1
      # subPath: repo
      # subPath: airflow/dags/kubernetes

  # Select certain nodes for airflow triggerer pods.
  nodeSelector: {}
  affinity: {}
  # default triggerer affinity is:
  #  podAntiAffinity:
  #    preferredDuringSchedulingIgnoredDuringExecution:
  #    - podAffinityTerm:
  #        labelSelector:
  #          matchLabels:
  #            component: triggerer
  #        topologyKey: kubernetes.io/hostname
  #      weight: 100
  tolerations: []

  podAnnotations: {}

# Flower settings
flower:
  # Enable flower.
  # If True, and using CeleryExecutor/CeleryKubernetesExecutor, will deploy flower app.
  enabled: true

  # Command to use when running flower (templated).
  command: ~
  # Args to use when running flower (templated).
  args:
    - "bash"
    - "-c"
    # The format below is necessary to get `helm lint` happy
    - |-
      exec \
      airflow {{ semverCompare ">=2.0.0" .Values.airflowVersion | ternary "celery flower" "flower" }}

  # Additional network policies as needed (Deprecated - renamed to `flower.networkPolicy.ingress.from`)
  extraNetworkPolicies: []
  networkPolicy:
    ingress:
      # Peers for flower NetworkPolicy ingress
      from: []
      # Ports for flower NetworkPolicy ingress (if ingressPeers is set)
      ports:
        - port: "{{ .Values.ports.flowerUI }}"

  resources: {}
  #   limits:
  #     cpu: 100m
  #     memory: 128Mi
  #   requests:
  #     cpu: 100m
  #     memory: 128Mi

  # When not set, the values defined in the global securityContext will be used
  securityContext: {}
  #  runAsUser: 50000
  #  fsGroup: 0
  #  runAsGroup: 0

  # Create ServiceAccount
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the release name
    name: ~

    # Annotations to add to worker kubernetes service account.
    annotations: {}

  # A secret containing the connection
  secretName: ~

  # Else, if username and password are set, create secret from username and password
  username: ~
  password: ~

  service:
    type: ClusterIP
    ## service annotations
    annotations: {}
    ports:
      - name: flower-ui
        port: "{{ .Values.ports.flowerUI }}"
    # To change the port used to access flower:
    # ports:
    #   - name: flower-ui
    #     port: 8080
    #     targetPort: flower-ui
    loadBalancerIP: ~
    ## Limit load balancer source ips to list of CIDRs
    # loadBalancerSourceRanges:
    #   - "10.123.0.0/16"
    loadBalancerSourceRanges: []

  # Launch additional containers into the flower pods.
  extraContainers: []
  # Mount additional volumes into the flower pods.
  extraVolumes: []

  # Select certain nodes for airflow flower pods.
  nodeSelector: {}
  affinity: {}
  tolerations: []

  podAnnotations: {}

# Statsd settings
statsd:
  enabled: true

  # Create ServiceAccount
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the release name
    name: ~

    # Annotations to add to worker kubernetes service account.
    annotations: {}

  # When not set, the values defined in the global securityContext will be used
  securityContext: {}
  #  runAsUser: 65534
  #  fsGroup: 0
  #  runAsGroup: 0

  # Additional network policies as needed
  extraNetworkPolicies: []
  resources: {}
  #   limits:
  #     cpu: 100m
  #     memory: 128Mi
  #   requests:
  #     cpu: 100m
  #     memory: 128Mi

  service:
    extraAnnotations: {}

  # Select certain nodes for statsd pods.
  nodeSelector: {}
  affinity: {}
  tolerations: []

  # Additional mappings for statsd exporter.
  extraMappings: []

  uid: 65534

# PgBouncer settings
pgbouncer:
  # Enable PgBouncer
  enabled: false
  # Command to use for PgBouncer(templated).
  command: ["pgbouncer", "-u", "nobody", "/etc/pgbouncer/pgbouncer.ini"]
  # Args to use for PgBouncer(templated).
  args: ~

  # Create ServiceAccount
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the release name
    name: ~

    # Annotations to add to worker kubernetes service account.
    annotations: {}

  # Additional network policies as needed
  extraNetworkPolicies: []

  # Pool sizes
  metadataPoolSize: 10
  resultBackendPoolSize: 5

  # Maximum clients that can connect to PgBouncer (higher = more file descriptors)
  maxClientConn: 100

  # supply the name of existing secret with pgbouncer.ini and users.txt defined
  # you can load them to a k8s secret like the one below
  #  apiVersion: v1
  #  kind: Secret
  #  metadata:
  #    name: pgbouncer-config-secret
  #  data:
  #     pgbouncer.ini: <base64_encoded pgbouncer.ini file content>
  #     users.txt: <base64_encoded users.txt file content>
  #  type: Opaque
  #
  #  configSecretName: pgbouncer-config-secret
  #
  configSecretName: ~

  # PgBouncer pod disruption budget
  podDisruptionBudget:
    enabled: false

    # PDB configuration
    config:
      maxUnavailable: 1

  # Limit the resources to PgBouncer.
  # When you specify the resource request the k8s scheduler uses this information to decide which node to
  # place the Pod on. When you specify a resource limit for a Container, the kubelet enforces those limits so
  # that the running container is not allowed to use more of that resource than the limit you set.
  # See: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
  # Example:
  #
  # resource:
  #   limits:
  #     cpu: 100m
  #     memory: 128Mi
  #   requests:
  #     cpu: 100m
  #     memory: 128Mi
  resources: {}

  service:
    extraAnnotations: {}

  # https://www.pgbouncer.org/config.html
  verbose: 0
  logDisconnections: 0
  logConnections: 0

  sslmode: "prefer"
  ciphers: "normal"

  ssl:
    ca: ~
    cert: ~
    key: ~

  # Add extra PgBouncer ini configuration in the databases section:
  # https://www.pgbouncer.org/config.html#section-databases
  extraIniMetadata: ~
  extraIniResultBackend: ~
  # Add extra general PgBouncer ini configuration: https://www.pgbouncer.org/config.html
  extraIni: ~

  # Mount additional volumes into pgbouncer.
  extraVolumes: []
  extraVolumeMounts: []

  # Select certain nodes for PgBouncer pods.
  nodeSelector: {}
  affinity: {}
  tolerations: []

  uid: 65534

  metricsExporterSidecar:
    resources: {}
    #  limits:
    #   cpu: 100m
    #   memory: 128Mi
    #  requests:
    #   cpu: 100m
    #   memory: 128Mi
    sslmode: "disable"

# Configuration for the redis provisioned by the chart
redis:
  enabled: true
  terminationGracePeriodSeconds: 600

  # Create ServiceAccount
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the release name
    name: ~

    # Annotations to add to worker kubernetes service account.
    annotations: {}

  persistence:
    # Enable persistent volumes
    enabled: true
    # Volume size for worker StatefulSet
    size: 1Gi
    # If using a custom storageClass, pass name ref to all statefulSets here
    storageClassName:

  resources: {}
  #  limits:
  #   cpu: 100m
  #   memory: 128Mi
  #  requests:
  #   cpu: 100m
  #   memory: 128Mi

  # If set use as redis secret. Make sure to also set data.brokerUrlSecretName value.
  passwordSecretName: ~

  # Otherwise, if password is set, a secret is created with it;
  # if neither is set, a new password is generated on install.
  # Note: the password can only be set during install, not upgrade.
  password: ~

  # This setting tells kubernetes that it's ok to evict
  # this pod when it wants to scale a node down.
  safeToEvict: true

  # Select certain nodes for redis pods.
  nodeSelector: {}
  affinity: {}
  tolerations: []

# Auth secret for a private registry
# This is used if pulling airflow images from a private registry
registry:
  secretName: ~

  # Example:
  # connection:
  #   user: ~
  #   pass: ~
  #   host: ~
  #   email: ~
  connection: {}
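  #
  # A sketch of creating the pull secret out-of-band with kubectl and
  # referencing it via `secretName` instead of `connection` (the server and
  # credentials below are placeholders):
  #
  #   kubectl create secret docker-registry airflow-registry \
  #     --docker-server=registry.example.com \
  #     --docker-username=<user> --docker-password=<pass> --docker-email=<email>
  #
  # secretName: airflow-registry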

# Elasticsearch logging configuration
elasticsearch:
  # Enable elasticsearch task logging
  enabled: false
  # A secret containing the connection
  secretName: ~
  # Or an object representing the connection
  # Example:
  # connection:
  #   user: ~
  #   pass: ~
  #   host: ~
  #   port: ~
  connection: {}

# All ports used by chart
ports:
  flowerUI: 5555
  airflowUI: 8080
  workerLogs: 8793
  redisDB: 6379
  statsdIngest: 9125
  statsdScrape: 9102
  pgbouncer: 6543
  pgbouncerScrape: 9127

# Define any ResourceQuotas for namespace
quotas: {}

# Define default/max/min values for pods and containers in namespace
limits: []
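#
# A hypothetical example of both (assuming `quotas` is rendered into a
# ResourceQuota's spec.hard and `limits` into LimitRange spec.limits;
# check the chart's templates before relying on this):
#
# quotas:
#   pods: "50"
#   requests.cpu: "20"
#   requests.memory: 64Gi
#
# limits:
#   - type: Container
#     default:
#       cpu: 500m
#       memory: 512Mi
#     defaultRequest:
#       cpu: 200m
#       memory: 256Mi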

# This runs as a CronJob to cleanup old pods.
cleanup:
  enabled: false
  # Run every 15 minutes
  schedule: "*/15 * * * *"
  # Command to use when running the cleanup cronjob (templated).
  command: ~
  # Args to use when running the cleanup cronjob (templated).
  args: ["bash", "-c", "exec airflow kubernetes cleanup-pods --namespace={{ .Release.Namespace }}"]

  # Select certain nodes for airflow cleanup pods.
  nodeSelector: {}
  affinity: {}
  tolerations: []

  resources: {}
  #  limits:
  #   cpu: 100m
  #   memory: 128Mi
  #  requests:
  #   cpu: 100m
  #   memory: 128Mi

  # Create ServiceAccount
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the release name
    name: ~

    # Annotations to add to cleanup cronjob kubernetes service account.
    annotations: {}

  # When not set, the values defined in the global securityContext will be used
  securityContext: {}
  #  runAsUser: 50000
  #  runAsGroup: 0

# Configuration for postgresql subchart
# Not recommended for production
postgresql:
  enabled: true
  postgresqlPassword: postgres
  postgresqlUsername: postgres

# Config settings to go into the mounted airflow.cfg
#
# Please note that these values are passed through the `tpl` function, so are
# all subject to being rendered as go templates. If you need to include a
# literal `{{` in a value, it must be expressed like this:
#
#    a: '{{ "{{ not a template }}" }}'
#
# Do not set config containing secrets via plain text values, use Env Var or k8s secret object
# yamllint disable rule:line-length
config:
  core:
    dags_folder: '{{ include "airflow_dags" . }}'
    # This is ignored when used with the official Docker image
    load_examples: 'False'
    executor: '{{ .Values.executor }}'
    # Included for Airflow 1.10 backward compatibility; moved to [logging] in 2.0
    colored_console_log: 'False'
    remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}'
  # Authentication backend used for the experimental API
  api:
    auth_backend: airflow.api.auth.backend.deny_all
  logging:
    remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}'
    colored_console_log: 'False'
  metrics:
    statsd_on: '{{ ternary "True" "False" .Values.statsd.enabled }}'
    statsd_port: 9125
    statsd_prefix: airflow
    statsd_host: '{{ printf "%s-statsd" .Release.Name }}'
  webserver:
    enable_proxy_fix: 'True'
    # For Airflow 1.10
    rbac: 'True'
  celery:
    worker_concurrency: 16
  scheduler:
    # statsd params included for Airflow 1.10 backward compatibility; moved to [metrics] in 2.0
    statsd_on: '{{ ternary "True" "False" .Values.statsd.enabled }}'
    statsd_port: 9125
    statsd_prefix: airflow
    statsd_host: '{{ printf "%s-statsd" .Release.Name }}'
    # `run_duration` included for Airflow 1.10 backward compatibility; removed in 2.0.
    run_duration: 41460
  elasticsearch:
    json_format: 'True'
    log_id_template: "{dag_id}_{task_id}_{execution_date}_{try_number}"
  elasticsearch_configs:
    max_retries: 3
    timeout: 30
    retry_timeout: 'True'
  kerberos:
    keytab: '{{ .Values.kerberos.keytabPath }}'
    reinit_frequency: '{{ .Values.kerberos.reinitFrequency }}'
    principal: '{{ .Values.kerberos.principal }}'
    ccache: '{{ .Values.kerberos.ccacheMountPath }}/{{ .Values.kerberos.ccacheFileName }}'
  celery_kubernetes_executor:
    kubernetes_queue: 'kubernetes'
  kubernetes:
    namespace: '{{ .Release.Namespace }}'
    airflow_configmap: '{{ include "airflow_config" . }}'
    airflow_local_settings_configmap: '{{ include "airflow_config" . }}'
    pod_template_file: '{{ include "airflow_pod_template_file" . }}/pod_template_file.yaml'
    worker_container_repository: '{{ .Values.images.airflow.repository | default .Values.defaultAirflowRepository }}'
    worker_container_tag: '{{ .Values.images.airflow.tag | default .Values.defaultAirflowTag }}'
    multi_namespace_mode: '{{ if .Values.multiNamespaceMode }}True{{ else }}False{{ end }}'
# yamllint enable rule:line-length

# Whether Airflow can launch workers and/or pods in multiple namespaces
# If true, it creates ClusterRole/ClusterRolebinding (with access to entire cluster)
multiNamespaceMode: false

# `podTemplate` is a templated string containing the contents of `pod_template_file.yaml` used for
# KubernetesExecutor workers. The default `podTemplate` will use normal `workers` configuration parameters
# (e.g. `workers.resources`). As such, you normally won't need to override this directly, however,
# you can still provide a completely custom `pod_template_file.yaml` if desired.
# If not set, a default one is created using `files/pod-template-file.kubernetes-helm-yaml`.
podTemplate: ~
# The following example is NOT functional, but meant to be illustrative of how you can provide a custom
# `pod_template_file`. You're better off starting with the default in
# `files/pod-template-file.kubernetes-helm-yaml` and modifying from there.
# We will set `priorityClassName` in this example:
# podTemplate: |
#   apiVersion: v1
#   kind: Pod
#   metadata:
#     name: dummy-name
#     labels:
#       tier: airflow
#       component: worker
#       release: {{ .Release.Name }}
#   spec:
#     priorityClassName: high-priority
#     containers:
#       - name: base
#         ...

# Git sync
dags:
  persistence:
    # Enable persistent volume for storing dags
    enabled: false
    # Volume size for dags
    size: 1Gi
    # If using a custom storageClass, pass name here
    storageClassName: local-storage
    # access mode of the persistent volume
    accessMode: ReadWriteMany
    ## the name of an existing PVC to use
    existingClaim: airflow-dags-pvc-volume
  gitSync:
    enabled: false

    # git repo clone url
    # ssh examples ssh://git@github.com/apache/airflow.git
    # git@github.com:apache/airflow.git
    # https example: https://github.com/apache/airflow.git
    repo: https://git.ni.dfki.de/ml_infrastructure/airflow-pbr-dags.git
    branch: git-sync
    rev: HEAD
    depth: 1
    # the number of consecutive failures allowed before aborting
    maxFailures: 0
    # subpath within the repo where dags are located
    # should be "" if dags are at repo root
    subPath: ""
    # if your repo needs a user name password
    # you can load them to a k8s secret like the one below
    #   ---
    #   apiVersion: v1
    #   kind: Secret
    #   metadata:
    #     name: git-credentials-2
    #   data:
    #     GIT_SYNC_USERNAME: <base64_encoded_git_username>
    #     GIT_SYNC_PASSWORD: <base64_encoded_git_password>
    # and specify the name of the secret below
    #
    credentialsSecret: git-credentials
    #
    #
    # If you are using an ssh clone url, you can load
    # the ssh private key to a k8s secret like the one below
    #   ---
    #   apiVersion: v1
    #   kind: Secret
    #   metadata:
    #     name: airflow-ssh-secret
    #   data:
    #     # key needs to be gitSshKey
    #     gitSshKey: <base64_encoded_data>
    # and specify the name of the secret below
    # sshKeySecret: airflow-ssh-secret
    #
    # If you are using an ssh private key, you can additionally
    # specify the content of your known_hosts file, example:
    #
    # knownHosts: |
    #    <host1>,<ip1> <key1>
    #    <host2>,<ip2> <key2>
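    #
    # A sketch of creating either secret with kubectl (usernames, tokens, and
    # key paths below are placeholders):
    #
    #   kubectl create secret generic git-credentials \
    #     --from-literal=GIT_SYNC_USERNAME=<user> \
    #     --from-literal=GIT_SYNC_PASSWORD=<token>
    #
    #   kubectl create secret generic airflow-ssh-secret \
    #     --from-file=gitSshKey=/path/to/id_rsa
    #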
    # interval between git sync attempts in seconds
    wait: 60
    containerName: git-sync
    uid: 65533

    # When not set, the values defined in the global securityContext will be used
    securityContext: {}
    #  runAsUser: 65533
    #  runAsGroup: 0

    extraVolumeMounts: []
    env: []
    resources: {}
    #  limits:
    #   cpu: 100m
    #   memory: 128Mi
    #  requests:
    #   cpu: 100m
    #   memory: 128Mi

logs:
  persistence:
    # Enable persistent volume for storing logs
    enabled: true
    # Volume size for logs
    size: 10Gi
    # If using a custom storageClass, pass name here
    storageClassName: local-storage
    ## the name of an existing PVC to use
    existingClaim: logs-pvc-volume

vl-kp commented 1 year ago

any update?

titowoche30 commented 1 year ago

That would be a really great feature

MISSEY commented 1 year ago

I was able to solve this by creating a git repository (let's say A) which has git submodules (B, C, D), and setting up a trigger in each submodule's CI/CD (on new commit) to run a job in repository A. The CI/CD job in repo A looks something like this:

  git submodule sync --recursive
  git submodule update --recursive --remote
  git add .
  CHANGES=$(git status --porcelain | wc -l)
  if [[ "${CHANGES}" -gt 0 ]]; then
    git config user.email "something@somethin.com"
    git config user.name "test"
    git commit -m "Submodule Sync"
    git push origin HEAD:main
  else
    echo "Nothing to commit"
  fi

And git-sync is syncing to git repo A.
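
For the other half of the setup (a sketch, assuming the repos live on GitLab; the project path and branch below are placeholders), each submodule repo (B, C, D) can use a multi-project trigger job so that a push to it kicks off the sync job above in repo A:

  # .gitlab-ci.yml in a submodule repo (B, C, or D)
  trigger-umbrella-sync:
    stage: deploy
    rules:
      - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
    trigger:
      project: my-group/repo-a    # placeholder path of repo A
      branch: main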

snowsky commented 1 year ago

Bitnami's Airflow Helm chart supports this feature

y0zg commented 1 year ago

@snowsky can you share where this is supported in the Bitnami chart? Then we could probably create a PR against the official Airflow Helm chart.

snowsky commented 1 year ago

https://github.com/bitnami/charts/blob/main/bitnami/airflow/values.yaml#L956

The Bitnami chart supports multiple repos ^^^ :)

mk-raven commented 4 months ago

any update on this feature?