bitnami / charts

Bitnami Helm Charts
https://bitnami.com

upgrade apache-airflow-providers-cncf-kubernetes to pass image_pull_policy in KubernetesPodOperator correctly #6722

Closed grmoktan closed 3 years ago

grmoktan commented 3 years ago

Description

The apache-airflow-providers-cncf-kubernetes provider is at version 1.0.0 (at least up to Helm chart version 10.2.1).

Bugs appear to have been fixed since then (https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/index.html#id6), especially with regard to the image pull policy.

Is there a way to upgrade the provider packages in an existing Helm chart deployment?

Steps to reproduce the issue:

  1. Create a DAG with KubernetesPodOperator and image_pull_policy = 'Always'
  2. Run the DAG
  3. Check the pod YAML in Kubernetes

Describe the results you received: imagePullPolicy: IfNotPresent

Describe the results you expected: imagePullPolicy: Always
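
The check in step 3 can also be scripted. The following is a minimal sketch, assuming the pod manifest has been exported as JSON (for example with `kubectl get pod <pod-name> -o json`). The helper name `container_pull_policies` and the sample manifest are hypothetical and only illustrate the idea.

```python
import json


def container_pull_policies(pod_manifest_json: str) -> dict:
    """Map each container name in a pod manifest to its imagePullPolicy."""
    pod = json.loads(pod_manifest_json)
    containers = pod.get("spec", {}).get("containers", [])
    # imagePullPolicy may be absent from a hand-written manifest; report None then.
    return {c["name"]: c.get("imagePullPolicy") for c in containers}


# Hypothetical manifest resembling the reported (unexpected) result.
sample = json.dumps({
    "kind": "Pod",
    "spec": {
        "containers": [
            {
                "name": "base",
                "image": "example/image:latest",
                "imagePullPolicy": "IfNotPresent",
            }
        ]
    },
})

policies = container_pull_policies(sample)
print(policies)
```

If the operator passed the setting through correctly, every value in the returned mapping would be `Always` rather than `IfNotPresent`.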

Additional information you deem important (e.g. issue happens only occasionally):

carrodher commented 3 years ago

I just moved this issue from bitnami/bitnami-docker-airflow, which is specific to the container, to bitnami/charts, which is the repo where we provide support for Bitnami Helm Charts.

Unfortunately, I am not able to fully understand the issue. The bitnami/airflow Helm Chart contains five different images that are automatically updated when a new version is detected in the upstream project. That is, when the Airflow developers release a new version, our automated test & release pipeline, which tracks the upstream project, downloads the source code, builds the Bitnami container images, and adds them to the Bitnami Helm Chart; if all the tests pass, a new chart is released containing the latest version of the images. For example, this release was done some days ago, see https://github.com/bitnami/charts/commit/90cdff507ce247e2e96cf72ba45f0fc15601d555

The bundled images are airflow, airflow-scheduler, airflow-worker, git and airflow-exporter. For each of the Airflow images we support branches 1.X and 2.X, but the Helm Chart includes the latest one (2.X). The images can be customized by modifying values.yaml or by passing them as arguments to the helm install command with the --set flag:

web:
  image:
    registry: docker.io
    repository: bitnami/airflow
    tag: 2.1.0-debian-10-r20
    pullPolicy: IfNotPresent

You can modify the above section in the values.yaml to use any other image as well as modify the default pullPolicy.
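
As an illustration of the --set route, here is a minimal sketch that flattens a nested overrides mapping into `--set key.path=value` arguments. The helper `to_set_flags` is hypothetical, the release name `airflow` is taken from the install command used later in this thread, and values containing commas or other special characters would need extra escaping that this sketch does not handle. Note that `web.image.pullPolicy` controls how the Airflow web image itself is pulled, not the `image_pull_policy` of a KubernetesPodOperator task.

```python
def to_set_flags(values: dict, prefix: str = "") -> list:
    """Flatten nested chart values into `--set key.path=value` arguments."""
    flags = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as web -> image -> tag.
            flags.extend(to_set_flags(value, path))
        else:
            flags.extend(["--set", f"{path}={value}"])
    return flags


# Overrides mirroring the values.yaml section above (pullPolicy changed for illustration).
overrides = {"web": {"image": {"tag": "2.1.0-debian-10-r20", "pullPolicy": "Always"}}}
command = ["helm", "install", "airflow", "bitnami/airflow"] + to_set_flags(overrides)
print(" ".join(command))
```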

grmoktan commented 3 years ago

Hi, thank you for replying and explaining the build process.

Basically, when I deploy this Helm chart, exec into the container and run 'airflow providers list', I see that all providers (including apache-airflow-providers-cncf-kubernetes) are at version 1.0.0.

Scroll to the rightmost column:

$ airflow providers list
package_name                              | description                                                                                     | version
==========================================+=================================================================================================+========
apache-airflow-providers-amazon           | Amazon integration (including Amazon Web Services (AWS) https://aws.amazon.com/)                | 1.0.0  
apache-airflow-providers-apache-cassandra | Apache Cassandra http://cassandra.apache.org/                                                   | 1.0.0  
apache-airflow-providers-apache-druid     | Apache Druid https://druid.apache.org/                                                          | 1.0.0  
apache-airflow-providers-apache-hdfs      | Hadoop Distributed File System (HDFS) https://hadoop.apache.org/docs/r1.2.1/hdfsdesign.html     | 1.0.0  
                                          | and WebHDFS https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html |        
apache-airflow-providers-apache-hive      | Apache Hive https://hive.apache.org/                                                            | 1.0.0  
apache-airflow-providers-apache-pinot     | Apache Pinot https://pinot.apache.org/                                                          | 1.0.0  
apache-airflow-providers-celery           | Celery http://www.celeryproject.org/                                                            | 1.0.0  
apache-airflow-providers-cloudant         | IBM Cloudant https://www.ibm.com/cloud/cloudant                                                 | 1.0.0  
apache-airflow-providers-cncf-kubernetes  | Kubernetes https://kubernetes.io/                                                               | 1.0.0  
apache-airflow-providers-docker           | Docker https://docs.docker.com/install/                                                         | 1.0.0  
apache-airflow-providers-elasticsearch    | Elasticsearch https://https//www.elastic.co/elasticsearch                                       | 1.0.0  
apache-airflow-providers-exasol           | Exasol https://docs.exasol.com/home.htm                                                         | 1.0.0  
apache-airflow-providers-ftp              | File Transfer Protocol (FTP) https://tools.ietf.org/html/rfc114                                 | 1.0.0  
apache-airflow-providers-google           | Google services including:                                                                      | 1.0.0  
                                          |                                                                                                 |        
                                          |   - Google Ads https://ads.google.com/                                                          |        
                                          |   - Google Cloud (GCP) https://cloud.google.com/                                                |        
                                          |   - Google Firebase https://firebase.google.com/                                                |        
                                          |   - Google Marketing Platform https://marketingplatform.google.com/                             |        
                                          |   - Google Workspace https://workspace.google.pl/ (formerly Google Suite)                       |        
apache-airflow-providers-grpc             | gRPC https://grpc.io/                                                                           | 1.0.0  
apache-airflow-providers-hashicorp        | Hashicorp including Hashicorp Vault https://www.vaultproject.io/                                | 1.0.0  
apache-airflow-providers-http             | Hypertext Transfer Protocol (HTTP) https://www.w3.org/Protocols/                                | 1.0.0  
apache-airflow-providers-imap             | Internet Message Access Protocol (IMAP) https://tools.ietf.org/html/rfc3501                     | 1.0.0  
apache-airflow-providers-microsoft-azure  | Microsoft Azure https://azure.microsoft.com/                                                    | 1.0.0  
apache-airflow-providers-microsoft-mssql  | Microsoft SQL Server (MSSQL) https://www.microsoft.com/en-us/sql-server/sql-server-downloads    | 1.0.0  
apache-airflow-providers-mongo            | MongoDB https://www.mongodb.com/what-is-mongodb                                                 | 1.0.0  
apache-airflow-providers-mysql            | MySQL https://www.mysql.com/products/                                                           | 1.0.0  
apache-airflow-providers-postgres         | PostgreSQL https://www.postgresql.org/                                                          | 1.0.0  
apache-airflow-providers-presto           | Presto https://prestodb.github.io/                                                              | 1.0.0  
apache-airflow-providers-redis            | Redis https://redis.io/                                                                         | 1.0.0  
apache-airflow-providers-sendgrid         | Sendgrid https://sendgrid.com/                                                                  | 1.0.0  
apache-airflow-providers-sftp             | SSH File Transfer Protocol (SFTP) https://tools.ietf.org/wg/secsh/draft-ietf-secsh-filexfer/    | 1.0.0  
apache-airflow-providers-slack            | Slack https://slack.com/                                                                        | 1.0.0  
apache-airflow-providers-sqlite           | SQLite https://www.sqlite.org/                                                                  | 1.0.0  
apache-airflow-providers-ssh              | Secure Shell (SSH) https://tools.ietf.org/html/rfc4251                                          | 1.0.0  
apache-airflow-providers-vertica          | Vertica https://www.vertica.com/                            

It might be that the upstream is stuck at 1.0.0. I understood from this page (https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html#core-airflow-extras) that cncf.kubernetes and celery are preinstalled.

But since there have been bug fixes after apache-airflow-providers-cncf-kubernetes==1.0.0 (as mentioned earlier, I'm interested in having the correct image_pull_policy passed), I was wondering whether newer versions of the provider packages could be installed when you build the Bitnami container image (https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/index.html#installation).

Or is there some way to upgrade the package in a running container?

carrodher commented 3 years ago

I'm sorry, but I am not able to reproduce the issue. I just deployed the latest version of the chart from scratch in a new namespace, and the version of apache-airflow-providers-cncf-kubernetes is 1.2.0. Please find below the steps I followed to reproduce the scenario:

$ kubectl create namespace carlosrh-airflow
namespace/carlosrh-airflow created

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈

$ helm install airflow bitnami/airflow --namespace carlosrh-airflow
NAME: airflow
LAST DEPLOYED: Wed Jun 23 07:04:26 2021
NAMESPACE: carlosrh-airflow
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the Airflow URL by running:

  echo URL  : http://127.0.0.1:8080
  kubectl port-forward --namespace carlosrh-airflow svc/airflow 8080:8080

2. Get your Airflow login credentials by running:
  export AIRFLOW_PASSWORD=$(kubectl get secret --namespace "carlosrh-airflow" airflow -o jsonpath="{.data.airflow-password}" | base64 --decode)
  echo User:     user
  echo Password: $AIRFLOW_PASSWORD

$ helm ls --namespace carlosrh-airflow
NAME    NAMESPACE           REVISION    UPDATED                                 STATUS      CHART           APP VERSION
airflow carlosrh-airflow    1           2021-06-23 07:04:26.55510604 +0000 UTC  deployed    airflow-10.2.1  2.1.0

$ kubectl get pods --namespace carlosrh-airflow
NAME                                 READY   STATUS    RESTARTS   AGE
airflow-postgresql-0                 1/1     Running   0          5m14s
airflow-redis-master-0               1/1     Running   0          5m14s
airflow-scheduler-844fb97648-9hgpp   1/1     Running   0          5m14s
airflow-web-78f55b7c-mdq9x           1/1     Running   0          5m14s
airflow-worker-0                     1/1     Running   1          5m14s

$ kubectl --namespace carlosrh-airflow exec -it airflow-web-78f55b7c-mdq9x -- /bin/bash
I have no name!@airflow-web-78f55b7c-mdq9x:/$ airflow providers list
package_name                              | description                                                                                     | version
==========================================+=================================================================================================+========
apache-airflow-providers-amazon           | Amazon integration (including Amazon Web Services (AWS) https://aws.amazon.com/)                | 1.4.0
apache-airflow-providers-apache-cassandra | Apache Cassandra http://cassandra.apache.org/                                                   | 1.0.1
apache-airflow-providers-apache-druid     | Apache Druid https://druid.apache.org/                                                          | 1.1.0
apache-airflow-providers-apache-hdfs      | Hadoop Distributed File System (HDFS) https://hadoop.apache.org/docs/r1.2.1/hdfsdesign.html     | 1.0.1
                                          | and WebHDFS https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html |
apache-airflow-providers-apache-hive      | Apache Hive https://hive.apache.org/                                                            | 1.0.3
apache-airflow-providers-apache-pinot     | Apache Pinot https://pinot.apache.org/                                                          | 1.0.1
apache-airflow-providers-celery           | Celery http://www.celeryproject.org/                                                            | 1.0.1
apache-airflow-providers-cloudant         | IBM Cloudant https://www.ibm.com/cloud/cloudant                                                 | 1.0.1
apache-airflow-providers-cncf-kubernetes  | Kubernetes https://kubernetes.io/                                                               | 1.2.0
apache-airflow-providers-docker           | Docker https://docs.docker.com/install/                                                         | 1.2.0
apache-airflow-providers-elasticsearch    | Elasticsearch https://https//www.elastic.co/elasticsearch                                       | 1.0.4
apache-airflow-providers-exasol           | Exasol https://docs.exasol.com/home.htm                                                         | 1.1.1
apache-airflow-providers-ftp              | File Transfer Protocol (FTP) https://tools.ietf.org/html/rfc114                                 | 1.1.0
apache-airflow-providers-google           | Google services including:                                                                      | 3.0.0
                                          |                                                                                                 |
                                          |   - Google Ads https://ads.google.com/                                                          |
                                          |   - Google Cloud (GCP) https://cloud.google.com/                                                |
                                          |   - Google Firebase https://firebase.google.com/                                                |
                                          |   - Google LevelDB https://github.com/google/leveldb/                                           |
                                          |   - Google Marketing Platform https://marketingplatform.google.com/                             |
                                          |   - Google Workspace https://workspace.google.pl/ (formerly Google Suite)                       |
apache-airflow-providers-grpc             | gRPC https://grpc.io/                                                                           | 1.1.0
apache-airflow-providers-hashicorp        | Hashicorp including Hashicorp Vault https://www.vaultproject.io/                                | 1.0.2
apache-airflow-providers-http             | Hypertext Transfer Protocol (HTTP) https://www.w3.org/Protocols/                                | 1.1.1
apache-airflow-providers-imap             | Internet Message Access Protocol (IMAP) https://tools.ietf.org/html/rfc3501                     | 1.0.1
apache-airflow-providers-microsoft-azure  | Microsoft Azure https://azure.microsoft.com/                                                    | 2.0.0
apache-airflow-providers-microsoft-mssql  | Microsoft SQL Server (MSSQL) https://www.microsoft.com/en-us/sql-server/sql-server-downloads    | 1.1.0
apache-airflow-providers-mongo            | MongoDB https://www.mongodb.com/what-is-mongodb                                                 | 1.0.1
apache-airflow-providers-mysql            | MySQL https://www.mysql.com/products/                                                           | 1.1.0
apache-airflow-providers-neo4j            | Neo4j https://neo4j.com/                                                                        | 1.0.1
apache-airflow-providers-postgres         | PostgreSQL https://www.postgresql.org/                                                          | 1.0.2
apache-airflow-providers-presto           | Presto https://prestodb.github.io/                                                              | 1.0.2
apache-airflow-providers-redis            | Redis https://redis.io/                                                                         | 1.0.1
apache-airflow-providers-sendgrid         | Sendgrid https://sendgrid.com/                                                                  | 1.0.2
apache-airflow-providers-sftp             | SSH File Transfer Protocol (SFTP) https://tools.ietf.org/wg/secsh/draft-ietf-secsh-filexfer/    | 1.2.0
apache-airflow-providers-slack            | Slack https://slack.com/                                                                        | 3.0.0
apache-airflow-providers-sqlite           | SQLite https://www.sqlite.org/                                                                  | 1.0.2
apache-airflow-providers-ssh              | Secure Shell (SSH) https://tools.ietf.org/html/rfc4251                                          | 1.3.0
apache-airflow-providers-trino            | Trino https://trino.io/                                                                         | 1.0.0
apache-airflow-providers-vertica          | Vertica https://www.vertica.com/                                                                | 1.0.1
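
Reading versions off the rightmost column by eye is error-prone, so the comparison can be scripted. This is a minimal sketch, assuming the table layout shown above (three `|`-separated columns, with wrapped description rows that leave the first column empty). The helper name `parse_provider_versions` is hypothetical, and the sample is a shortened copy of the output above.

```python
def parse_provider_versions(table: str) -> dict:
    """Extract {package_name: version} from `airflow providers list` table output."""
    versions = {}
    for line in table.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip the header, the ===+=== separator, and wrapped description rows
        # (continuation rows have an empty first cell).
        if len(cells) != 3 or not cells[0].startswith("apache-airflow-providers-"):
            continue
        if cells[2]:
            versions[cells[0]] = cells[2]
    return versions


# Shortened sample in the same layout as the output above.
sample = """\
package_name                              | description                           | version
==========================================+=======================================+========
apache-airflow-providers-celery           | Celery http://www.celeryproject.org/  | 1.0.1
apache-airflow-providers-cncf-kubernetes  | Kubernetes https://kubernetes.io/     | 1.2.0
apache-airflow-providers-google           | Google services including:            | 3.0.0
                                          |   - Google Ads https://ads.google.com/ |
"""

versions = parse_provider_versions(sample)
print(versions["apache-airflow-providers-cncf-kubernetes"])
```

Running the same function over the output of two deployments and comparing the resulting dictionaries shows which provider packages differ.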

It is possible that part of the configuration is persisted in PVCs, so if you install the chart at different times in the same namespace and with the same name, previous configurations are taken into account. Could you please try installing the latest version from scratch? That means removing the existing PVCs, or using a different namespace or a different name.

grmoktan commented 3 years ago

Thank you @carrodher. I deleted the PVCs and deployed to a new namespace, but the result was the same.

However, I was able to replicate your steps and then saw the newer versions of the provider packages too. This made me realise that I have a custom values file in which the image tags are specified by a deployment script. The image tags for web, scheduler and worker in my values file had apparently not been updated beyond 2.0.0.

Changed them to '2.1.0-debian-10-r20' and now I can see the expected behaviour.
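
A pinned-tag mismatch like this can be caught before deploying. The following is a minimal sketch, assuming the custom values have already been loaded into a dictionary and that the scheduler and worker sections follow the same `image.tag` layout as the `web` section shown earlier (an assumption, not confirmed in this thread). The helper `stale_image_tags` and the sample tags are hypothetical.

```python
def stale_image_tags(values: dict, expected_tag: str,
                     components=("web", "scheduler", "worker")) -> dict:
    """Return {component: pinned_tag} for components whose tag differs from expected_tag."""
    stale = {}
    for component in components:
        tag = values.get(component, {}).get("image", {}).get("tag")
        # A missing tag means the chart default applies, so it is not reported.
        if tag is not None and tag != expected_tag:
            stale[component] = tag
    return stale


# Hypothetical custom values resembling the situation described above.
custom_values = {
    "web": {"image": {"tag": "2.0.0"}},
    "scheduler": {"image": {"tag": "2.0.0"}},
    "worker": {"image": {"tag": "2.1.0-debian-10-r20"}},
}
print(stale_image_tags(custom_values, "2.1.0-debian-10-r20"))
```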

Thank you so much for the help.