bitnami / charts

Adding additional dependencies in Bitnami/Airflow #16544

Open mavenzer opened 1 year ago

mavenzer commented 1 year ago

Name and Version

bitnami/airflow 2.5.1

What architecture are you using?

None

What steps will reproduce the bug?

Hi all, I have deployed Airflow 2.5.1 on OpenShift using the Bitnami Helm chart. We use the standard chart, customized only with fsGroup, because OpenShift places restrictions on the UIDs and GIDs that can be used.

We want to add additional Python packages, such as pyhdb, to the current deployment.

I have seen a similar issue: https://github.com/bitnami/charts/issues/9390

Also, how can I verify that the package has been added to the current deployment? It should end up on the web, scheduler, and worker pods.

oc exec -it <pod-name> -- /bin/bash
source /opt/bitnami/airflow/venv/bin/activate
pip list | grep pyhdb

I'm unable to find the custom package there!
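
A minimal sketch of how this check could be scripted so it can be repeated for the web, scheduler, and worker pods. The helper name `has_package` is hypothetical; it only filters `pip list` output supplied on standard input and does not normalize `-`, `_`, or `.` in package names.

```shell
#!/usr/bin/env bash
# has_package NAME: read `pip list` output on stdin and succeed (exit 0)
# if NAME appears in the first column, compared case-insensitively.
has_package() {
  awk -v pkg="$1" '
    BEGIN { want = tolower(pkg) }
    tolower($1) == want { found = 1 }
    END { exit !found }
  '
}

# Possible use against a running pod (the pod name is a placeholder):
# oc exec <pod-name> -- bash -c \
#   'source /opt/bitnami/airflow/venv/bin/activate && pip list' \
#   | has_package pyhdb && echo "pyhdb is installed"
```

Running the same pipeline once per pod gives a yes/no answer for each component without opening an interactive shell.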

Are you using any custom parameters or values?

Helm command which I'm using for deploying it

helm upgrade --install airflow-helm <directory> -n <custom-namespace>  -f <directory/values.yaml> 

The values.yaml I'm using to add the additional Python dependencies:

web: &requirements-volume-config
  extraVolumeMounts: 
    - name: requirements
      mountPath: /bitnami/python/requirements.txt
      subPath: requirements.txt
  extraVolumes:
    - name: requirements
      configMap:
        name: airflow-requirements
scheduler: *requirements-volume-config
worker: *requirements-volume-config
extraDeploy:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: airflow-requirements
    data:
      requirements.txt: |
        --trusted-host artifactory.customdomain.org  pyhdb 

What do you see instead?

The package does not show up as installed when I run the following commands:

oc exec -it <pod-name> -- /bin/bash
source /opt/bitnami/airflow/venv/bin/activate
pip list | grep pyhdb

Additional information

Packages present in the virtual environment at /opt/bitnami/airflow/venv (pip list output):

adal                                      1.2.7
aiofiles                                  22.1.0
aiohttp                                   3.8.3
aiosignal                                 1.3.1
alabaster                                 0.7.13
alembic                                   1.9.2
amqp                                      5.1.1
anyio                                     3.6.2
apache-airflow                            2.5.1
apache-airflow-providers-amazon           7.1.0
apache-airflow-providers-apache-cassandra 3.1.0
apache-airflow-providers-apache-drill     2.3.1
apache-airflow-providers-apache-druid     3.3.1
apache-airflow-providers-apache-hdfs      3.2.0
apache-airflow-providers-apache-hive      5.1.1
apache-airflow-providers-apache-pinot     4.0.1
apache-airflow-providers-arangodb         2.1.0
apache-airflow-providers-celery           3.1.0
apache-airflow-providers-cloudant         3.1.0
apache-airflow-providers-cncf-kubernetes  5.1.1
apache-airflow-providers-common-sql       1.3.3
apache-airflow-providers-databricks       4.0.0
apache-airflow-providers-docker           3.4.0
apache-airflow-providers-elasticsearch    4.3.3
apache-airflow-providers-exasol           4.1.3
apache-airflow-providers-ftp              3.3.0
apache-airflow-providers-google           8.8.0
apache-airflow-providers-grpc             3.1.0
apache-airflow-providers-hashicorp        3.2.0
apache-airflow-providers-http             4.1.1
apache-airflow-providers-imap             3.1.1
apache-airflow-providers-influxdb         2.1.0
apache-airflow-providers-microsoft-azure  5.1.0
apache-airflow-providers-microsoft-mssql  3.3.2
apache-airflow-providers-mongo            3.1.1
apache-airflow-providers-mysql            4.0.0
apache-airflow-providers-neo4j            3.2.1
apache-airflow-providers-postgres         5.4.0
apache-airflow-providers-presto           4.2.1
apache-airflow-providers-redis            3.1.0
apache-airflow-providers-sendgrid         3.1.0
apache-airflow-providers-sftp             4.2.1
apache-airflow-providers-slack            7.2.0
apache-airflow-providers-sqlite           3.3.1
apache-airflow-providers-ssh              3.4.0
apache-airflow-providers-trino            4.3.1
apache-airflow-providers-vertica          3.3.1
apispec                                   3.3.2
appdirs                                   1.4.4
argcomplete                               2.0.0
asgiref                                   3.6.0
asn1crypto                                1.5.1
astroid                                   2.11.7
asttokens                                 2.2.1
async-timeout                             4.0.2
asynctest                                 0.13.0
attrs                                     22.2.0
aws-sam-translator                        1.57.0
aws-xray-sdk                              2.11.0
azure-batch                               13.0.0
azure-common                              1.1.28
azure-core                                1.26.2
azure-cosmos                              4.3.0
azure-datalake-store                      0.0.52
azure-identity                            1.12.0
azure-keyvault-secrets                    4.6.0
azure-kusto-data                          0.0.45
azure-mgmt-containerinstance              1.5.0
azure-mgmt-core                           1.3.2
azure-mgmt-datafactory                    1.1.0
azure-mgmt-datalake-nspkg                 3.0.1
azure-mgmt-datalake-store                 0.5.0
azure-mgmt-nspkg                          3.0.2
azure-mgmt-resource                       22.0.0
azure-nspkg                               3.0.2
azure-servicebus                          7.8.2
azure-storage-blob                        12.14.1
azure-storage-common                      2.1.0
azure-storage-file                        2.1.0
azure-storage-file-datalake               12.9.1
azure-synapse-spark                       0.7.0
Babel                                     2.11.0
backcall                                  0.2.0
backoff                                   1.10.0
bcrypt                                    4.0.1
beautifulsoup4                            4.11.1
billiard                                  3.6.4.0
black                                     23.1a1
bleach                                    5.0.1
blinker                                   1.5
boto                                      2.49.0
boto3                                     1.26.51
botocore                                  1.29.51
bowler                                    0.9.0
cachelib                                  0.9.0
cachetools                                4.2.2
cassandra-driver                          3.25.0
cattrs                                    22.2.0
celery                                    5.2.7
certifi                                   2022.12.7
cffi                                      1.15.1
cfgv                                      3.3.1
cfn-lint                                  0.72.9
cgroupspy                                 0.2.2
chardet                                   4.0.0
charset-normalizer                        2.1.1
checksumdir                               1.2.0
ciso8601                                  2.3.0
click                                     8.1.3
click-default-group                       1.2.2
click-didyoumean                          0.3.0
click-plugins                             1.1.1
click-repl                                0.2.0
clickclick                                20.10.2
cloudant                                  2.15.0
cloudpickle                               2.2.0
colorama                                  0.4.6
colorlog                                  4.8.0
commonmark                                0.9.1
ConfigUpdater                             3.1.1
connexion                                 2.14.1
coverage                                  7.0.5
cron-descriptor                           1.2.32
croniter                                  1.3.8
cryptography                              38.0.4
dask                                      2023.1.0
databricks-sql-connector                  2.2.0
db-dtypes                                 1.0.5
decorator                                 5.1.1
defusedxml                                0.7.1
Deprecated                                1.2.13
dill                                      0.3.1.1
distlib                                   0.3.6
distributed                               2023.1.0
dnspython                                 2.3.0
docker                                    6.0.1
docopt                                    0.6.2
docutils                                  0.19
ecdsa                                     0.18.0
elasticsearch                             7.13.4
elasticsearch-dbapi                       0.2.9
elasticsearch-dsl                         7.4.0
email-validator                           1.3.0
eralchemy2                                1.3.6
eventlet                                  0.33.3
exceptiongroup                            1.1.0
execnet                                   1.9.0
executing                                 1.2.0
fastavro                                  1.7.0
filelock                                  3.9.0
fissix                                    21.11.13
flake8                                    6.0.0
flake8-colors                             0.1.9
flake8_implicit_str_concat                0.3.0
flaky                                     3.7.0
Flask                                     2.2.2
Flask-AppBuilder                          4.1.4
Flask-Babel                               2.0.0
Flask-Bcrypt                              1.0.1
Flask-Caching                             2.0.2
Flask-JWT-Extended                        4.4.4
Flask-Login                               0.6.2
Flask-Session                             0.4.0
Flask-SQLAlchemy                          2.5.1
Flask-WTF                                 1.1.1
flower                                    1.2.0
freezegun                                 1.2.2
frozenlist                                1.3.3
fsspec                                    2022.11.0
future                                    0.18.3
gcloud-aio-auth                           4.1.5
gcloud-aio-bigquery                       6.2.0
gcloud-aio-storage                        8.0.0
geomet                                    0.2.1.post1
gevent                                    22.10.2
gitdb                                     4.0.10
GitPython                                 3.1.30
google-ads                                18.0.0
google-api-core                           2.8.2
google-api-python-client                  1.12.11
google-auth                               2.16.0
google-auth-httplib2                      0.1.0
google-auth-oauthlib                      0.8.0
google-cloud-aiplatform                   1.16.1
google-cloud-appengine-logging            1.1.3
google-cloud-audit-log                    0.2.4
google-cloud-automl                       2.8.0
google-cloud-bigquery                     2.34.4
google-cloud-bigquery-datatransfer        3.7.0
google-cloud-bigquery-storage             2.14.1
google-cloud-bigtable                     1.7.3
google-cloud-build                        3.9.0
google-cloud-compute                      0.7.0
google-cloud-container                    2.11.1
google-cloud-core                         2.3.2
google-cloud-datacatalog                  3.9.0
google-cloud-dataform                     0.2.0
google-cloud-dataplex                     1.1.0
google-cloud-dataproc                     5.0.0
google-cloud-dataproc-metastore           1.6.0
google-cloud-dlp                          1.0.2
google-cloud-kms                          2.12.0
google-cloud-language                     1.3.2
google-cloud-logging                      3.2.1
google-cloud-memcache                     1.4.1
google-cloud-monitoring                   2.11.0
google-cloud-orchestration-airflow        1.4.1
google-cloud-os-login                     2.7.1
google-cloud-pubsub                       2.13.5
google-cloud-redis                        2.9.0
google-cloud-resource-manager             1.6.0
google-cloud-secret-manager               1.0.2
google-cloud-spanner                      1.19.3
google-cloud-speech                       1.3.4
google-cloud-storage                      2.7.0
google-cloud-tasks                        2.10.1
google-cloud-texttospeech                 1.0.3
google-cloud-translate                    1.7.2
google-cloud-videointelligence            1.16.3
google-cloud-vision                       1.0.2
google-cloud-workflows                    1.7.1
google-crc32c                             1.5.0
google-resumable-media                    2.4.0
googleapis-common-protos                  1.56.4
graphql-core                              3.2.3
graphviz                                  0.20.1
greenlet                                  2.0.1
grpc-google-iam-v1                        0.12.4
grpcio                                    1.51.1
grpcio-gcp                                0.2.2
grpcio-status                             1.48.2
gunicorn                                  20.1.0
h11                                       0.14.0
hdfs                                      2.7.0
HeapDict                                  1.0.1
hmsclient                                 0.1.1
httpcore                                  0.16.3
httplib2                                  0.20.4
httpx                                     0.23.3
humanize                                  4.4.0
hvac                                      1.0.2
identify                                  2.5.13
idna                                      3.4
ijson                                     3.2.0.post0
imagesize                                 1.4.1
importlib-metadata                        6.0.0
incremental                               22.10.0
inflection                                0.5.1
influxdb-client                           1.35.0
iniconfig                                 2.0.0
ipdb                                      0.13.11
ipython                                   8.8.0
isodate                                   0.6.1
isort                                     5.11.2
itsdangerous                              2.1.2
jaraco.classes                            3.2.3
jedi                                      0.18.2
jeepney                                   0.8.0
Jinja2                                    3.1.2
jira                                      3.4.1
jmespath                                  0.10.0
jschema-to-python                         1.2.3
json-merge-patch                          0.2
jsondiff                                  2.0.0
jsonpatch                                 1.32
jsonpath-ng                               1.5.3
jsonpickle                                3.0.1
jsonpointer                               2.3
jsonschema                                4.17.3
jsonschema-spec                           0.1.2
junit-xml                                 1.9
keyring                                   23.13.1
kombu                                     5.2.4
kubernetes                                23.6.0
lazy-object-proxy                         1.9.0
ldap3                                     2.9.1
linkify-it-py                             2.0.0
locket                                    1.0.0
lockfile                                  0.12.2
looker-sdk                                22.20.0
lxml                                      4.9.2
lz4                                       4.3.2
Mako                                      1.2.4
Markdown                                  3.4.1
markdown-it-py                            2.1.0
MarkupSafe                                2.1.2
marshmallow                               3.19.0
marshmallow-enum                          1.5.1
marshmallow-oneofschema                   3.0.1
marshmallow-sqlalchemy                    0.26.1
matplotlib-inline                         0.1.6
mccabe                                    0.7.0
mdit-py-plugins                           0.3.3
mdurl                                     0.1.2
mongomock                                 4.1.2
more-itertools                            8.14.0
moreorless                                0.4.0
moto                                      4.1.0
msal                                      1.20.0
msal-extensions                           1.0.0
msgpack                                   1.0.4
msrest                                    0.7.1
msrestazure                               0.6.4
multidict                                 6.0.4
mypy                                      0.971
mypy-boto3-appflow                        1.26.32
mypy-boto3-rds                            1.26.47
mypy-boto3-redshift-data                  1.26.30
mypy-extensions                           0.4.3
mysql-connector-python                    8.0.32
mysqlclient                               2.1.1
neo4j                                     5.4.0
networkx                                  2.8.8
nodeenv                                   1.7.0
ntlm-auth                                 1.5.0
numpy                                     1.22.4
oauthlib                                  3.2.2
openapi-schema-validator                  0.4.0
openapi-spec-validator                    0.5.2
packaging                                 21.3
pandas                                    1.5.2
pandas-gbq                                0.17.9
parameterized                             0.8.1
paramiko                                  2.12.0
parso                                     0.8.3
partd                                     1.3.0
pathable                                  0.4.3
pathspec                                  0.9.0
pbr                                       5.11.1
pendulum                                  2.1.2
pexpect                                   4.8.0
pickleshare                               0.7.5
pinotdb                                   0.4.12
pip                                       23.0
pipdeptree                                2.3.3
pkginfo                                   1.9.6
platformdirs                              2.6.2
pluggy                                    1.0.0
ply                                       3.11
portalocker                               2.6.0
pre-commit                                2.21.0
presto-python-client                      0.8.3
prison                                    0.2.1
prometheus-client                         0.15.0
prompt-toolkit                            3.0.36
proto-plus                                1.19.6
protobuf                                  3.20.0
psutil                                    5.9.4
psycopg2                                  2.9.5
psycopg2-binary                           2.9.5
ptyprocess                                0.7.0
pure-eval                                 0.2.2
pure-sasl                                 0.6.2
py                                        1.11.0
pyarrow                                   9.0.0
pyasn1                                    0.4.8
pyasn1-modules                            0.2.8
pycodestyle                               2.10.0
pycparser                                 2.21
pydantic                                  1.10.4
pydata-google-auth                        1.5.0
pydruid                                   0.6.5
pyenchant                                 3.2.2
pyexasol                                  0.25.1
pyflakes                                  3.0.1
PyGithub                                  1.57
Pygments                                  2.14.0
pygraphviz                                1.10
pyhcl                                     0.4.4
PyHive                                    0.6.5
PyJWT                                     2.6.0
pykerberos                                1.2.4
pymongo                                   3.13.0
pymssql                                   2.2.7
PyNaCl                                    1.5.0
pyOpenSSL                                 22.1.0
pyparsing                                 3.0.9
pypsrp                                    0.8.1
pyrsistent                                0.19.3
pyspnego                                  0.7.0
pytest                                    6.2.5
pytest-asyncio                            0.20.3
pytest-capture-warnings                   0.0.4
pytest-cov                                4.0.0
pytest-httpx                              0.21.2
pytest-instafail                          0.4.2
pytest-rerunfailures                      9.1.1
pytest-timeouts                           1.2.1
pytest-xdist                              3.1.0
python-arango                             7.5.5
python-daemon                             2.3.2
python-dateutil                           2.8.2
python-dotenv                             0.21.0
python-http-client                        3.3.7
python-jose                               3.3.0
python-ldap                               3.4.3
python-nvd3                               0.15.0
python-slugify                            7.0.0
pytz                                      2022.7.1
pytz-deprecation-shim                     0.1.0.post0
pytzdata                                  2020.1
pywinrm                                   0.4.3
PyYAML                                    6.0
qds-sdk                                   1.16.1
reactivex                                 4.0.4
readme-renderer                           37.3
redis                                     3.5.3
redshift-connector                        2.0.909
requests                                  2.28.2
requests-kerberos                         0.14.0
requests-mock                             1.10.0
requests-ntlm                             1.1.0
requests-oauthlib                         1.3.1
requests-toolbelt                         0.10.1
responses                                 0.22.0
rfc3986                                   1.5.0
rich                                      13.1.0
rich-click                                1.6.0
rsa                                       4.9
s3transfer                                0.6.0
sarif-om                                  1.0.4
sasl                                      0.3.1
scramp                                    1.4.4
SecretStorage                             3.3.3
semver                                    2.13.0
sendgrid                                  6.9.7
sentinels                                 1.0.0
setproctitle                              1.3.2
setuptools                                67.1.0
six                                       1.16.0
slack-sdk                                 3.19.5
smmap                                     5.0.0
snakebite-py3                             3.0.5
sniffio                                   1.3.0
snowballstemmer                           2.2.0
sortedcontainers                          2.4.0
soupsieve                                 2.3.2.post1
Sphinx                                    5.3.0
sphinx-airflow-theme                      0.0.11
sphinx-argparse                           0.4.0
sphinx-autoapi                            2.0.1
sphinx-copybutton                         0.5.1
sphinx-jinja                              2.0.2
sphinx-rtd-theme                          1.1.1
sphinxcontrib.applehelp                   1.0.3
sphinxcontrib-devhelp                     1.0.2
sphinxcontrib-htmlhelp                    2.0.0
sphinxcontrib-httpdomain                  1.8.1
sphinxcontrib-jsmath                      1.0.1
sphinxcontrib-qthelp                      1.0.3
sphinxcontrib-redoc                       1.6.0
sphinxcontrib-serializinghtml             1.1.5
sphinxcontrib-spelling                    7.7.0
SQLAlchemy                                1.4.46
sqlalchemy-bigquery                       1.5.0
sqlalchemy-drill                          1.1.2
SQLAlchemy-JSONField                      1.0.1.post0
sqlalchemy-redshift                       0.8.12
SQLAlchemy-Utils                          0.39.0
sqlparse                                  0.4.3
sshpubkeys                                3.3.1
sshtunnel                                 0.4.0
stack-data                                0.6.2
starkbank-ecdsa                           2.2.0
statsd                                    4.0.1
tabulate                                  0.9.0
tblib                                     1.7.0
tenacity                                  8.1.0
termcolor                                 2.2.0
text-unidecode                            1.3
thrift                                    0.16.0
thrift-sasl                               0.4.3
toml                                      0.10.2
tomli                                     2.0.1
toolz                                     0.12.0
tornado                                   6.1
towncrier                                 22.12.0
traitlets                                 5.8.1
trino                                     0.321.0
twine                                     4.0.2
types-boto                                2.49.18.5
types-certifi                             2021.10.8.3
types-croniter                            1.3.2.2
types-Deprecated                          1.2.9
types-docutils                            0.19.1.2
types-freezegun                           1.1.10
types-Markdown                            3.4.2.2
types-paramiko                            2.12.0.3
types-protobuf                            4.21.0.3
types-PyMySQL                             1.0.19.2
types-pyOpenSSL                           23.0.0.1
types-python-dateutil                     2.8.19.6
types-python-slugify                      7.0.0.1
types-pytz                                2022.7.1.0
types-PyYAML                              6.0.12.3
types-redis                               4.4.0.2
types-requests                            2.28.11.8
types-setuptools                          65.7.0.2
types-tabulate                            0.9.0.0
types-termcolor                           1.1.6
types-toml                                0.10.8.1
types-urllib3                             1.26.25.4
typing_extensions                         4.4.0
tzdata                                    2022.7
tzlocal                                   4.2
uamqp                                     1.6.3
uc-micro-py                               1.0.1
unicodecsv                                0.14.1
Unidecode                                 1.3.6
uritemplate                               3.0.1
urllib3                                   1.26.14
vertica-python                            1.2.0
vine                                      5.0.0
virtualenv                                20.17.1
volatile                                  2.1.0
watchtower                                2.0.1
wcwidth                                   0.2.6
webencodings                              0.5.1
websocket-client                          1.4.2
Werkzeug                                  2.2.2
wheel                                     0.38.4
wrapt                                     1.14.1
WTForms                                   3.0.1
xmltodict                                 0.13.0
yamllint                                  1.29.0
yarl                                      1.8.2
zict                                      2.2.0
zipp                                      3.11.0
zope.event                                4.6
zope.interface                            5.5.2

One more piece of information: our PIP_INDEX_URL points to a custom repository from which we download packages, since we cannot reach the external internet. That repository has everything that is available on the external internet.

PIP_INDEX_URL = 'https://artifactory.customrepo.org/artifactory/api/pypi/r-pypi-virtual/simple'

Do I need to specify this in the ConfigMap as well?
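
For reference, pip's requirements-file format accepts `--index-url` and `--trusted-host` as options on their own lines, so one option is to carry the index settings in the mounted file itself. The sketch below writes such a file locally; the function name `write_requirements` is hypothetical, and whether the container's startup install honors these lines is an assumption that would need to be confirmed in the pod logs. Note that the ConfigMap above uses artifactory.customdomain.org while the index URL uses artifactory.customrepo.org; the sketch assumes the latter.

```shell
#!/usr/bin/env bash
# write_requirements PATH: write a requirements file that carries the
# index settings itself. Each pip option sits on its own line, and the
# package name follows on a separate line.
write_requirements() {
  cat > "$1" <<'EOF'
--index-url https://artifactory.customrepo.org/artifactory/api/pypi/r-pypi-virtual/simple
--trusted-host artifactory.customrepo.org
pyhdb
EOF
}

# Example: write_requirements requirements.txt
# The three lines could then be pasted under the `requirements.txt: |`
# key of the airflow-requirements ConfigMap in values.yaml.
```

Keeping the package name on its own line avoids mixing it into an option line, where it may not be read as a requirement.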

Installing the Python dependency manually, which is not the recommended way, does work with the following commands:

oc exec -it <airflow-scheduler-pod> -- /bin/bash
source /opt/bitnami/airflow/venv/bin/activate
pip install --trusted-host artifactory.customrepo.org --index-url=https://artifactory.customrepo.org/artifactory/api/pypi/r-pypi-virtual/simple pyhdb

mavenzer commented 1 year ago

I'm curious whether using a lifecycle hook is a good idea for installing packages from the custom pip repo. I have tried it, and it is working so far.

I have applied the following template to the Deployment of Airflow web, the StatefulSet of Airflow worker, and the Deployment Config of Airflow scheduler.

  containers:
        - name: airflow-web
          image: <custom-image-repo>/airflow:2.5.1
          imagePullPolicy: "IfNotPresent"
          lifecycle:
            postStart:
              exec:
                command: ["/bin/bash", "-c", "source /opt/bitnami/airflow/venv/bin/activate &&  pip install  --trusted-host artifactory.customrepo.org --index-url=https://artifactory.customrepo.org/artifactory/api/pypi/r-pypi-virtual/simple  pyhdb "]  

I genuinely want to know whether this is a good idea or just a band-aid solution. From what I understand of the OpenShift documentation, lifecycle hooks can be a bit tricky, because in some cases the container may take some time to start up and become ready.

As a side note, I have deleted the deployment and redeployed it more than 20 times, and it worked every time. That could be sheer coincidence, though.
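
If the hook is kept, one way to make it less dependent on timing is to retry the install a few times before giving up. This is only a sketch under that assumption: the `retry` helper is hypothetical, and the attempt count and delay are illustrative values, not recommendations from the chart.

```shell
#!/usr/bin/env bash
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Hypothetical use inside the postStart command, reusing the pip call above:
# source /opt/bitnami/airflow/venv/bin/activate
# retry 5 10 pip install --trusted-host artifactory.customrepo.org \
#   --index-url=https://artifactory.customrepo.org/artifactory/api/pypi/r-pypi-virtual/simple pyhdb
```

A retry only covers transient failures such as the repository being briefly unreachable; it does not change the fact that the hook runs alongside the container's own startup.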

aoterolorenzo commented 1 year ago

@mavenzer sorry for the late response.

We do use lifecycle hooks in many assets, but only by exposing values so that users can customize and add them. I think we don't set any of them by default, and the way to go is probably to take a deeper look at the container logic and see whether we can provide this feature directly. Let me create an internal task for the team. We will get back to you here once it is done.

mavenzer commented 1 year ago

Thanks a lot for the insight. Really appreciate it.

mavenzer commented 5 months ago

@aoterolorenzo Do you have any updates/findings on the above topic?

gosro commented 1 week ago

Any update?