mavenzer opened this issue 1 year ago
I'm curious to know whether using a lifecycle hook is a good idea for installing packages from a custom pip repository. I have tried it, and it is working so far.
I applied the following template to the Deployment of the Airflow web server, the StatefulSet of the Airflow worker, and the DeploymentConfig of the Airflow scheduler.
containers:
  - name: airflow-web
    image: <custom-image-repo>/airflow:2.5.1
    imagePullPolicy: "IfNotPresent"
    lifecycle:
      postStart:
        exec:
          command: ["/bin/bash", "-c", "source /opt/bitnami/airflow/venv/bin/activate && pip install --trusted-host artifactory.customrepo.org --index-url=https://artifactory.customrepo.org/artifactory/api/pypi/r-pypi-virtual/simple pyhdb"]
I genuinely wanted to know whether this is a good idea or just a band-aid solution. From what I understand of the OpenShift documentation, lifecycle hooks can be a bit tricky, since in some cases they can delay container startup and readiness.
Just a side note: I have deleted and redeployed the deployment more than 20 times, and it worked every time. But that could be sheer coincidence as well.
@mavenzer sorry for the late response.
We do use lifecycle hooks in many assets, but only to expose the values so users can customize and add their own; I don't think we set any of them by default. The way to go is probably to take a deeper look at the container logic and see whether we can provide this feature directly. Let me create an internal task for the team. We will get back to you here once it's done.
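For reference, customizing them through values would look something like this (a sketch, assuming the airflow chart follows the usual Bitnami per-component lifecycleHooks convention; untested), and likewise under scheduler and worker:

web:
  lifecycleHooks:
    postStart:
      exec:
        command: ["/bin/bash", "-c", "source /opt/bitnami/airflow/venv/bin/activate && pip install pyhdb"]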
Thanks a lot for the insight. Really appreciate it.
@aoterolorenzo Do you have any updates/findings on the above topic?
Any update?
Hi everyone!
Could you try using the values below?
initContainers:
  - name: pip-install
    image: "{{ include \"airflow.image\" . }}"
    imagePullPolicy: "{{ .Values.image.pullPolicy }}"
    command:
      - /bin/bash
    args:
      - -ec
      - |
        . /opt/bitnami/airflow/venv/bin/activate
        pip install --no-cache-dir -r requirements.txt
        cp -r /opt/bitnami/airflow/venv/* /venv
    workingDir: /opt/bitnami/airflow
    volumeMounts:
      - name: empty-dir
        mountPath: /venv
        subPath: app-venv-dir
      - name: requirements
        mountPath: /opt/bitnami/airflow/requirements.txt
        subPath: requirements.txt
extraVolumeMounts:
  - name: empty-dir
    mountPath: /opt/bitnami/airflow/venv
    subPath: app-venv-dir
extraVolumes:
  - name: requirements
    configMap:
      name: airflow-requirements
extraDeploy:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: airflow-requirements
      namespace: "{{ include \"common.names.namespace\" . }}"
    data:
      requirements.txt: |-
        pyhdb==0.3.4
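For context, the init container installs the requirements into the image's virtualenv and copies the whole venv into an emptyDir volume, which the Airflow containers then mount back over /opt/bitnami/airflow/venv through extraVolumeMounts. Assuming the values above are saved to a file such as custom-values.yaml (name illustrative), the rollout would be along the lines of:

helm upgrade --install airflow bitnami/airflow -f custom-values.yaml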
Name and Version
bitnami/airflow 2.5.1
What architecture are you using?
None
What steps will reproduce the bug?
Hi all, I have deployed Airflow 2.5.1 using the Bitnami Helm chart on OpenShift. We are using the standard chart, customized with fsGroup because OpenShift places certain restrictions on UIDs and GIDs.
We wanted to add additional Python packages, such as pyhdb, to the current deployment.
I have seen a similar issue: https://github.com/bitnami/charts/issues/9390
Also, how can I test that the package has been added to the current deployment, given that it will be installed on the web, scheduler, and worker nodes?
I'm unable to find the custom package there!
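What I would expect to work is exec'ing into each component's pod and querying pip inside the venv, something like this (pod name illustrative):

kubectl exec -it <airflow-web-pod> -- /bin/bash -c \
  ". /opt/bitnami/airflow/venv/bin/activate && pip show pyhdb"

and then repeating the same check on the scheduler and worker pods.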
Are you using any custom parameters or values?
The Helm command I'm using to deploy it:
The values.yaml I'm using to add the additional Python dependencies:
What do you see instead?
I'm unable to see the package installed when I run the following command:
Additional information
Packages present in the venv (activated via /opt/bitnami/airflow/venv/bin/activate):
One more piece of information: my PIP_INDEX_URL points to the custom repository we download packages from, since we cannot reach the external internet. However, our PIP_INDEX_URL mirrors everything that is available on the public index:
PIP_INDEX_URL = 'https://artifactory.customrepo.org/artifactory/api/pypi/r-pypi-virtual/simple'
Do I need to specify this in the ConfigMap as well?
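In case it matters, pip also honors the PIP_INDEX_URL environment variable, so another option (a sketch building on the initContainers values suggested above; untested) would be to set it directly on the init container rather than in the ConfigMap:

initContainers:
  - name: pip-install
    # ... same image, command, and volumeMounts as in the suggestion above ...
    env:
      - name: PIP_INDEX_URL
        value: "https://artifactory.customrepo.org/artifactory/api/pypi/r-pypi-virtual/simple"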
The not-recommended way of installing Python dependencies does work, by executing the following commands:
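These are presumably the same steps as in the postStart hook shown earlier, run manually inside each container:

source /opt/bitnami/airflow/venv/bin/activate
pip install --trusted-host artifactory.customrepo.org \
  --index-url=https://artifactory.customrepo.org/artifactory/api/pypi/r-pypi-virtual/simple \
  pyhdb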