Closed · zaistev closed this issue 3 years ago
Hi,
Are your dags located in a public repository that we could use to reproduce the issue?
hi @javsalgar thanks for the reply. I published this repo with the same structure: https://github.com/zaistev/example-dags/tree/master/src/dags
```yaml
dags:
  ##
  enabled: true
  repositories:
    - repository: https://github.com/zaistev/example-dags.git
      branch: master
      name: dags
      path: "src/dags"
```
```
Broken DAG: [/opt/bitnami/airflow/dags/git-dags/data_pipeline_dag.py] Traceback (most recent call last):
  File "
```
Unfortunately, it still throws an error. If I follow the module-naming logic from #2305, then I should use the "git-dags" name. So I did the following:

```python
dags = importlib.import_module("git-dags")
```

Then, to import the right function, my change led to `dags.plugins.tasks.fetch_historical_to_s3`, which also throws an error:
```
Broken DAG: [/opt/bitnami/airflow/dags/git-dags/data_pipeline_dag.py] Traceback (most recent call last):
  File "
```
My assumptions are that it can't be the folder structure, otherwise it wouldn't read at all; same for naming conventions or the executor (I tried Celery and K8s and got the same error). Perhaps it's the definition of PYTHONPATH.
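For context on why `importlib.import_module` shows up here at all: a clone directory like `git-dags` contains a hyphen, so it is not a valid Python identifier and cannot be imported with a plain `import` statement, but the import machinery will still find it by its string name. A minimal, self-contained sketch (using a throwaway temp directory, not the chart's real layout):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a package whose name is not a valid Python identifier,
# mimicking the chart's "git-dags" clone directory.
root = Path(tempfile.mkdtemp())
pkg = root / "git-dags"
pkg.mkdir()
(pkg / "__init__.py").write_text("VALUE = 42\n")

sys.path.append(str(root))

# `import git-dags` is a SyntaxError, but importlib accepts the string name.
mod = importlib.import_module("git-dags")
print(mod.VALUE)  # prints 42
```

This only demonstrates the naming mechanics; it does not by itself fix the PYTHONPATH problem discussed below.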
There is a workaround to this problem in the DAG file: just manually append the path. Ugly af, but works.

```python
import sys
sys.path.append('/opt/bitnami/airflow/git-<git.plugins.repositories[0].name>')
```
Hi,
Wow, this is strange. So, if I understand correctly: if in the chart you specify the extraEnvVars values (there is one for each component) with PYTHONPATH (adding the proper dirs), then it should work, right? Just to confirm that this is enough and that with it you don't have to use the sys.path.append function.
@javsalgar in my DAG file "data_pipeline_dag.py" I added these lines:

```python
import sys
sys.path.append('/opt/bitnami/airflow/git-dataflow-plugins')
```

"git-dataflow-plugins" is a reference to the name definition in the yml file for git.plugins.repositories[0].name:
```yaml
plugins:
  ##
  enabled: true
  repositories:
    - repository: ...
      branch: ...
      name: dataflow-plugins
      path: "src/plugins"
```
From your comments: instead of doing the manual import and appending the dir, is it possible to do it in the extraEnvVars? Shouldn't it be automatic?
Cheers
Hi,
I'm not sure if the structure of this DAG is different (in our tests we import a DAG and have no issues) and that's why it requires extra changes like the one you were doing. The first thing to do is to confirm that this works:
```yaml
extraEnvVars:
  - name: PYTHONPATH
    value: "/opt/bitnami/airflow/git-dataflow-plugins"
```
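As a sanity check that a PYTHONPATH entry is really what drives module resolution here, the effect can be reproduced locally. This hedged, self-contained sketch (throwaway temp directory and module name, not the chart's actual paths) shows an import failing without the variable and succeeding with it:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# A throwaway module standing in for code cloned by git-sync.
root = Path(tempfile.mkdtemp())
(root / "mymod.py").write_text("MESSAGE = 'found'\n")

code = "import mymod; print(mymod.MESSAGE)"

# Without PYTHONPATH pointing at the directory, the import fails...
fails = subprocess.run([sys.executable, "-c", code], capture_output=True)
print(fails.returncode != 0)  # True

# ...and with it, the same import succeeds.
env = dict(os.environ, PYTHONPATH=str(root))
ok = subprocess.run([sys.executable, "-c", code],
                    capture_output=True, text=True, env=env)
print(ok.stdout.strip())  # found
```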
I have a similar error for a plugin: `ModuleNotFoundError: No module named 'PLUGIN_NAME'`. The `PLUGIN_NAME.py` is inside `REPO_NAME/airflow/plugins` in our repository.
After describing and sshing into the airflow webserver pod, I think the reason is that the plugins folder is wrongly mounted:

```shell
kubectl describe pod airflow-web-YADATADA -n airflow
```

gives me, under volume mounts:

```
/opt/bitnami/airflow/dags/REPO_NAME from git-cloned-dag-files-REPO_NAME (rw,path="airflow/dags")
/opt/bitnami/airflow/REPO_NAME from git-cloned-plugins-files-REPO_NAME (rw,path="airflow/plugins")
```
At the same time, the airflow config is:

- `dags_folder: /opt/bitnami/airflow/dags` (all good)
- `plugins_folder: /opt/bitnami/airflow/plugins` (does not exist! see above)

So I will circumvent this issue by adjusting `AIRFLOW__CORE__PLUGINS_FOLDER`, but that is obviously only fighting a symptom. The proper fix should be easy to do. Looking forward to hearing an update :)
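For anyone wanting to apply that workaround through the chart, it would look roughly like the fragment below. `AIRFLOW__CORE__PLUGINS_FOLDER` is Airflow's standard environment-variable override for `plugins_folder`; the path shown mirrors the mount reported by `kubectl describe` above and is an assumption for a repository named `REPO_NAME`:

```yaml
extraEnvVars:
  - name: AIRFLOW__CORE__PLUGINS_FOLDER
    value: "/opt/bitnami/airflow/REPO_NAME"
```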
Setting the PYTHONPATH in the extraEnvVars won't work. I found that the issue is mitigated when you SSH into the deployment and run `airflow db upgrade`. Since `airflow db upgrade` (or `airflow upgradedb`) is idempotent, can we somehow run the command every time git is pulled by git-sync? I'm not sure if the underlying problem is that git-sync is not part of the initContainers, so the plugins folder gets initialized before git-sync first runs? Just throwing out a bunch of ideas.
Hi,
There's something I don't understand about it. I see that there is a first git clone, so git-sync should not be related, right? Not sure if I'm missing anything. By the way, @miguelaeh is working on transitioning the airflow containers to bash, so I will ping him so he can check whether the new version of the container fixes the issue.
Thank you. Running `airflow db upgrade` was not appropriate as a long-term solution, as we may be doing multiple helm upgrades or docker image updates and the plugins would fall off every time. I fixed it by including git-sync within the initContainers of the deployments.
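For reference, "including git-sync within the initContainers" could be sketched roughly like this. This is a hypothetical fragment, not the chart's actual API: the image tag, repo URL, and volume name are placeholders, and the flags shown (`--repo`, `--branch`, `--one-time`, `--root`) are standard git-sync options for a single clone at pod start:

```yaml
initContainers:
  - name: git-sync-init
    image: registry.k8s.io/git-sync/git-sync:v3.6.3
    args:
      - --repo=https://<git-repo>
      - --branch=master
      - --one-time
      - --root=/plugins
    volumeMounts:
      - name: plugins
        mountPath: /plugins
```

Running it as an init container guarantees the plugins are on disk before the Airflow process initializes its plugins folder.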
Hi. I am facing similar issues. @spidman Can you please share how you went about resolving this?
Hi,
In our environment we have dags and plugins from git enabled; to make it work we had to add the PYTHONPATH extra environment variable like below:
```yaml
dags:
  enabled: true
  repositories:
    - repository: https://<git-repo>
      branch: master
      name: dags
      path: airflow/dags
plugins:
  enabled: true
  repositories:
    - repository: https://<git-repo>
      branch: master
      name: plugins
      path: airflow/plugins
extraEnvVars:
  - name: PYTHONPATH
    value: "/opt/bitnami/airflow/dags/git-dags:/opt/bitnami/airflow/git-plugins"
  - name: PYSPARK_PYTHON
    value: "/opt/bitnami/airflow/venv/bin/python"
```
Hi,
I think we should consider setting the PYTHONPATH variable by default. Any thoughts on this?
@javsalgar please do; as it stands you need to do it manually for dags & plugins. Or just add to the documentation that it is needed.
Ok, I will forward this to the engineering team
same problem, even with PYTHONPATH set as extraEnvVars
Could there be another missing path to load? Which is the exact error?
Experiencing this bug as well. The path is missing the `plugins` segment, so Airflow can't find the plugins that are being sync'ed.
Should be: `/opt/bitnami/airflow/plugins/git_{deployname}`

But Bitnami is putting it here (one directory too shallow): `/opt/bitnami/airflow/git_{deployname}`

For the git DAG sync, the path is correct: `/opt/bitnami/airflow/dags/git_{deployname}`
I tried creating a symlink to fix, but Airflow did not like that at all (didn't follow the symlink once loaded).
This next thing didn't work for me, since the path is still incorrect:

```yaml
extraEnvVars:
  - name: PYTHONPATH
    value: "/opt/bitnami/airflow/dags/git-dags:/opt/bitnami/airflow/git-plugins"
```
Perhaps this would work?

```yaml
extraEnvVars:
  - name: PYTHONPATH
    value: "/opt/bitnami/airflow/dags/git-{deployname}:/opt/bitnami/airflow/git-{deployname}"
```
I just created a custom image for now with the plugin I needed to proceed. It would be nice to have a fix to enable git sync again.
@javsalgar I've located the bug in the git template file `_git_helpers.tpl`. Please forward to the engineering team; should be a quick fix. Should be:

```
mountPath: /opt/bitnami/airflow/plugins/git_{{ include "airflow.git.repository.name" . }}
```
Thanks for letting us know! As you spotted the issue, would you like to contribute with a PR?
**Which chart:** airflow-8.0.3 (app version 2.0.1)

**Describe the bug:** I am currently using git-sync to fetch dags and plugins; it does fetch the code from a private git repository. However, when I try to import a module from a nested module it doesn't work. I've checked issue #2305, so I added an `__init__.py` file to each subfolder so it is recognized as a Python module, and changed the import structure to `src.dags.plugins`. It didn't work either.

It still throws an error. Is the current folder structure related to this issue? I am trying to figure out how to import from plugins (another git repo) and from nested modules. Am I missing something in the Dockerfile (currently empty)?
**Error**

```
Broken DAG: [/opt/bitnami/airflow/dags/git-dataflow/data_pipeline_dag.py] Traceback (most recent call last):
  File "", line 219, in _call_with_frames_removed
  File "/opt/bitnami/airflow/dags/git-dataflow/data_pipeline_dag.py", line 9, in <module>
    import dags.plugins as plugins
ModuleNotFoundError: No module named 'dags'
```
**values.yml**

```yaml
dags:
  ## Enable in order to download DAG files from git repositories.
  enabled: true
  repositories:
```
My current folder structure:

```
.
├── Dockerfile
├── README.md
├── __init__.py
├── requirements.txt
├── src
│   ├── __init__.py
│   └── dags
│       ├── __init__.py
│       ├── data_pipeline_dag.py
│       └── plugins
│           ├── __init__.py
│           ├── operators
│           │   └── __init__.py
│           ├── tasks.py
│           └── utils.py
└── tests
```
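Given the tree above, `import dags.plugins` can only resolve if the directory that contains `dags/` (here `src/`) is on `sys.path`; with the repo root on the path, the top-level package is `src`, not `dags`. A self-contained reproduction of that distinction, rebuilding just the relevant part of the tree in a temp directory:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Rebuild the relevant slice of the layout: src/dags/plugins/
root = Path(tempfile.mkdtemp())
plugins = root / "src" / "dags" / "plugins"
plugins.mkdir(parents=True)
for d in (root / "src", root / "src" / "dags", plugins):
    (d / "__init__.py").touch()

code = "import dags.plugins; print('ok')"

# With the repo root on PYTHONPATH, "dags" is not a top-level package,
# so the import fails.
bad = subprocess.run([sys.executable, "-c", code], capture_output=True,
                     env=dict(os.environ, PYTHONPATH=str(root)))
print(bad.returncode != 0)  # True

# With src/ itself on PYTHONPATH, "dags" becomes top-level and resolves.
good = subprocess.run([sys.executable, "-c", code], capture_output=True,
                      text=True, env=dict(os.environ, PYTHONPATH=str(root / "src")))
print(good.stdout.strip())  # ok
```

This matches the workarounds in the thread: whether via `sys.path.append` or the PYTHONPATH extraEnvVar, the entry has to point at the directory directly containing the imported package.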
**Expected behavior**

**Version of Helm and Kubernetes:** `helm version`

**Additional context:** using KubernetesExecutor