Open aditya-7 opened 8 months ago
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
I am running into an issue with the openlineage plugin too; however, it occurs when I try to install the Datahub plugin (acryl-datahub-airflow-plugin[plugin-v2]) to extract lineage as described here.
I am using Airflow 2.8.4 directly from the quickstart docker-compose.yml file. Didn't have this issue in Airflow 2.5.x-2.7.x.
WARN[0000] The "AIRFLOW_UID" variable is not set. Defaulting to a blank string.
WARN[0000] The "AIRFLOW_UID" variable is not set. Defaulting to a blank string.
airflow-scheduler-1 |
airflow-scheduler-1 | BACKEND=redis
airflow-scheduler-1 | DB_HOST=redis
airflow-scheduler-1 | DB_PORT=6379
airflow-scheduler-1 |
airflow-scheduler-1 | ____________ _____________
airflow-scheduler-1 | ____ |__( )_________ __/__ /________ __
airflow-scheduler-1 | ____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
airflow-scheduler-1 | ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
airflow-scheduler-1 | _/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
airflow-scheduler-1 | [2024-04-06T21:15:29.903+0000] {plugins_manager.py:247} ERROR - Failed to import plugin openlineage
airflow-scheduler-1 | Traceback (most recent call last):
airflow-scheduler-1 | File "/home/airflow/.local/lib/python3.10/site-packages/airflow/plugins_manager.py", line 239, in load_entrypoint_plugins
airflow-scheduler-1 | plugin_class = entry_point.load()
airflow-scheduler-1 | File "/home/airflow/.local/lib/python3.10/site-packages/importlib_metadata/__init__.py", line 211, in load
airflow-scheduler-1 | module = import_module(match.group('module'))
airflow-scheduler-1 | File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
airflow-scheduler-1 | return _bootstrap._gcd_import(name[level:], package, level)
airflow-scheduler-1 | File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
airflow-scheduler-1 | File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
airflow-scheduler-1 | File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
airflow-scheduler-1 | File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
airflow-scheduler-1 | File "<frozen importlib._bootstrap_external>", line 883, in exec_module
airflow-scheduler-1 | File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
airflow-scheduler-1 | File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/openlineage/plugins/openlineage.py", line 23, in <module>
airflow-scheduler-1 | from airflow.providers.openlineage.plugins.listener import get_openlineage_listener
airflow-scheduler-1 | File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/openlineage/plugins/listener.py", line 28, in <module>
airflow-scheduler-1 | from airflow.providers.openlineage.plugins.adapter import OpenLineageAdapter, RunState
airflow-scheduler-1 | File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/openlineage/plugins/adapter.py", line 26, in <module>
airflow-scheduler-1 | from openlineage.client.facet import (
airflow-scheduler-1 | ImportError: cannot import name 'JobTypeJobFacet' from 'openlineage.client.facet' (/home/airflow/.local/lib/python3.10/site-packages/openlineage/client/facet.py)
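An ImportError like the one above often points to a mismatch between the installed OpenLineage client and the version the provider expects, though that is an assumption here, not something the log confirms. A minimal diagnostic sketch that could be run inside the failing container: it prints the installed versions of the relevant distributions and checks whether the missing name is importable. The helper names `installed_version` and `has_attribute` are hypothetical, and the PyPI distribution names are the ones these packages are normally published under.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def has_attribute(module_name, attr_name):
    """Return True if module_name imports cleanly and exposes attr_name."""
    try:
        module = import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr_name)


if __name__ == "__main__":
    # Distribution names are assumptions based on the usual PyPI package names.
    for dist in (
        "apache-airflow-providers-openlineage",
        "openlineage-python",
        "acryl-datahub-airflow-plugin",
    ):
        print(dist, installed_version(dist))

    # The name the provider's adapter.py fails to import in the traceback.
    print(
        "JobTypeJobFacet importable:",
        has_attribute("openlineage.client.facet", "JobTypeJobFacet"),
    )
```

If the check reports that `JobTypeJobFacet` is not importable, comparing the printed client version against what the provider requires would be a reasonable next step.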
Apache Airflow Provider(s)
openlineage
Versions of Apache Airflow Providers
Apache Airflow version
2.8.2
Operating System
Debian GNU/Linux 12 (bookworm)
Deployment
Docker-Compose
Deployment details
Docker Compose version v2.24.3-desktop.
Created a custom Docker image using Dockerfile:
Changed x-airflow-common.&airflow-common in the docker-compose.yml file:
Built & deployed using the command:
docker-compose build && docker-compose up
This is my project structure:
What happened
When I deploy Airflow, the airflow-scheduler and airflow-triggerer containers fail to load the openlineage plugin. They can load built-in extractors such as BashExtractor, PythonExtractor, etc. Interestingly, the airflow-init container was able to load the plugin successfully. I was able to test this by overriding the library file /home/airflow/.local/lib/python3.8/site-packages/airflow/providers/openlineage/extractors/manager.py with a few debug points using the logger. I overwrote the ExtractorManager constructor to add some debug points like this:
The airflow-triggerer and the airflow-scheduler containers failed to load the openlineage plugin while trying to import the custom extractor class with the following error:
Whereas the airflow-init container successfully loaded the plugin with the same custom extractor:
What you think should happen instead
The Airflow triggerer and the scheduler should also be able to import the Custom extractor class like the Airflow init container did, and successfully load the openlineage plugin.
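One way to compare the containers is to run the same import check in each of them. The sketch below is not part of the original report: `try_import_class` is a hypothetical helper, and the dotted path in the usage section is an assumption derived from the reproduction files, on the premise that the plugins folder is on `sys.path` in that container.

```python
import sys
from importlib import import_module


def try_import_class(dotted_path):
    """Try to import a class from a 'package.module.ClassName' path.

    Returns (cls, None) on success, or (None, error_message) on failure.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    try:
        module = import_module(module_name)
        return getattr(module, class_name), None
    except (ImportError, AttributeError, ValueError) as exc:
        return None, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Hypothetical path; replace with the value configured for the extractor.
    path = "extractors.some_ilneage_extractor.MyExtractor"
    cls, error = try_import_class(path)
    print("imported:", cls, "| error:", error)
    # Differences in sys.path between containers can explain why one
    # container imports the class and another does not.
    for entry in sys.path:
        print("sys.path entry:", entry)
```

Running this in airflow-init and in airflow-scheduler and diffing the output should show whether the two containers see different module search paths.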
How to reproduce
<project_root>/plugins/extractors/some_ilneage_extractor.py
class MyExtractor(BaseExtractor):