Closed DonghaLim closed 3 weeks ago
Thanks for opening your first issue in the Marquez project! Please be sure to follow the issue template!
What Airflow operators are you using to run your task? The datasets, if available, would be in either the inputs or outputs properties of the OpenLineage event. I would ping the OpenLineage Slack community. @mobuchowski is the leading expert on the Airflow/OpenLineage integration side.
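As a rough way to run that check yourself, the sketch below counts the datasets in a single event payload copied from the Marquez Events view. It assumes only that the event is JSON with top-level `inputs` and `outputs` arrays; the function name `dataset_summary` and the sample event are hypothetical and not part of Marquez or OpenLineage.

```python
import json


def dataset_summary(event: dict) -> dict:
    """Count the input and output datasets carried by one OpenLineage event."""
    # Treat a missing or null property the same as an empty list.
    inputs = event.get("inputs") or []
    outputs = event.get("outputs") or []
    return {
        "job": (event.get("job") or {}).get("name"),
        "inputs": len(inputs),
        "outputs": len(outputs),
        "has_datasets": bool(inputs or outputs),
    }


# Hypothetical, trimmed-down event used only for illustration.
raw_event = """
{
  "eventType": "COMPLETE",
  "job": {"namespace": "airflow", "name": "example_job"},
  "inputs": [],
  "outputs": []
}
"""

summary = dataset_summary(json.loads(raw_event))
print(summary)
```

If `has_datasets` is false for every event of a job, the integration emitted no dataset information for it, so Marquez has nothing to draw.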
We are using several kinds of operators, like the ones below.
How can I add inputs or outputs? From the documentation, it seems that we can add "inlets" or "outlets". Additionally, we are using apache-airflow-providers-openlineage 1.2.0.
When I checked the supported operators in the documentation, PythonOperator and BashOperator are supported from 1.4.0. Although I upgraded apache-airflow-providers-openlineage to 1.4.0, it still doesn't work.
[2024-10-28, 08:49:55 UTC] {configuration.py:1050} WARNING - section/key [openlineage/disabled_for_operators] not found in config
[2024-10-28, 08:49:55 UTC] {manager.py:105} WARNING - Failed to extract metadata using found extractor <airflow.providers.openlineage.extractors.bash.BashExtractor object at 0x7f19c454f550> - section/key [openlineage/disabled_for_operators] not found in config task_type=BashOperator airflow_dag_id=biz_energy_service_daily_0.0.1 task_id=clean_output_path_conformed_daily_dim_energy_service_installed_apps_sparse airflow_run_id=scheduled__2024-10-24T00:00:00+00:00
[2024-10-28, 08:49:55 UTC] {configuration.py:1050} WARNING - section/key [openlineage/config_path] not found in config
[2024-10-28, 08:49:55 UTC] {utils.py:408} WARNING - section/key [openlineage/config_path] not found in config
[2024-10-28, 08:51:48 UTC] {base.py:152} WARNING - OpenLineage provider method failed to extract data from provider.
[2024-10-28, 08:51:48 UTC] {configuration.py:1050} WARNING - section/key [openlineage/config_path] not found in config
[2024-10-28, 08:51:48 UTC] {utils.py:408} WARNING - section/key [openlineage/config_path] not found in config
@DonghaLim This page https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/supported_classes.html shows the supported Operators.
Since you're using apache-airflow-providers-openlineage, the right repo for opening issues like this in the future would be https://github.com/apache/airflow/ - Marquez visualizes data based on the events it receives, and can't make up for data that is missing from those events.
Starting with Airflow 2.10, we added a feature where lineage can be gathered from Airflow Hooks, even if you use operators that are not supported directly. This is a feature we'll develop further, and more hooks will be supported over time. It won't work on 2.7.3, so you won't get dataset data from PythonOperator on that version.
Additionally, I'd recommend using the latest released version of the OpenLineage provider that's compatible with your Airflow version - version 1.7.0 or later fixes the warning logs you've posted.
If you have any questions, feel free to ask them on the OpenLineage Slack, in OpenLineage issues, or in Airflow issues/discussions - I'll close this issue, as this isn't the right place to discuss the Airflow integration.
Hello, this is Dongha. I have a question about connecting Marquez with Airflow. I use Airflow 2.7.3, and Marquez runs on EKS. The Marquez web UI is working well and shows the list of DAGs that Airflow has run.
However, I can't see any datasets, and only one job is shown. Is there anything I need to check?
Please let me know.
Thanks.
I can't upload any photos, but I've attached the event payload from the Events menu:
{
  "eventType": "COMPLETE",
  "eventTime": "2024-10-25T09:05:44.803117Z",
  "run": {
    "runId": "f3cc61c0-7c19-36d0-9e22-3a2864...",
    "facets": {
      "nominalTime": null,
      "parent": null
    }
  },
  "job": {
    "namespace": "airflow",
    "name": "biz_daily",
    "facets": {
      "documentation": null,
      "sourceCodeLocation": null,
      "sql": null,
      "jobType": null
    }
  },
  "inputs": [],
  "outputs": [],
  "producer": "https://github.com/apache/airf...",
  "schemaURL": "https://openlineage.io/spec/1-..."
}