datahub-project / datahub

The Metadata Platform for your Data and AI Stack
https://datahubproject.io
Apache License 2.0

fix(ingestion/airflow-plugin): fix AthenaOperator extraction #11857

Open steffengr opened 1 week ago

steffengr commented 1 week ago

The GenericSqlExtractor, which the DataHub Airflow plugin currently uses to extract lineage information, does not properly support the AthenaOperator and crashes with "AttributeError: 'AthenaOperator' object has no attribute 'sql'". This patch introduces an AthenaOperatorExtractor, following the BigQueryInsertJobOperatorExtractor example, to fix support for the AthenaOperator.
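To illustrate the failure mode, here is a minimal sketch that uses stand-in classes instead of the real Airflow operators and DataHub extractors. The class and function names below (`FakeSqlOperator`, `FakeAthenaOperator`, `generic_extract_sql`, `athena_extract_sql`) are hypothetical and are not part of the plugin. The only assumption carried over from Airflow is that AthenaOperator stores its statement in `query` rather than `sql`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSqlOperator:
    """Stand-in for an operator that exposes its statement as `sql`."""

    task_id: str
    sql: str


@dataclass
class FakeAthenaOperator:
    """Stand-in for AthenaOperator, which exposes its statement as `query`."""

    task_id: str
    query: str
    database: str


def generic_extract_sql(operator: object) -> str:
    # Mirrors the failing assumption: every SQL operator has a `.sql` attribute.
    return operator.sql  # type: ignore[attr-defined]


def athena_extract_sql(operator: object) -> Optional[str]:
    # Hypothetical dedicated path: read `.query` instead of `.sql`.
    return getattr(operator, "query", None)


athena_task = FakeAthenaOperator(
    task_id="run_athena_query",
    query="SELECT id FROM source_table",
    database="analytics",
)

# The generic path fails in the same way as the reported crash.
try:
    generic_extract_sql(athena_task)
    generic_error = None
except AttributeError as exc:
    generic_error = str(exc)

# The dedicated path finds the statement.
athena_sql = athena_extract_sql(athena_task)

print(generic_error)
print(athena_sql)
```

A dedicated extractor keeps the Athena-specific attribute lookup out of the generic SQL path, which is presumably why the BigQuery operator is handled the same way.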

Fixes #11160


steffengr commented 2 days ago

@hsheth2 Thanks for looking at this! I don't see related tests in this module. Could you point me to an example? I don't have much experience with this code base.

hsheth2 commented 1 day ago

@steffengr we have an integration test (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion-modules/airflow-plugin/tests/integration/test_plugin.py) that runs a number of DAGs (e.g. https://github.com/datahub-project/datahub/blob/master/metadata-ingestion-modules/airflow-plugin/tests/integration/dags/snowflake_operator.py).

That'd probably be the easiest way to do it. Otherwise, we could also do a more targeted unit test in https://github.com/datahub-project/datahub/blob/master/metadata-ingestion-modules/airflow-plugin/tests/unit/test_airflow.py
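For the targeted unit test option, a rough pytest-style sketch might look like the following. It is only an outline of the shape such a test could take: `statement_for` is a hypothetical helper, not a function in the plugin, and `SimpleNamespace` objects stand in for real operators so the test runs without Airflow installed.

```python
from types import SimpleNamespace
from typing import Optional


def statement_for(operator: object) -> Optional[str]:
    """Hypothetical helper under test: return the SQL an operator will run."""
    # Check the common `sql` attribute first, then the Athena-style `query`.
    for attr in ("sql", "query"):
        value = getattr(operator, attr, None)
        if isinstance(value, str):
            return value
    return None


def test_sql_attribute() -> None:
    op = SimpleNamespace(task_id="generic", sql="SELECT 1")
    assert statement_for(op) == "SELECT 1"


def test_athena_style_query_attribute() -> None:
    # No `sql` attribute here, which is the case that crashed.
    op = SimpleNamespace(task_id="athena", query="SELECT 2", database="db")
    assert statement_for(op) == "SELECT 2"


def test_operator_without_statement() -> None:
    op = SimpleNamespace(task_id="empty")
    assert statement_for(op) is None
```

A real test would construct an actual AthenaOperator and call the new extractor, so the stand-ins above only show which cases are worth covering.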