For SQL-based operators, the airflow.providers.openlineage.utils.sql module is used by the SQLParser interface class.
In short, it retrieves table schemas for the input and output datasets parsed from the SQL query.
What you think should happen instead
It should take into account whether the database/schema from the connection setup appears in the information schema query result. If a matching table is found there, it should stop adding other tables.
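A minimal sketch of the behaviour requested above, in plain Python. It is not the provider's actual code: `TableRow`, `resolve_unqualified`, and the `default_schema` argument are hypothetical names used only to illustrate the idea that a match in the connection's default schema should win over same-named tables elsewhere.

```python
from typing import NamedTuple, Optional


class TableRow(NamedTuple):
    # Hypothetical shape of one information-schema result row.
    schema: str
    table: str


def resolve_unqualified(
    rows: list[TableRow], table: str, default_schema: Optional[str]
) -> list[TableRow]:
    """Return rows for `table`, preferring the connection's default schema."""
    matches = [r for r in rows if r.table == table]
    if default_schema is not None:
        in_default = [r for r in matches if r.schema == default_schema]
        if in_default:
            # A match in the default schema wins; same-named tables
            # in other schemas are not added.
            return in_default
    # No default schema configured, or no match in it: keep every match.
    return matches


rows = [TableRow("my_schema", "my_table"), TableRow("public", "my_table")]
resolved = resolve_unqualified(rows, "my_table", default_schema="public")
```

With a default schema of `public`, only the `public` row is kept; without one, the function falls back to returning every match, which mirrors the current behaviour.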
How to reproduce
The corner case is the following:
Use a database connection with a default database and/or schema set.
Refer to the table by name only in the SQL query (e.g. SELECT * FROM my_table instead of SELECT * FROM my_schema.my_table).
If a table with the same name exists in another database/schema (or database+schema combination, depending on the database), the OL integration will produce two datasets for the tables.
For instance, with Postgres and the search path set to the public schema, SELECT * FROM my_table reads from public.my_table even if another table with the same name exists in a different schema. The OL integration will pick up both my_schema.my_table and public.my_table.
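The mismatch can be illustrated without Airflow or Postgres. The sketch below is only an analogy: it uses Python's built-in sqlite3 module, with an attached in-memory database standing in for a second schema. It does not call the OpenLineage integration; the name-only catalog lookup is an assumption about the kind of query that yields two candidates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An attached database stands in for a second schema (SQLite analogy only).
conn.execute("ATTACH DATABASE ':memory:' AS my_schema")
conn.execute("CREATE TABLE main.my_table (id INTEGER)")
conn.execute("CREATE TABLE my_schema.my_table (id INTEGER)")
conn.execute("INSERT INTO main.my_table VALUES (1)")
conn.execute("INSERT INTO my_schema.my_table VALUES (2)")

# The database resolves the unqualified name to exactly one table.
queried = conn.execute("SELECT id FROM my_table").fetchall()

# A catalog lookup filtered by table name alone finds both candidates.
candidates = []
for _, schema, _ in conn.execute("PRAGMA database_list").fetchall():
    found = conn.execute(
        f"SELECT name FROM {schema}.sqlite_master "
        "WHERE type = 'table' AND name = ?",
        ("my_table",),
    ).fetchall()
    candidates += [f"{schema}.{name}" for (name,) in found]
```

Here `queried` holds rows from a single table, while `candidates` lists a same-named table in each database, which is the same shape of discrepancy as the two datasets described in the report.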
Apache Airflow version
main (development)
Operating System
macOS
Versions of Apache Airflow Providers
apache-airflow-providers-openlineage==1.2.0
Deployment
Other Docker-based deployment
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct