datahub-project / datahub

The Metadata Platform for your Data Stack
https://datahubproject.io
Apache License 2.0

After connecting Airflow, the platform of the TrinoOperator's inlets/outlets is changed to presto. #10558

Open shepherd44 opened 1 month ago

shepherd44 commented 1 month ago

Describe the bug

I integrated Airflow using the airflow plugin v2, then ran a query with TrinoOperator and confirmed that the task flow and datasets showed up. However, the datasets were linked to the presto platform instead of trino, so they were captured as different datasets from the trino datasets I had already ingested.

OL_SCHEME_TWEAKS seems to map trino to presto, but why? Because of this, the datasets are split in two when I also run trino ingestion. Is it possible to make the dataset platform for the Airflow TrinoOperator come out as trino, so that it can be used together with trino ingestion?
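To illustrate the split described above, here is a minimal sketch of how a scheme-tweak mapping could produce two different dataset URNs for the same table. It does not reproduce the plugin's actual code: the mapping contents, the helper names `platform_from_namespace` and `make_dataset_urn`, the host, and the table name are all assumptions made for illustration. Only the name OL_SCHEME_TWEAKS and the trino-to-presto behavior come from this issue.

```python
from typing import Dict

# Assumption: illustrative contents only; the real mapping in the plugin may differ.
OL_SCHEME_TWEAKS: Dict[str, str] = {"trino": "presto"}


def platform_from_namespace(namespace: str, tweaks: Dict[str, str]) -> str:
    """Hypothetical helper: take the URI scheme of an OpenLineage-style
    namespace and apply the tweak mapping to get a platform name."""
    scheme = namespace.split("://", maxsplit=1)[0]
    return tweaks.get(scheme, scheme)


def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Hypothetical helper: build a DataHub-style dataset URN string."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


namespace = "trino://trino.example.com:8080"  # hypothetical host
name = "hive.sales.orders"  # hypothetical table

# URN as it would come out of the Airflow side with the tweak applied.
airflow_urn = make_dataset_urn(
    platform_from_namespace(namespace, OL_SCHEME_TWEAKS), name
)

# URN as trino ingestion would produce it.
ingestion_urn = make_dataset_urn("trino", name)

# URN from the Airflow side if the trino entry were removed from the mapping.
fixed_urn = make_dataset_urn(platform_from_namespace(namespace, {}), name)

print(airflow_urn)
print(ingestion_urn)
print(airflow_urn == ingestion_urn, fixed_urn == ingestion_urn)
```

Under these assumptions, the URN from the Airflow side carries the presto platform and does not match the one from trino ingestion, while dropping the trino entry from the mapping makes the two line up.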

To Reproduce

Steps to reproduce the behavior:

  1. Integrate Airflow using the airflow plugin v2.
  2. Run a query with TrinoOperator.
  3. The inlet/outlet datasets come out with the presto platform.

Expected behavior

I want the dataset platform for the TrinoOperator to come out as trino.

hsheth2 commented 1 month ago

I'm not sure why that mapping is in OL_SCHEME_TWEAKS. Presto was the old name of Trino, but we use trino throughout our codebase.

Would you be up for creating a PR to remove that from the mapping?

shepherd44 commented 1 month ago

Yes. I'll modify it and create a PR.

github-actions[bot] commented 4 days ago

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io