Open Eric-Xu opened 2 years ago
Hi @taufiqibrahim! Would you mind taking a look at this one?
Hi @maggiehays, okay, let me take a look at this one.
Hi @Eric-Xu @maggiehays
I've done some checks with Presto and learned some concepts about Presto catalogs.
Basically, a catalog references a data source via a connector, as defined here. So the catalog is basically just a name.
For the Hive catalog/connector, the common default name is `hive`, stored in the `/etc/catalog/hive.properties` configuration file on the Presto server. Presto also allows us to have multiple Hive clusters, as explained here.
So, if we have a Hive connector/catalog using a name other than `hive`, for example `sales` as described in the Presto doc above, it will be impossible to guess the platform correctly.
Any ideas about that?
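One possible direction, sketched below, is to stop guessing and let the user declare which platform each Presto catalog points to. This is only an illustration: `resolve_platform` and `CATALOG_TO_PLATFORM` are hypothetical names, not existing DataHub options.

```python
from typing import Mapping, Optional


def resolve_platform(
    catalog: Optional[str],
    catalog_to_platform: Mapping[str, str],
    default: str = "presto",
) -> str:
    """Return the platform a Presto catalog maps to, or `default` if unknown."""
    if catalog is None:
        return default
    return catalog_to_platform.get(catalog, default)


# Hypothetical user-supplied mapping: both catalogs are backed by Hive,
# even though only one of them happens to be named "hive".
CATALOG_TO_PLATFORM = {"hive": "hive", "sales": "hive"}

print(resolve_platform("sales", CATALOG_TO_PLATFORM))
print(resolve_platform("unmapped", CATALOG_TO_PLATFORM))
```

With an explicit mapping, a catalog named `sales` resolves to `hive`, while an unmapped catalog falls back to `presto`.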
Hi @taufiqibrahim! We recently saw this with our Metabase connector as well - take a look at how it was resolved in this PR
Hi @Eric-Xu & @taufiqibrahim - let me know if this is a fix you're able to push!
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
Describe the bug
When using the Redash connector (DataHub v0.8.21) with `parse_table_names_from_sql: true`, the platform value used to construct the upstream dataset URN does not take into account the data catalog type (e.g. Hive) when a dashboard or chart is pulling from a Presto data source.

To Reproduce
Steps to reproduce the behavior: ingest from Redash with `parse_table_names_from_sql: true`. The Redash cluster should be configured so that Presto is the query engine and Hive is the underlying data catalog.

Expected behavior
When ingesting dashboards and charts from Redash with `parse_table_names_from_sql: true`, if the data source type is Presto, then check what the underlying data catalog is (e.g. Hive) and use it as the platform value when constructing the upstream dataset URN.

Sample JSON response from Redash's `/api/data_sources/` API endpoint being called here.

In the example above, the upstream dataset platform should be set to `hive` and not `presto`.
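A minimal sketch of the expected behavior, under stated assumptions: the data source is a dict with a `type` field and an `options.catalog` field (the actual shape of the Redash response may differ), and a catalog named `hive` is treated as Hive-backed, which is only a naming convention. `upstream_dataset_urn` is a hypothetical helper, not the connector's actual code.

```python
def upstream_dataset_urn(data_source: dict, table_name: str, env: str = "PROD") -> str:
    """Build a dataset URN, preferring the underlying catalog over the query engine."""
    platform = data_source.get("type", "")
    if platform == "presto":
        # Assumption: the catalog is exposed under options.catalog.
        catalog = (data_source.get("options") or {}).get("catalog")
        if catalog == "hive":
            platform = "hive"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{table_name},{env})"


# Hypothetical data source, for illustration only.
presto_on_hive = {"type": "presto", "options": {"catalog": "hive"}}
print(upstream_dataset_urn(presto_on_hive, "db.table"))
```

For this hypothetical data source the URN carries `urn:li:dataPlatform:hive` rather than `urn:li:dataPlatform:presto`.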