datahub-project / datahub

The Metadata Platform for your Data and AI Stack
https://datahubproject.io
Apache License 2.0

Support for Databricks Catalog with Hyphens #10372

Open thiagortz opened 6 months ago

thiagortz commented 6 months ago

Describe the bug

Hello. While testing the Databricks integration locally, I noticed an error related to backtick quoting in the logs of the datahub-actions service container. My catalog names contain hyphens, for example: production-metadata. I believe the same error might occur for schemas and tables whose names contain hyphens.

Screenshots

[2024-04-24 18:42:27,204] INFO     {great_expectations.data_context.data_context.abstract_data_context:4495} - Usage statistics is disabled; skipping initialization.
[2024-04-24 18:42:27,321] ERROR    {datahub.ingestion.source.ge_data_profiler:1168} - Encountered exception while profiling production-metadata.mytest.control_manager
Traceback (most recent call last):
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/databricks/sql/client.py", line 503, in execute
    execute_response = self.thrift_backend.execute_command(
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 854, in execute_command
    return self._handle_execute_response(resp, cursor)
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 947, in _handle_execute_response
    final_operation_state = self._wait_until_command_done(
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 777, in _wait_until_command_done
    self._check_command_not_in_error_or_closed_state(
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 579, in _check_command_not_in_error_or_closed_state
    raise ServerOperationError(
databricks.sql.exc.ServerOperationError:
[INVALID_IDENTIFIER] The identifier production-metadata is invalid. Please, consider quoting it with back-quotes as `production-metadata`. SQLSTATE: 42602 (line 3, pos 12)
== SQL ==
WITH gnjzchibbzrnumpx AS
(SELECT count(*) AS count_1
FROM production-metadata.mytest.control_manager)
------------^^^
 SELECT gnjzchibbzrnumpx.count_1
FROM gnjzchibbzrnumpx
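
The server message above suggests wrapping the identifier in back-quotes. As a minimal sketch of that quoting (not DataHub's actual fix; `quote_identifier` and `qualified_name` are hypothetical helper names made up for illustration), each part of the three-level name could be quoted separately, with any embedded backtick doubled:

```python
def quote_identifier(name: str) -> str:
    """Wrap a single identifier part in backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(*parts: str) -> str:
    """Join already-split parts (catalog, schema, table) into a quoted name."""
    return ".".join(quote_identifier(part) for part in parts)


# Hypothetical usage with the names from the log above.
table = qualified_name("production-metadata", "mytest", "control_manager")
query = f"SELECT count(*) AS count_1 FROM {table}"
print(query)
```

Quoting each part on its own matters here: wrapping the whole dotted string in one pair of backticks would make it a single identifier rather than a catalog.schema.table reference.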


thiagortz commented 6 months ago

The bug was also reported in the Slack channel https://datahubspace.slack.com/archives/C029A3M079U/p1713985062423189

github-actions[bot] commented 5 months ago

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io