airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
15.88k stars 4.07k forks source link

[Destination Databricks] Table Creation is case-insensitive, table detection is case sensitive #44072

Closed evantahler closed 1 month ago

evantahler commented 2 months ago

Consider these logs from a sync:

2024-08-13 02:34:43 destination > INFO sync-operations-1 i.a.i.b.d.o.AbstractStreamOperation(prepareFinalTable):181 Final table does not exist for stream FORM_SUBMISSIONS_CUSTOMER_FEEDBACK_SURVEY, creating.
2024-08-13 02:34:43 destination > INFO sync-operations-1 i.a.i.d.d.j.DatabricksDestinationHandler(execute):52 Executing sql 22bdaae1-cdf9-4535-9d08-a6556b8fe5b5-16af4318-a346-4941-b56d-2065b9624922: [            CREATE  TABLE xxx.`hubspot`.`FORM_SUBMISSIONS_CUSTOMER_FEEDBACK_SURVEY`(
                `_airbyte_raw_id` STRING, 
`_airbyte_extracted_at` TIMESTAMP, 
`_airbyte_meta` STRING, 
`_airbyte_generation_id` LONG, 
`values` STRING, 
`pageUrl` STRING, 
`submittedAt` DECIMAL(38, 10)
            );]
2024-08-13 02:34:45 destination > ERROR sync-operations-1 i.a.i.d.d.j.DatabricksDestinationHandler(execute):63 Sql 22bdaae1-cdf9-4535-9d08-a6556b8fe5b5-16af4318-a346-4941-b56d-2065b9624922 failed in 2079 ms java.sql.SQLException: [Databricks][JDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: 42P07, Query: CREATE  TA***, Error message from Server: org.apache.hive.service.cli.HiveSQLException: Error running query: [TABLE_OR_VIEW_ALREADY_EXISTS] org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException: [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view `hubspot`.`FORM_SUBMISSIONS_CUSTOMER_FEEDBACK_SURVEY` because it already exists.
Choose a different name, drop the existing object, add the IF NOT EXISTS clause to tolerate pre-existing objects, add the OR REPLACE clause to replace the existing materialized view, or add the OR REFRESH clause to refresh the existing streaming table. SQLSTATE: 42P07

Note the Final table does not exist for stream FORM_SUBMISSIONS_CUSTOMER_FEEDBACK_SURVEY, creating. before the error.

Reproduction Steps

CREATE SCHEMA `integration_tests`.`evan`;

CREATE TABLE `integration_tests`.`evan`.`TEST`(
  `id` LONG, 
  `data` STRING
);

SHOW TABLES IN `integration_tests`.`evan`; -- `integration_tests`.`evan`.`test` (lower case) is found, not upper case
evantahler commented 2 months ago

Confirmed - names of tables are always lower case in databricks w/ unity catalog - https://docs.databricks.com/en/sql/language-manual/sql-ref-names.html

evantahler commented 1 month ago

Note: Check column name cases too

edgao commented 1 month ago

fixed in https://github.com/airbytehq/airbyte/pull/44506. (for the record: column names preserve case; table + schema names are downcased)