Open · Steiniche opened this issue 1 year ago
+1 - we are experiencing the same behavior
The workaround I found was changing the data source from managed tables to S3 in the destination settings. That change also makes syncing significantly faster. I am on version 1.0.1 of the destination, so this might have changed since, but I had the same issue when using managed tables.
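For anyone who wants to script that change rather than click through the UI, here is a rough sketch against the Airbyte Configuration API (OSS deployment assumed). The destinations/update endpoint is standard, but the field names inside connectionConfiguration are illustrative only; check the spec of your connector version before relying on them.

```python
# Rough sketch: switch the Databricks destination's data source from managed
# tables to S3 via the Airbyte Configuration API (local OSS deployment assumed).
# Field names inside connectionConfiguration are ILLUSTRATIVE placeholders;
# fetch the authoritative spec for your connector version before using them.
import requests

AIRBYTE_API = "http://localhost:8000/api/v1"             # assumption: local OSS deployment
DESTINATION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

payload = {
    "destinationId": DESTINATION_ID,
    "name": "databricks-lakehouse",
    "connectionConfiguration": {
        # Hypothetical shape of the "Data Source" block; verify against your spec.
        "data_source": {
            "data_source_type": "S3_STORAGE",
            "s3_bucket_name": "my-staging-bucket",
            "s3_bucket_path": "airbyte-staging",
            "s3_bucket_region": "eu-west-1",
            "s3_access_key_id": "<access-key>",
            "s3_secret_access_key": "<secret-key>",
        },
        # ...plus the existing Databricks connection fields (host, HTTP path, token, ...)
    },
}

resp = requests.post(f"{AIRBYTE_API}/destinations/update", json=payload)
resp.raise_for_status()
print(resp.json())
```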
At Airbyte, we seek to be clear about the project priorities and roadmap. This issue has not had any activity for 180 days, suggesting that it's not as critical as others. It's possible it has already been fixed. It is being marked as stale and will be closed in 20 days if there is no activity. To keep it open, please comment to let us know why it is important to you and if it is still reproducible on recent versions of Airbyte.
Issue still persists even though a workaround was found.
Connector Name
destination-databricks-lakehouse
Connector Version
1.1.0
What step the error happened?
During the sync
Relevant information
Context
Using Airbyte as a self-service platform for getting data into Databricks. The experience is awesome (thanks to all contributors!), but we are experiencing a bug.
I have tried with both a file source (HTTPS) and some MSSQL databases/tables. We are using Databricks with Unity Catalog. Data Source is set to Managed Tables, and we connect through a SQL Warehouse running in the Databricks workspace.
Expected
We expect the data pulled from a source, e.g. MSSQL or a file, to be loaded into Databricks as tables with columns as defined by the source data.
Actual
What actually happens is that a table (_airbyte_raw_events) is created with 3 columns (_airbyte_ab_id, _airbyte_data, _airbyte_emitted_at). The _airbyte_data column contains the data from the source as a JSON string. As I understand it, these are the staging tables Airbyte uses to move the data before producing the final result.
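For context, the raw payload can be unpacked into typed columns manually from a Databricks notebook. A minimal PySpark sketch; the catalog/schema and the payload fields (id, name, created_at) are illustrative placeholders for whatever the stream actually contains:

```python
# Minimal sketch: manually flatten an Airbyte raw table into typed columns.
# Table name, catalog/schema, and payload fields are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in Databricks notebooks

raw = spark.table("main.default._airbyte_raw_events")

# Describe the columns expected inside the _airbyte_data JSON payload.
payload_schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
    StructField("created_at", TimestampType()),
])

flattened = (
    raw.withColumn("payload", F.from_json(F.col("_airbyte_data"), payload_schema))
       .select("_airbyte_ab_id", "_airbyte_emitted_at", "payload.*")
)

flattened.write.mode("overwrite").saveAsTable("main.default.events")
```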
Closing
If anyone has any ideas as to why we are seeing this, or whether this is in fact a bug, we would appreciate it. What would be required to get schema + columns instead of just JSON dumps (staging data)?
I have not found other issues describing this, but please link any relevant issues.
We will happily contribute a fix if this is in fact a bug and we can get some pointers as to where to change the code.
If there is a need for more information about the setup, which options are enabled, logs, etc., please state which and I will gather them.
Relevant log output
Contribute