fivetran / dbt_fivetran_log

Data models for Fivetran's internal log connector built using dbt.
https://fivetran.github.io/dbt_fivetran_log/
Apache License 2.0

feature/databricks-sql-warehouse-compatibility #121

Closed fivetran-joemarkiewicz closed 5 months ago

fivetran-joemarkiewicz commented 5 months ago

PR Overview

This PR will address the following Issue/Feature: Issue #120

This PR will result in the following new package version: v1.7.1

This will not impact existing users who are not using the Databricks SQL Warehouse runtime. Users on the SQL Warehouse runtime will never have seen a successful run, so this fix is not breaking and should allow their runs to succeed.

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

Bug Fixes

  • Users leveraging the Databricks SQL Warehouse runtime were previously unable to run the fivetran_platform__audit_table model due to an incompatible incremental strategy. As such, the following updates have been made:
    • A new macro is_databricks_sql_warehouse() has been added to determine whether a Databricks runtime is a SQL Warehouse runtime. This macro returns a boolean: true if the runtime is determined to be SQL Warehouse, and false for any other runtime or destination.
    • The above macro is used to determine the incremental strategy within the fivetran_platform__audit_table model. For Databricks SQL Warehouses, no incremental strategy is used. The incremental strategies for all other destinations and runtimes are not impacted by this change.
    • For the SQL Warehouse runtime, the best incremental strategy we could elect to use is the merge strategy. However, we do not have full confidence in the resulting data integrity of the output model when leveraging this strategy. Therefore, we opted for the model to replicate a full create or replace behavior for the time being.
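
To make the decision logic described above concrete, here is a minimal Python sketch of how the check and the strategy selection could fit together. It is not the macro's actual Jinja implementation, which is not shown in this PR description. It assumes that a SQL Warehouse can be recognized by an HTTP path of the form `/sql/<version>/warehouses/<id>`, and the `audit_table_incremental_strategy` helper and its `default_strategy` parameter are hypothetical names introduced only for illustration.

```python
import re
from typing import Optional

# Assumption: a SQL Warehouse HTTP path looks like "/sql/1.0/warehouses/<id>",
# while other Databricks runtimes use a different path shape.
_SQL_WAREHOUSE_PATH = re.compile(r"/sql/.+/warehouses/")


def is_databricks_sql_warehouse(target_type: str, http_path: Optional[str]) -> bool:
    """Return True only for a Databricks target whose path looks like a SQL Warehouse."""
    if target_type not in ("databricks", "spark"):
        # Any other destination is never a SQL Warehouse.
        return False
    return bool(http_path and _SQL_WAREHOUSE_PATH.search(http_path))


def audit_table_incremental_strategy(
    target_type: str, http_path: Optional[str], default_strategy: str
) -> Optional[str]:
    """Hypothetical helper: pick the strategy for fivetran_platform__audit_table.

    Returns None (no incremental strategy, i.e. a full rebuild) on SQL Warehouse,
    and leaves the strategy for every other destination untouched.
    """
    if is_databricks_sql_warehouse(target_type, http_path):
        return None
    return default_strategy
```

With this sketch, a target whose path is `/sql/1.0/warehouses/abc123` gets no incremental strategy, while an All Purpose Cluster or a non-Databricks destination keeps whatever strategy it was already configured with.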

Under the Hood

  • Added integration testing pipeline for Databricks SQL Warehouse.
  • Applied modifications to the integration testing pipeline to account for jobs being run on both Databricks All Purpose Cluster and SQL Warehouse runtimes.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

Before marking this PR as "ready for review" the following have been applied:

Detailed Validation

Please share any and all of your validation steps:

For validating these changes I wanted to ensure the following in relation to the fivetran_platform__audit_table model:

  1. No incremental strategy is being used for SQL Warehouse destinations
  2. The incremental strategy created for all other warehouses is unchanged

See the validations below:

If you had to summarize this PR in an emoji, which would it be?

🧱

fivetran-joemarkiewicz commented 5 months ago

Thanks @fivetran-catfritz! I agree with your suggestion and committed it to the branch. I also really appreciate your README updates 🙏. I made one small wording change to be consistent with the terminology of the runtime name (Databricks All Purpose Cluster). Lastly, I regenerated the docs, as the incremental file format change needed to be picked up there.

Let me know if there are any other comments needed before approving.