fivetran / dbt_jira

Data models for Fivetran's Jira connector built using dbt.
https://fivetran.github.io/dbt_jira/
Apache License 2.0

Feature/performance enhancement #127

Closed fivetran-catfritz closed 2 months ago

fivetran-catfritz commented 3 months ago

PR Overview

This PR will address the following Issue/Feature:

This PR will result in the following new package version:

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

🚨 Breaking Changes 🚨

⚠️ Because the following changes are breaking, a --full-refresh is required after upgrading.

  • To reduce storage, updated default materialization of the staging models to views.
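For anyone who prefers the previous behavior, a minimal sketch of overriding the new view default from a project's own dbt_project.yml follows. It assumes the staging models live in a package whose project name is `jira_source`; that name is not stated in this PR, so check it against your installed packages before relying on it.

```yaml
# dbt_project.yml (sketch, not taken from this PR)
models:
  # ASSUMPTION: `jira_source` is the project name of the staging package.
  jira_source:
    # Materialize staging models as tables instead of the new view default.
    +materialized: table
```

Leaving this override out keeps the storage savings that the change is meant to deliver.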

Performance improvements

  • Updated the incremental strategy of the following models to insert_overwrite for BigQuery and Databricks All Purpose Cluster destinations and delete+insert for all other supported destinations.
    • int_jira__issue_calendar_spine
    • int_jira__pivot_daily_field_history
    • jira__daily_issue_field_history

      At this time, models for Databricks SQL Warehouse destinations are materialized as tables without support for incremental runs.

  • Removed intermediate models int_jira__agg_multiselect_history, int_jira__combine_field_histories, and int_jira__daily_field_history by combining them with int_jira__pivot_daily_field_history. This reduces the redundancy of the data stored in tables, the number of full scans, and the volume of write operations.
  • Updated the default materialization of int_jira__issue_type_parents from a table to a view. This model is used in only one downstream model, so a view reduces storage requirements without significantly hindering performance.
  • For Snowflake and BigQuery destinations, added cluster_by columns to the configs for incremental models.
  • For Databricks All Purpose Cluster destinations, updated incremental model file formats to parquet for compatibility with the insert_overwrite strategy.

Features

  • Added a default 3-day look-back to incremental models to accommodate late-arriving records. The number of days can be changed by setting the lookback_window variable in your dbt_project.yml. See the Lookback Window section of the README for more details.
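As a rough illustration of the setting described above, the variable could be declared in dbt_project.yml as sketched here. The value 7 is only an example, and declaring the variable at the global scope is an assumption; the README's Lookback Window section is the authoritative reference for the exact placement.

```yaml
# dbt_project.yml (sketch)
vars:
  # Number of days incremental models look back for late-arriving records.
  # The package default is 3; 7 is an illustrative value.
  lookback_window: 7
```

A larger window reprocesses more history on each incremental run, so it trades extra compute for a lower chance of missing late records.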

Under the Hood

  • Added integration testing pipeline for Databricks SQL Warehouse.
  • Updated the maintainer pull request template.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

Before marking this PR as "ready for review" the following have been applied:

Detailed Validation

Please share any and all of your validation steps:

If you had to summarize this PR in an emoji, which would it be?

:dancer:

fivetran-catfritz commented 3 months ago

Will regen docs after final approval