fivetran / dbt_jira

Data models for Fivetran's Jira connector built using dbt.
https://fivetran.github.io/dbt_jira/
Apache License 2.0
8 stars 13 forks source link

bugfix.status-id-matching #95

Closed fivetran-joemarkiewicz closed 1 year ago

fivetran-joemarkiewicz commented 1 year ago

PR Overview

This PR will address the following Issue/Feature: #92

This PR will result in the following new package version: v0.13.0

This will require a full refresh following the upgrade. As such, this should be a breaking change.

Please detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:

At the core of this change we are including the status_id field to the int_jira__field_history_scd and jira__daily_issue_field_history models. This is required because as identified in the linked Issue on an incremental run we are currently trying to join the statuses using status_id; however, we adjusted our package back in v0.9.0 to produce the status name instead of the id. Therefore, this join works only on the non incremental run as the data in the joined cte is providing the status_id. Once an initial run is complete however the status field is now producing the name and therefore will never have a successful join.

I originally attempted to simply adjust the join logic to join on status_id for non incremental runs, and then join on status_name for incremental runs. However, although Jira mentions a status will never have the same name, that does not seem to be the case in the Jira synced data via our Fivetran connector. There in fact are possible duplicates. I wonder if this has to deal with deleted status' that we do not capture. Nevertheless, if we tried to do this join on status_name it would result in a fan out as it is a many to many relationship.

Therefore, in order to address this situation while still producing the status_name in the final model, I decided to bring in the status_id as a default (and permanent) field. This way we are able to confidentially join on the status_id in downstream models while also producing the accurate status_name without risk of a fan out.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

I also ran with the daily_field_history variable set to join a few other fields just to ensure nothing was breaking with that logic.

Before marking this PR as “ready for review” the following have been applied:

Detailed Validation

Please acknowledge that the following validation checks have been performed prior to marking this PR as “ready for review”:

To validate this change I wanted to particularly test the incremental strategy and it's affects when running on new and stale data as well as recreate the identified issue.

Recreate the issue

New Data Tests

Stale Data Tests

Standard Updates

Please acknowledge that your PR contains the following standard updates:

dbt Docs

Please acknowledge that after the above were all completed the below were applied to your branch:

I believe we should hold off on the docs since this will be batched together with other changes in the next release.

If you had to summarize this PR in an emoji, which would it be?

🧍‍♂️