fivetran / dbt_jira

Data models for Fivetran's Jira connector built using dbt.
https://fivetran.github.io/dbt_jira/
Apache License 2.0
8 stars 13 forks source link

Bug: Making sure components data is properly joined onto JIRA daily issue field history and downstream models #99

Closed fivetran-avinash closed 1 year ago

fivetran-avinash commented 1 year ago

PR Overview

This PR will address the following Issue/Feature: #83

This PR will result in the following new package version: v0.14.0

This is a breaking change as a dbt run --full-refresh is required to reflect the proper components data for customers.

Please detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:

The main issue customers are facing is components data was being joined in incorrectly because there would be identifiers in field_option that would have the same id as a component.

So when the join below is executed within jira__daily_issue_field_history, it returns the field option values instead of the component names for the component fields.

Screenshot 2023-05-10 at 1 29 01 PM

This PR addresses this issue by ensuring we join on components separately from the field option model. The customer must be utilizing the components source model and specifying components as a field in the issue_field_history_columns in the dbt_project.yml for generating historical data for jira__daily_issue_field_history. If neither of those conditions hold, components is then removed from jira__daily_issue_field_history to avoid any improper joins. See lines 113-116 and lines 127-130 for the modifications in this join.

Screenshot 2023-05-10 at 1 54 33 PM

This conditional is then implemented throughout the model so that there are no issues for customers who choose not to leverage components as well.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

Before marking this PR as "ready for review" the following have been applied:

Detailed Validation

Please acknowledge that the following validation checks have been performed prior to marking this PR as "ready for review":

Bug replication.

Look at field_id = 10019 for the field_option source model:

Screenshot 2023-05-09 at 10 43 40 PM

Then see the id = 10019 for the component source model:

Screenshot 2023-05-09 at 10 43 36 PM

Thus the component values returned downstream when the intial join occurs incorrectly returns the field_option values rather than the component names for the components field.

Screenshot 2023-05-09 at 10 29 16 PM

So we need to isolate components with its own specific logic.

PR solution and implementation
With the new join described above in the PR description, we ensure that if components exist, and are one of the fields mentioned in the issue field history table, any join from the existing table on the components field joins on the components table instead of field option and grabs the correct component names.

We made sure our seed files had the proper components information to be joined upstream

Screenshot 2023-05-10 at 3 23 46 PM Screenshot 2023-05-10 at 3 23 42 PM Screenshot 2023-05-10 at 3 23 37 PM

As you can see, the new version of the model returns the component value.

Screenshot 2023-05-10 at 12 01 16 AM

Looking upstream at models that flow from the daily issue field history like jira__issue_enhanced, you can see components is also being properly brought in.

Screenshot 2023-05-10 at 12 25 06 AM

We're on the right track!

What happens if components are not configured or part of field history?

Say we remove components from the issue_field_history_columns in our dbt_project.yml. That means we should only return summary and story points if those fields are present in the issue_field_history

We added summary rows to our seed files to test this (see the above seed files for the intial PR fix.

The resulting jira__daily_issue_field_history. was the following after execution.

Screenshot 2023-05-10 at 1 10 01 PM

You can see that components is now removed after being included in the initial run, but the summary field is populated with the appropriate information.

Similar behavior occurs when the variable jira_using_components is set to false.

What happens on an incremental load?

Because of the potential sizing of the jira__daily_issue_field_history model, many customers opt to do incremental loads to only load the most recent set of updated rows. We tested this by modifying the int_jira__issue_calendar_spine to filter on a specific date range so only certain components are returned up to a certain day, then running a dbt run --full-refresh.

Screenshot 2023-05-10 at 11 03 05 AM

This would then return records for this issue up to '2020-11-14'.

Screenshot 2023-05-10 at 11 27 32 AM

We then modified the date range by a day ('2020-11-15'), then executed a dbt run to see if the components loaded properly. Screenshot 2023-05-10 at 12 36 58 PM

Screenshot 2023-05-10 at 12 32 05 PM

Just to be safe, we then modified the date range filter in the calendar spine by several days (up to '2020-11-21') to see if there was any issue with multiple dates being incrementally loaded, then executed a dbt run.

Screenshot 2023-05-10 at 12 50 35 PM

All looks good!

Running the same steps when components are not being utilized achieved similar results to above--a successful load of the summary field by day, with components excluded.

Non-incremental sense check

Removing the date_day filters on int_jira__issue_calendar_spine and running dbt run --full-refresh and dbt run on jira__daily_issue_field_history yields the full table of results with the correctly populated component name if components are being leveraged.

Screenshot 2023-05-10 at 3 57 20 PM

Likewise, if components are not being leveraged, it removes components as a field while still correctly populating summary.

Screenshot 2023-05-10 at 3 59 33 PM

Standard Updates

Please acknowledge that your PR contains the following standard updates:

dbt Docs

Please acknowledge that after the above were all completed the below were applied to your branch:

If you had to summarize this PR in an emoji, which would it be?

🐈‍⬛
fivetran-avinash commented 1 year ago

@fivetran-joemarkiewicz The above issue has been addressed to bring in component ids if there are multiple values, but hasn't been addressed to change them back into component names. Additionally, I'm seeing this issue recur for other field values, like sprints. See the below screenshot. Screenshot 2023-05-17 at 1 01 53 PM

So, we will need to create a new feature request to handle this particular inconsistency.

For the purposes of this bug though, it does bring in the proper component values for each issue and day (even if in varying name or id form), so this should close out this particular task.