fivetran / dbt_jira

Data models for Fivetran's Jira connector built using dbt.
https://fivetran.github.io/dbt_jira/
Apache License 2.0
8 stars 13 forks source link

[Bug] daily_history model and enhanced model producing different values for the same field name #107

Closed kenzie-marsh closed 10 months ago

kenzie-marsh commented 11 months ago

Is there an existing issue for this?

Describe the issue

The main problem is that we are seeing different values for the same field (issue_type) between the jira__daily_issue_field_history table and the jira__issue_enhanced table. It appears that _jira__issueenhanced reflects the raw.jira.issue_type.name value, while _daily_issue_fieldhistory reflects the raw.jira.field_option.name value - possibly because both have the same ID.

The results of this query provide an example:

select 
    i.id as issue_id
    , t.id::string AS issue_type_id
    , t.name AS issue_type_name
    , fh.value AS field_history_value
    , fo.id as field_option_id
    , fo.name AS field_option_name
    , e.issue_type_id::string AS enhanced_issue_type_id
    , e.issue_type AS enhanced_issue_type
    , d.issue_type AS daily_history_issue_type 
from raw.jira.issue i
left join raw.jira.issue_type as t
    on t.id = i.issue_type
left join raw.jira.issue_field_history fh
    on fh.issue_id = i.id
left join biz.dbt.jira__issue_enhanced e 
    on e.issue_id = i.id
left join biz.dbt.jira__daily_issue_field_history d 
    on d.issue_id = e.issue_id
left join raw.jira.field_option fo 
    on fo.id = i.issue_type
where i.id = 183458
    and fh.field_id = 'issuetype'
GROUP BY 1,2,3,4,5,6,7,8,9

Results:

ISSUE_ID ISSUE_TYPE_ID ISSUE_TYPE_NAME FIELD_HISTORY_VALUE FIELD_OPTION_ID FIELD_OPTION_NAME ENHANCED_ISSUE_TYPE_ID ENHANCED_ISSUE_TYPE DAILY_FIELD_HISTORY_ISSUE_TYPE
183458 10736 Delivery Site 10736 10736 Internal Quality Issue 10736 Delivery Site Internal Quality Issue

According to @fivetran-avinash , this could be because Currently our data logic coalesces the field option name over the initial column name if it exists. [You can see this logic in action in Line 119 here.](https://github.com/fivetran/dbt_jira/blob/main/models/jira__daily_issue_field_history.sql#L119).

Relevant error log or model output

No response

Expected behavior

Ideally, we would be able to track both fields separately within the daily table. In other words, our end-users need to be able to filter on both Delivery Site issues and Internal Quality Issue issues separately. Also, ideally the _issueenhanced table values would match the _dailyhistory table values for fields of the same name - right now the difference is causing some confusion.

dbt Project configurations

models: jira: schema: dbt jira_source: schema: dbt_stg

...

vars: jira_database: raw jira_schema: jira

jira: issue_field_history_columns: [ 'Activity' ,'Activity Date' ,'Affects versions' ,'Approved By' ,'Assignee' ,'Battery Serial Number' ,'BIS Link' ,'BIS Issue Type' ,'BIS Application' ,'BIS Root Cause' ,'Body Serial Number' ,'Cold Chain equipment type and model' ,'Complaint Source' ,'Complaint Type' ,'Components' ,'Containment Status' ,'Contributing factors during delivery' ,'Created' ,'Creation Reason' ,'Creator' ,'Critical Incident Count' ,'Current Customer Trust Level' ,'Customer Account' ,'Customer Issue Type' ,'Customer Type' ,'Customers Impacted' ,'Department' ,'Date and Time' ,'Delivery Issue Type' ,'Description' ,'Donors' ,'ECO #' ,'Epic Link' ,'Equipment Failure' ,'Equipment Status' ,'Exception Name' ,'Failure Category' ,'Failure Type' ,'Fix versions' ,'Flight ID' ,'Grounded Status' ,'Human injury caused by package' ,'Impact' ,'Impact on customer/patient' ,'Incident Timestamp' ,'Internal/External' ,'Issue Category' ,'Issue Category Label' ,'Issue Summary'
,'Issue Type' ,'Labels' ,'Location Relative to Nest' ,'Logs' ,'Loss product description' ,'LRU' ,'Make it Right Level' ,'Mission Stage' ,'Nest' ,'New Part Number' ,'Operator Report' ,'Order ID' ,'Order Types' ,'ORN' ,'OSD Checkboxes' ,'Other failure selection' ,'Package Drop location' ,'Package ID' ,'Parent Link' ,'Part Number' ,'People Reached' ,'Priority' ,'Product Risk' ,'Project' ,'Property damage caused by package' ,'Quantity' ,'Rank' ,'Released In' ,'Request participants' ,'Request Type' ,'Required Approvers' ,'Resolution' ,'Resolution Reason' ,'Resolved' ,'Root Cause' ,'Root Cause Category' ,'Root Failure' ,'Safety Performance Target Affected' ,'Satisfaction' ,'Serial Number' ,'Severity' ,'SLA' ,'Stakeholders' ,'Status Category Changed' ,'Story Points' ,'Sub-tasks' ,'Summary' ,'Supplier' ,'Time to approve normal change' ,'Time to first response' ,'Time to close after resolution' ,'Time to resolution' ,'Troubleshooting Results' ,'Unit ID' ,'Updated' ,'Viable Units' ,'Vehicle Generation' ,'Wing Serial Number' ,'Yes/No' ,'Yes/No/Unsure' ,'Zip Number' ,'Zone' ]

Package versions

packages:

What database are you using dbt with?

snowflake

dbt Version

Core:

Additional Context

Initial convo about the issue here: https://getdbt.slack.com/archives/C01D1R2JLLA/p1693234610157279

Are you willing to open a PR to help address this issue?

fivetran-jamie commented 11 months ago

Hi @kenzie-marsh, thanks for opening this issue! I'm curious more about your expected/ideal outcome.

Ideally, we would be able to track both fields separately within the daily table. In other words, our end-users need to be able to filter on both Delivery Site issues and Internal Quality Issue issues separately. Also, ideally the issue_enhanced table values would match the daily_history table values for fields of the same name - right now the difference is causing some confusion.

If you were to have both fields/versions of the field in one model, how would you interpret them differently? Also, for the field_option version, what is the name of its corresponding field in the jira.field table? Is it also issue type?

kenzie-marsh commented 10 months ago

Thanks for looking into this @fivetran-jamie!

I essentially proposed having both fields in one model as a compromise solution, because Avinash (on the Slack thread) mentioned that other customers would expect to see the field_option value. We really only need the issue_type value (Delivery Site in the example above). In fact, in the Jira UI itself, I can't find any field that reflects Internal Quality Issue on the example issue that we've been looking at, ~but I have an ask out to the team to see if they can show me where that field may have come from in the UI~ the team has let me know that they are not aware of any field equivalent to Internal Quality Type, so that value should not be appearing for this issue at all.

On the second question, I don't believe the field_option table corresponds to the field table. Based on this documentation, it sounds like it can only be joined to ISSUE and ISSUE_<FIELD>_HISTORY. I've also put some screenshots below where I attempted to check if the field table has any corresponding values, and it doesn't appear so.

image image

Let me know if this is helpful and/or what you think. Thank you again!

fivetran-jamie commented 10 months ago

Thanks for your response! It seems like we need to be treating issue_type basically the same way we treat status and components (whose values are stored in their respective source tables instead of field_option) in https://github.com/fivetran/dbt_jira/blob/main/models/jira__daily_issue_field_history.sql and potentially some intermediate models

Are there any other fields where you see this or other strange behavior? @kenzie-marsh

kenzie-marsh commented 10 months ago

@fivetran-jamie that sounds right to me!

I just double checked if there are any other fields that don't match between the daily table and the issue enhanced table, and issue_type appears to be the only one.

Thank you again, please let me know if you need anything else from me!

fivetran-catfritz commented 10 months ago

Hi @kenzie-marsh, to update you, I have picked up this task and will aim to send you a test branch to try out in the next few days.

fivetran-catfritz commented 10 months ago

Hi @kenzie-marsh, I have a branch that is proving out on our end and would love to get your feedback on it. Below is the test branch to replace the regular dbt_jira package if you are able to try it out! This would also need a full refresh to see the updates.

packages:
- git: https://github.com/fivetran/dbt_jira.git
  revision: bug/issue-type
  warn-unpinned: false
kenzie-marsh commented 10 months ago

Thanks so much @fivetran-catfritz! I just tested it out with a full-refresh and got this error:

17:02:57  Finished running 18 view models, 25 table models, 6 incremental models in 0 hours 1 minutes and 57.55 seconds (117.55s).
17:02:57  
17:02:57  Completed with 1 error and 0 warnings:
17:02:57  
17:02:57  
17:02:57  Database Error in model jira__daily_issue_field_history (models/jira__daily_issue_field_history.sql)
17:02:57    002028 (42601): SQL compilation error:
17:02:57    ambiguous column name 'DESCRIPTION'
17:02:57    compiled Code at target/run/jira/models/jira__daily_issue_field_history.sql

We do have the following in our project variables, my guess is that's causing the issue? Maybe we don't need to set this as a custom field if it's already getting included?

jira:
issue_field_history_columns: [
      ,'Description'

LMK what you think. Thank you!

fivetran-catfritz commented 10 months ago

@kenzie-marsh Thanks for trying this out and flagging! I also got the same error when adding in 'description'. It looks like we needed to add some logic to make sure all columns get fully qualified in certain CTEs.

I just pushed an update to the test branch that worked out on my end. Would you mind giving this another try?

kenzie-marsh commented 10 months ago

That appears to have worked! Models ran successfully and I'm no longer seeing the original issue.

Thank you so much for your help!!

fivetran-catfritz commented 10 months ago

Hooray! Thanks again for testing it out! Hopefully this will get released in the next week but will keep you posted.

fivetran-catfritz commented 10 months ago

Confirming this latest release has been published to dbt hub. https://hub.getdbt.com/fivetran/jira/latest/