fivetran / dbt_jira

Data models for Fivetran's Jira connector built using dbt.
https://fivetran.github.io/dbt_jira/
Apache License 2.0
8 stars 15 forks source link

BUG - Duplicate in jira_issues #51

Closed airphilou closed 1 year ago

airphilou commented 2 years ago

Are you a current Fivetran customer? Philippine Marionnet, Data Analyst Product, Aircall

Describe the bug There are some duplicates in the dbt table for jira_issues_custom_fields

Steps to reproduce

select issue_id, count(*)
from jira_issues_custom_fields 
group by 1;

Expected behavior I would like to have a unique issue_id for each line.

Project variables configuration

A good package name should reflect your organization's name or the intended use of these models

name: 'aircall' version: '1.0.0' config-version: 2

This setting configures which "profile" dbt uses for this project.

profile: 'aircall'

source-paths: ["models"] analysis-paths: ["analysis"] test-paths: ["tests"] data-paths: ["data"] macro-paths: ["macros"]

target-path: "target" # directory which will store compiled SQL files clean-targets: # directories to be removed by dbt clean

models: aircall: +materialized: view_if_not_exists intermediate_marts: +schema: intermediate sources: +schema: sources

vars: dbt_date:time_zone: 'UTC' phone_schema: "client_events_phone_staging" android_schema: "client_events_android_staging" ios_schema: "client_events_ios_staging" browser_extension_schema: "client_events_browser_extension_staging" dashboard_schema: "client_events_dashboard_staging" dry_run : False #adding a flag variable to the compile step in order to avoid failure during compiling, when the current code depends on the data of a model which is not yet materialized jira: jira_include_comments: false

----- IF YOU WANT TO ADD A FIELD TO jira_changelog -----

field_columns: # lower case with _ if more than 1 word
  - name: status
    table: stg_jira__status
    joining_id: status_id
    source_field: status_name
  - name: assignee
    table: stg_jira__user
    joining_id: user_id
    source_field: user_display_name
  - name: assignment_group
    table: clean_jira_field_option
    joining_id: field_id
    source_field: field_name

# ----- IF YOU WANT TO ADD A FIELD TO jira_issues_d (and jira_issues) ----- #
issue_field_history_columns: ['customfield_10161', 'assignee', 'priority', 'customfield_10170','customfield_10207'] # variable used in jira__daily_issue_field_history

# ----- IF YOU WANT TO ADD A CUSTOM FIELD TO jira_issues ONLY (not to jira_issues_d)----- #
custom_fields_columns: ['product_category','severity','labels','blast_radius','region','service_impact','symptoms']

# ----- IF YOU WANT TO ADD A SPECIAL CUSTOM FIELD TO jira_issues ONLY (not to jira_issues_d)----- #
# ----- special custom field: field for which the value is not represented by an id but its value in int_jira__combine_field_histories----- #
special_custom_fields_columns: ['story_points', 'email_verification','issue_link_count']

intercom: using_contact_tags: false using_team: false

Package Version packages:

Warehouse

Additional context

**Screenshots** ![image](https://user-images.githubusercontent.com/95766824/152760370-b6854259-9abc-407d-84f2-a76cc3980cd6.png) **Please indicate the level of urgency** It is impacting my reporting of a critical dashboard. **Are you interested in contributing to this package?** - [ ] Yes, I can do this and open a PR for your review. - [ ] Possibly, but I'm not quite sure how to do this. I'd be happy to do a live coding session with someone to get this fixed. - [x] No, I'd prefer if someone else fixed this. I don't have the time and/or don't know what the root cause of the problem is.
fivetran-joemarkiewicz commented 2 years ago

@airphilou thanks for opening this issue!

Additionally, thank you for providing all the supporting variables and configurations you have on your end. After reading through your issue I have a question regarding the origin of the jira_issues_custom_fields table. This is not a table that is materialized by one of the models in this package. Further, we don't support some of the variables you have defined in your dbt_project.yml for our Jira package.

For example, I am not familiar with any custom field configurations outside of our jira__daily_issue_field_history model. Is the jira_issues_custom_fields table by chance an extension of one of our end models that you are applying additional transformations to? If that is the case, then would you be able to provide some more details around what other transformations you are applying that may be causing the data discrepancy you are seeing?

Thanks!

airphilou commented 2 years ago

@fivetran-joemarkiewicz thank you for your quick answer,

I apologize for the misunderstanding and not clear description.

I deep dive further in the subject and here is the new detailed explanation :

  1. As you can see in the picture attached there are 2 epics in progress for the same issue_id.
  2. In fact, there are both valid at the same time, so it creates a duplicate.

Let me know if it is still unclear for you, Thank you very much for your help 🥇

image
fivetran-joemarkiewicz commented 2 years ago

Hi @airphilou thanks for this additional context. This greatly helps me understand what is going on here.

I have a few quick questions:

  1. Do you know why there would be two epics active at the same time for a single issue?
  2. If this is a normal behavior of Jira, what would your preferred outcome in a situation like this be? Would you prefer only one record flows into an end model. If so, how would you determine which epic to use?
airphilou commented 2 years ago

Hi @fivetran-joemarkiewicz,

  1. Normally it's impossible to have two epics at the same time for a single issue. It's a single field.
  2. It is not a normal behavior but it is the last epic selected that should appear in the end model.

Thank you very much,

fivetran-jamie commented 2 years ago

@airphilou hey there -- it looks like you're using an outdated version of the package, and there have been some field-history updates that may be related here.

could you dbt clean && dbt deps and re-run after changing your jira package version from [">=0.5.0", "<0.6.0"] to [">=0.7.0", "<0.8.0"] ?

fivetran-jamie commented 2 years ago

@airphilou please lmk when you've had a chance to run the above! 🙂

airphilou commented 2 years ago

Hello @fivetran-jamie,

Our data team is actually updating our dbt version. After the release i will try to upgrade the jira package and let you know. Thank you very much for the reminder.

Bests, Philippine

fivetran-jamie commented 2 years ago

hi @airphilou - i was pondering this further and actually realized that there may be an issue with our incremental logic in the int_jira__combine_field_histories model.

to confirm this, could you run dbt run -m +int_jira__combine_field_histories --full-refresh and see if the duplicates for issue 145581 and field customfield_10014 persist?

if that works, i would also be interested in hearing if the non-incremental runtime for the int_jira__combine_field_histories became problematically long or not. i'm thinking that the simplest solution may be to materialize this model as a normal table (if this is actually the issue 🙂 )

airphilou commented 2 years ago

Hello @fivetran-jamie,

I've done the full refresh and the duplicates are still there. But, the new dbt version will be updated today or tomorrow, so I will let you know if it fixes the problem. Thanks you very much for your help :)

airphilou commented 2 years ago

Hello @fivetran-jamie, It works 💯 thanks to the upgrade of the packages :

image

Thank you very much for your help

fivetran-jamie commented 2 years ago

huzzah! glad it's working for you. i'll go ahead and close this then 😄

airphilou commented 2 years ago

Hello @fivetran-jamie,

After some new investigations we finally find out some other duplicates :

We have still a lot of fields that have a start date of validity but no end date, for lots of different values, so that creates duplicates.

For example, this field, which is a multi-selected field, normally should have only one line without valid_ending_at, with as field_value a list of values. So we think that there is a problem with multiselects, which creates duplicates in jira_issues_custom_fields.

But, from what we see there is also a problem with normal fields, like status for example, which means that we probably also have duplicates in jira_issues ...

Capture d’écran 2022-06-10 à 16 00 52

Could you please help us on this topic again ?

Thank you very much for your help and have a good week-end ☀️ Philippine

fivetran-joemarkiewicz commented 2 years ago

@airphilou thanks for raising this back up!

Our team will definitely want to take a closer look into this to ensure the issue is resolved. We have a new sprint starting next Wednesday and will start to dig into this more then. In the meantime, there have been a few updates to the Jira package since we last communicated. Would you mind sharing which version of the dbt_jira package you are using where you see the duplicates?

Thanks!

airphilou commented 2 years ago

@fivetran-joemarkiewicz thanks for your quick answer,

Here are the packages we are using :

image

Thank you very much!

fivetran-joemarkiewicz commented 1 year ago

HI @airphilou I am so sincerely sorry for not addressing your previous response in a timely manner.

Would you be able to upgrade to the latest version of our jira package v0.8.2 (please be aware of any breaking changes) and let us know if you still see the issue. Let me know how the upgrade goes!

airphilou commented 1 year ago

Hi @fivetran-joemarkiewicz it seems to work thank you very much !!!!!

fivetran-joemarkiewicz commented 1 year ago

Brilliant! 🎉 We will plan to push this update (along with a few others) early next week.

fivetran-joemarkiewicz commented 1 year ago

@airphilou this issue has been addressed in the latest release! You can upgrade your package to get the latest code updates with the fix addressed.

If the issue persists, please feel free to reopen this issue.