fivetran / dbt_jira

Data models for Fivetran's Jira connector built using dbt.
https://fivetran.github.io/dbt_jira/
Apache License 2.0
8 stars 15 forks source link

Problem with the last version of DBT package: fivetran/jira #36

Closed RivkiHofman closed 3 years ago

RivkiHofman commented 3 years ago

Hi

Attached file with explanation on the bug.

we upgrade the package: : fivetran/jira To the last version: 0.4.0. and have some issues:

  1. I set in variable: jira_issue_history_buffer = 12, and I expect to get the data of closed issues - 12 months from resolved date. For example, I have an issue that closed at 02.02.2020, I want the data until 02.2021, but I get the data until today. Can you help me and guide me if need to declare an additional variable, or need an another one?

  2. We use with variable : issue_field_history_columns , I set in this var: issue_field_history_columns: ['Story Points','priority','summary','Team','Closed','Capitalize','Regression','Epic Story Points','Requirement Category','_FIVETRAN_SYNCED', 'Found By'] Most of the fields without values after I run the new package. I investigated the code and find that there is a new logic in file: int_jira__daily_field_history.sql in cte: limit_to_relevant_fields

The old code:

limit_to_relevant_fields as (

-- let's remove unncessary rows moving forward and grab field names select combined_field_histories.*

from combined_field_histories

where 
lower(field_name) in ('sprint', 'status' 
                            {%- for col in var('issue_field_history_columns', []) -%}
                            , {{ "'" ~ (col|lower) ~ "'" }}
                            {%- endfor -%} )

),

The new code:

limit_to_relevant_fields as (

-- let's remove unncessary rows moving forward and grab field names select combined_field_histories.*

from combined_field_histories

where lower(field_name) = 'sprint' -- As sprint is a custom field, we filter by field name only for sprint. All others are on field_id.
    or lower(field_id) in ('status' 
        {%- for col in var('issue_field_history_columns', []) -%}
            , {{ "'" ~ (col|lower) ~ "'" }}
        {%- endfor -%} )

),

I see the comment that all fields aren't custom(exept sprint), but most of the fields that I use are custom:

In the meantime I return to the old version , but we need urgently a new feature in the new version.

Can you check and update us asap?

Thank a lot for your support. Rivki

fivetran-jamie commented 3 years ago

hey there @RivkiHofman! what version of the package were you using prior to v0.4.0?

  1. when was the last time the issue was updated? the buffer period is in relation to the last time an issue was updated, so if an issue was updated after already being resolved, the field history will extend to 12 months after this update/up to the current date if needed
  2. a previous version of this package required you to input field names for the issue_field_history_columns variable. however, that produced some issues, so issue_field_history_columns now requires field IDs. if you swap the IDs and names for the columns you've selected for this variable, does it work properly?
RivkiHofman commented 3 years ago

Thanks for your answer, I replaced the name with id and now I have data, but I think that there is a problem in the file: jira__daily_issue_field_history.sql.

I see that the logic was changed and there is a new cte:

set_values as (

select
    date_day,
    issue_id

    {% for col in pivot_data_columns if col.name|lower not in ['issue_id','issue_day_id','valid_starting_on'] %}
    , {{ col.name }}
    -- create a batch/partition once a new value is provided
    , sum( case when {{ col.name }} is null then 0 else 1 end) over (
        order by issue_id, date_day rows unbounded preceding) as {{ col.name }}_field_partition

    {% endfor %}

from joined

),

Part of my data is incorrect, and I think that in SUM() analytic function missed partition by.

Now it is: sum( case when {{ col.name }} is null then 0 else 1 end) over ( order by issue_id, date_day rows unbounded preceding) as {{ col.name }}_field_partition And maybe need to be: sum( case when {{ col.name }} is null then 0 else 1 end) over ( partition by issue_id order by date_day rows unbounded preceding) as {{ col.name }}_field_partition

Can you check it asap? Thank a lot.

RivkiHofman commented 3 years ago

Hi

I added a comment in Github to the open issue: fivtran/jira #36. Can you check it asap? It is very urgent us, we want to go live with the new version but part of data is incorrect.

Thank a lot for your support. Rivki

‫בתאריך יום ג׳, 10 באוג׳ 2021 ב-21:37 מאת ‪Jamie Rodriguez‬‏ <‪ @.***‬‏>:‬

hey there @RivkiHofman https://github.com/RivkiHofman! what version of the package were you using prior to v0.4.0?

  1. when was the last time the issue was updated? the buffer period is in relation to the last time an issue was updated, so if an issue was updated after already being resolved, the field history will extend to 12 months after this update/up to the current date if needed
  2. a previous version of this package required you to input field names for the issue_field_history_columns variable. however, that produced some issues, so issue_field_history_columns now requires field IDs. if you swap the IDs and names for the columns you've selected for this variable, does it work properly?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fivetran/dbt_jira/issues/36#issuecomment-896223791, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATD4UUR2QTKJMEM5NBA4NZLT4FWXJANCNFSM5B4F7FOQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

fivetran-jamie commented 3 years ago

@RivkiHofman hi! we'll be looking into this today/tomorrow and let you know if we find anything. in the meantime, it would helpful if you could book a slot during our office hours tomorrow or next week (link here)

RivkiHofman commented 3 years ago

Hi

Thank for the conversation yesterday. Do you have an estimation when the problem will be resolved?

Thanks.

fivetran-jamie commented 3 years ago

hey there! we're investigating this some more and will realistically be able to prioritize releasing a solution during our next sprint, which begins next thursday. however, we recognize the urgency here and will update this thread with any progress and/or workarounds in the meantime. if you'd like to expedite the process, feel free to open a PR with your partition by solution!

fivetran-joemarkiewicz commented 3 years ago

@RivkiHofman we were able to look further into this issue and believe we were able to produce a fix with the help you provided earlier in this ticket and on our office hours call!

I have created a working branch fix/field-history-partition and have opened a PR #37 to integrate this fix into our next release of the Jira package. In the meantime, would you be willing to try the working branch out on your Jira data? You may do so by adding the below config in place of your jira dependency within your packages.yml:

packages:
    - git: https://github.com/fivetran/dbt_jira.git
      revision: fix/field-history-partition
      warn-unpinned: false

Let me know if this works. Thank you again for raising this to our attention!

RivkiHofman commented 3 years ago

Hi

Thank a lot, I very appreciate your effort. The data is perfect now. When will the fix go live? Is it possible deploy to my prod environment, my branch with the package configuration that you sent me for test, and when you will deploy the fix I will change the configuration? Or I mustn't do it?

Thank you very much and have a nice day

fivetran-joemarkiewicz commented 3 years ago

Thanks so much @RivkiHofman! I am glad the data is accurate on your end now and we again really appreciate you bringing this to our attention.

You can most definitely use the branch I noted within your production environment. We are also planning to address an additional feature request for Jira this sprint and will most likely release a new version mid next week. I will keep this ticket open and will respond once the new version is released so you may update the package dependency on your end.

Let me know if you have any other questions 😄

RivkiHofman commented 3 years ago

Hi

I deployed it and our users are very satisfied. I have 2 questions:

  1. Since I run the new version of jira, the generated docs failed on: 'NoneType' object has no attribute 'lower' In the meantime, DBT support team don't success to resolve it, have you encountered such a thing? can you help me with this?
  2. Our users asked if when they will move to jira cloud, it will affect on the structure of tables or something on DWH, do you have this information? if not, do you know who can answer?

Thank a lot

fivetran-joemarkiewicz commented 3 years ago

Hi @RivkiHofman

  1. Since I run the new version of jira, the generated docs failed on: 'NoneType' object has no attribute 'lower' In the meantime, DBT support team don't success to resolve it, have you encountered such a thing? can you help me with this?

    • I have seen this error before, but when I run the same generate docs command on the branch I do not receive that error. Would you be able to share your dbt_project.yml contents as this may help identifiy if the issue is being caused by a variable?
  2. Our users asked if when they will move to jira cloud, it will affect on the structure of tables or something on DWH, do you have this information? if not, do you know who can answer?

    • I do not imagine this will have an affect, but I have reached out to the Jira PM to double check for you!
RivkiHofman commented 3 years ago

Thanks for your answer. Attached my dbt_project.yml dbt_project_yml.txt

fivetran-joemarkiewicz commented 3 years ago

Hi @RivkiHofman for security reasons I am not able to download and open the txt file you provided. Would you be able to copy the contents into this thread instead?

Also, I did confirm with our PM that switching the Jira cloud will not change your data structure.

RivkiHofman commented 3 years ago

This is the code of yml file:

Name your project! Project names should contain only lowercase characters

and underscores. A good package name should reflect your organization's

name or the intended use of these models

name: 'rnd_project' version: '1.0.0' config-version: 2 vars: jira_issue_history_buffer: 12 issue_field_history_columns: ['customfield_10006','priority','summary','customfield_10116','customfield_10206','customfield_10303','customfield_10111','customfield_10407','customfield_11030', 'customfield_10307'] jira_database: FIVETRAN_DATABASE jira_schema: JIRA tfs_schema: DWH_JIRA field_option_local: "{{ source('bi_jira', 'field_option') }}" tfsdwhdatatotableau: "{{ source('bi_tfs', 'tfsdwhdatatotableau') }}" tfs_history_data: "{{ source('bi_tfs', 'tfs_history_data') }}" blackduck_database: BLACKDUCK blackduck_schema: DWH cvs_final: "{{ source('bi_blackduck', 'cvs_final') }}"

models: jira: +schema: dwh_jira jira_source:
+schema: dwh_jira

rnd_project: nice_jira: tmp: materialized: view materialized: table

blackduck:
  tmp:
    materialized: view
  materialized: table
  schema: dwh_blackduck

This setting configures which "profile" dbt uses for this project.

profile: 'default'

These configurations specify where dbt should look for different types of files.

The source-paths config, for example, states that models in this project can be

found in the "models/" directory. You probably won't need to change these!

source-paths: ["models"] analysis-paths: ["analysis"] test-paths: ["tests"] data-paths: ["data"] macro-paths: ["macros"] snapshot-paths: ["snapshots"]

target-path: "target" # directory which will store compiled SQL files clean-targets: # directories to be removed by dbt clean

Configuring models

Full documentation: https://docs.getdbt.com/docs/configuring-models

In this example config, we tell dbt to build all models in the example/ directory

as tables. These settings can be overridden in the individual model files

using the {{ config(...) }} macro.

fivetran-joemarkiewicz commented 3 years ago

@RivkiHofman your variables look great and I just tried generating the docs with the same jira variables and did not encounter the same error. I wonder if this issue may not be originating from the Jira package. I have looked through dbt Slack and found a number of other users have encountered this in the past. If you remove the jira package dependency, do dbt clean and then dbt deps and then dbt docs generate do you still encounter this issue?

If you do not, then we need to look further into why you are experiencing the issue within the Jira package. If not, dbt Slack has a lot of other individuals that have encountered a similar issues which can hopefully provide some more information.

fivetran-joemarkiewicz commented 3 years ago

@RivkiHofman we are looking to release this update by mid week and I wanted to confirm if you were still experiencing an issue?

RivkiHofman commented 3 years ago

We run with this in our prod and the data looks good, we wait for update from you after the fix will go live.so that we will be able to config our package file with last version and continue with the additional developments. Thanks.

fivetran-joemarkiewicz commented 3 years ago

Thanks for the response @RivkiHofman. From that I believe the docs generate issue was not a cause of these updates. I will move forward with integrating the changes into the next release and will let you know here once those are live!

fivetran-joemarkiewicz commented 3 years ago

@RivkiHofman this update is now live within the v0.5.0 release of the dbt_jira package! Thank you again for raising this issue to our attention.

Please feel free to open a new issue if you have any further questions.