fivetran / dbt_jira

Data models for Fivetran's Jira connector built using dbt.
https://fivetran.github.io/dbt_jira/
Apache License 2.0
8 stars 13 forks source link

int_jira__issue_calendar_spine performs slowly on snowflake #111

Open jschintz-windriver opened 9 months ago

jschintz-windriver commented 9 months ago

Is there an existing feature request for this?

Describe the Feature

The model 'int_jira__issue_calendar_spine' is performing slowly on snowflake - taking about 10-15 minutes depending on the run. Our instance has about 600M records for this table.
The query plans show that all records are being deleted/inserted for each run. Should this be materialized as a table instead?

14:51:31 1455 of 1614 OK created sql incremental model AV_int_jira.int_jira__issue_calendar_spine [SUCCESS 650585743 in 597.48s] Shouldn't only a small % be loaded with incremental strategy not 99%?

Describe alternatives you've considered

Fork the repo and change myself ... but that requires upkeep :(

Are you interested in contributing this feature?

Anything else?

No response

fivetran-reneeli commented 9 months ago

Hi @jschintz-windriver! Thanks for raising this and glad to take a look. I don't believe it should be behaving as a full delete and re-load since it's incremental. Can you share more info on your configs, version of the package, and specific warehouse you are using?

fivetran-reneeli commented 9 months ago

Hi @jschintz-windriver, just following up!

jschintz-windriver commented 9 months ago

@fivetran-reneeli Sorry for the delay!

Some info about our deployment

fivetran-reneeli commented 9 months ago

Thanks @jschintz-windriver!

I am wondering if there's something in your project yml that's overriding the config set in the model. Because given line 7, it shouldn't be exhibiting that behavior.

incremental_strategy = 'merge' if target.type not in ('snowflake', 'postgres', 'redshift') else 'delete+insert'

And this model should be materialized as incremental based on this line. So after your initial run, so you should see, for example this line of code compiling.

where cast( date_day as date) >= (select min(earliest_open_until_date) from

Would you be able to share your dbt_project.yml?

jschintz-windriver commented 9 months ago

@fivetran-reneeli The incremental logic is indeed compiling - I just don't think its performant for large datasets. I made a suggested change in my fork and opened a PR in your repo.
Do you see any issues with including this new logic? It reduced the model load time from 10 minutes to 2 with our deployment. https://github.com/fivetran/dbt_jira/pull/113

fivetran-reneeli commented 8 months ago

Hi @jschintz-windriver, our team looked into this, and can confirm we see the same long loading issue with the model. Thank you for opening the PR! When we went through this we saw some opportunities to make this more performant, but this will be helpful as a starting point as we tackle this.

jschintz-windriver commented 8 months ago

@fivetran-reneeli Thanks so much for the update and taking the time to help us out.

Let me know if there's any testing on my part that could be useful.

fivetran-joemarkiewicz commented 7 months ago

Hi @jschintz-windriver I just wanted to reach back out and let you know this update is still on our roadmap. At the moment it is seeming like we will be able to pick this up and dive deeper in improving the model's performance after the holidays.

We will be sure to share more updates in this thread once we pick up this ticket.

fivetran-avinash commented 6 months ago

Hello @jschintz-windriver! We wanted to let you know we've been working on a fix for this issue of slow runtime, and we believe we've made several optimizations to the dbt_jira models that should significantly improve performance on incremental runs. You can see the changes that were applied to these models here and an explanation for these changes in the CHANGELOG additions.

Can you test these changes on the following branch by attempting a dbt incremental run and see how your models perform and whether there are any issues? Once we get confirmation that this branch works effectively for incremental runs on Snowflake, we can get these changes merged into the package.


- git: https://github.com/fivetran/dbt_jira.git
  revision: bugfix/optimize-calendar-spine
  warn-unpinned: false
fivetran-avinash commented 6 months ago

Hi @jschintz-windriver ! Just bumping this back up for visibility. Can you test this branch when possible? We don't want to move forward with pushing this fix until we have validation that this is improving the performance of your calendar spine.

fivetran-avinash commented 5 months ago

Hello @jschintz-windriver ! Hope all is well. Will you be able to take a look at our new branch and see if this improves overall performance on Snowflake? Let us know if you need any assistance!