Open johnf opened 3 weeks ago
Hi @johnf thanks for reaching out to the team! Could you confirm the following to help me get a sense of what might be going on:
Hi @fivetran-catfritz,
I narrowed it down to apple_store because the query I pasted above looks like the problematic one. Right now it's been running for 10 hours. If I kill the query, the load drops back down to what I saw pre upgrade.
I'm not sure if you have any internal stats on the transformations, but if you do I suspect you'll see the apple ones are running all the time on my account (johnf@gladlyapp.com).
I'll pause it now so we can confirm 100%
Hi @johnf I reviewed your run logs and noticed the issue you mentioned. Could you try running this package on its own to see if the problem persists?
We've seen similar hangups with Postgres in dbt projects involving multiple packages, often alongside minor updates—like adding a single text field in this case. More puzzling, I've also encountered this issue in another Postgres instance where it occurred in a *_tmp
model that was simply a select *
.
The issue, however, hasn't appeared consistently enough for us to determine a root cause. I suspect Postgres locks might be a factor, though that’s just a theory for now. Running the dbt_apple_store
package in isolation could help clarify if that’s the case. If you're able to access your run logs, you can see the time each model takes. Just let me know if you have any questions!
@fivetran-catfritz I can confirm that if I disable the apple_store the load stays at 10% for hours. I'll try enabling only it to see what happens. Also worth noting that when that query is running it is the only one running
I've paused all transformations in the dashboard and then commented out all other packages and run them locally.
It's currently hanging on
22:53:15 39 of 44 START sql table model dbt_apple_store.apple_store__territory_report ... [RUN]
I suspect it will sit there for hours again.
FYI, I am happy to jump on a call if someone wants to guide me on running queries manually. This is very reproducible. Also, happy for you to connect to my DB from your end directly.
I kicked it off at 9:30am
With no other queries running, it does finish eventually but takes 80 minutes.
23:29:28 39 of 44 START sql table model dbt_apple_store.apple_store__territory_report ... [RUN]
00:49:08 39 of 44 OK created sql table model dbt_apple_store.apple_store__territory_report [SELECT 130595 in 4780.70s]
00:49:08 40 of 44 START sql table model dbt_apple_store.apple_store__app_version_report . [RUN]
01:21:40 40 of 44 OK created sql table model dbt_apple_store.apple_store__app_version_report [SELECT 261984 in 1951.45s]
01:21:40 41 of 44 START sql table model dbt_apple_store.apple_store__device_report ...... [RUN]
01:22:41 41 of 44 OK created sql table model dbt_apple_store.apple_store__device_report . [SELECT 18174 in 61.07s]
01:22:41 42 of 44 START sql table model dbt_apple_store.apple_store__platform_version_report [RUN]
It isn't a super powerful instance, it is an Azure Burstable, B1ms, 1 vCores, 2 GiB RAM, 32 GiB storage
but this seems like a long time for not that much data.
Hi @johnf Thank you for running that! 80 mins does seem a bit long for the size. Before we dive deeper, could you check one more thing? If you downgrade to the prior version of dbt_apple_store and run that independently, what is the run time like? I think it would help to have a couple baselines.
@fivetran-catfritz OK good news, it's super fast on the old version, only 3 seconds.
21:50:16 39 of 44 START sql table model dbt_apple_store.apple_store__territory_report ... [RUN]
21:50:19 39 of 44 OK created sql table model dbt_apple_store.apple_store__territory_report [SELECT 133045 in 3.33s]
For reference this is the diff between the queries
--- before 2024-08-28 07:54:58.434667925 +1000
+++ after 2024-08-28 07:54:59.798667716 +1000
@@ -23,6 +23,7 @@
),
reporting_grain as (
select distinct
+ source_relation,
date_day,
app_id,
source_type,
@@ -31,6 +32,7 @@
),
joined as (
select
+ reporting_grain.source_relation,
reporting_grain.date_day,
reporting_grain.app_id,
app.app_name,
@@ -54,18 +56,22 @@
from reporting_grain
left join app
on reporting_grain.app_id = app.app_id
+ and reporting_grain.source_relation = app.source_relation
left join app_store_territory
on reporting_grain.date_day = app_store_territory.date_day
+ and reporting_grain.source_relation = app_store_territory.source_relation
and reporting_grain.app_id = app_store_territory.app_id
and reporting_grain.source_type = app_store_territory.source_type
and reporting_grain.territory = app_store_territory.territory
left join downloads_territory
on reporting_grain.date_day = downloads_territory.date_day
+ and reporting_grain.source_relation = downloads_territory.source_relation
and reporting_grain.app_id = downloads_territory.app_id
and reporting_grain.source_type = downloads_territory.source_type
and reporting_grain.territory = downloads_territory.territory
left join usage_territory
on reporting_grain.date_day = usage_territory.date_day
+ and reporting_grain.source_relation = usage_territory.source_relation
and reporting_grain.app_id = usage_territory.app_id
and reporting_grain.source_type = usage_territory.source_type
and reporting_grain.territory = usage_territory.territory
I also ran explain on both querys, but t's greek to me :grin:
FAST
Hash Left Join (cost=31584.11..37422.28 rows=83150 width=1268)
Hash Cond: (app_store_territory_1.app_id = stg_apple_store__app.app_id)
CTE app_store_territory
-> Seq Scan on stg_apple_store__app_store_territory (cost=0.00..3122.45 rows=133045 width=70)
CTE country_codes
-> Seq Scan on apple_store_country_codes (cost=0.00..6.50 rows=250 width=71)
-> Merge Right Join (cost=28427.16..31303.04 rows=20788 width=2300)
Merge Cond: ((stg_apple_store__downloads_territory.territory = app_store_territory_1.territory) AND (stg_apple_store__downloads_territory.date_day = app_store_territory_1.date_day) AND (stg_apple_store__downloads_territory.app_id = app_store_territory_1.app_id) AND (stg_apple_store__downloads_territory.source_type = app_store_territory_1.source_type))
-> Sort (cost=17939.56..18273.07 rows=133406 width=61)
Sort Key: stg_apple_store__downloads_territory.territory, stg_apple_store__downloads_territory.date_day, stg_apple_store__downloads_territory.app_id, stg_apple_store__downloads_territory.source_type
-> Seq Scan on stg_apple_store__downloads_territory (cost=0.00..2998.06 rows=133406 width=61)
-> Materialize (cost=10487.61..11488.02 rows=20788 width=2276)
-> Merge Left Join (cost=10487.61..11436.05 rows=20788 width=2276)
Merge Cond: (app_store_territory_1.territory = (alternative_country_codes.alternative_country_name)::text)
-> Merge Left Join (cost=10472.65..11067.70 rows=16630 width=1212)
Merge Cond: (app_store_territory_1.territory = (official_country_codes.country_name)::text)
-> Merge Left Join (cost=10457.69..10770.03 rows=13304 width=148)
Merge Cond: ((app_store_territory_1.territory = stg_apple_store__usage_territory.territory) AND (app_store_territory_1.date_day = stg_apple_store__usage_territory.date_day) AND (app_store_territory_1.app_id = stg_apple_store__usage_territory.app_id) AND (app_store_territory_1.source_type = stg_apple_store__usage_territory.source_type))
-> Sort (cost=9359.64..9392.90 rows=13304 width=108)
Sort Key: app_store_territory_1.territory, app_store_territory_1.date_day, app_store_territory_1.app_id, app_store_territory_1.source_type
-> Hash Right Join (cost=4390.47..8448.34 rows=13304 width=108)
Hash Cond: ((app_store_territory.date_day = app_store_territory_1.date_day) AND (app_store_territory.app_id = app_store_territory_1.app_id) AND (app_store_territory.source_type = app_store_territory_1.source_type) AND (app_store_territory.territory = app_store_territory_1.territory))
-> CTE Scan on app_store_territory (cost=0.00..2660.90 rows=133045 width=108)
-> Hash (cost=4124.39..4124.39 rows=13304 width=76)
-> HashAggregate (cost=3991.35..4124.39 rows=13304 width=76)
Group Key: app_store_territory_1.date_day, app_store_territory_1.app_id, app_store_territory_1.source_type, app_store_territory_1.territory
-> CTE Scan on app_store_territory app_store_territory_1 (cost=0.00..2660.90 rows=133045 width=76)
-> Sort (cost=1098.06..1127.26 rows=11682 width=77)
Sort Key: stg_apple_store__usage_territory.territory, stg_apple_store__usage_territory.date_day, stg_apple_store__usage_territory.app_id, stg_apple_store__usage_territory.source_type
-> Seq Scan on stg_apple_store__usage_territory (cost=0.00..308.82 rows=11682 width=77)
-> Sort (cost=14.96..15.58 rows=250 width=1580)
Sort Key: official_country_codes.country_name
-> CTE Scan on country_codes official_country_codes (cost=0.00..5.00 rows=250 width=1580)
-> Sort (cost=14.96..15.58 rows=250 width=1580)
Sort Key: alternative_country_codes.alternative_country_name
-> CTE Scan on country_codes alternative_country_codes (cost=0.00..5.00 rows=250 width=1580)
-> Hash (cost=18.00..18.00 rows=800 width=40)
-> Seq Scan on stg_apple_store__app (cost=0.00..18.00 rows=800 width=40)
(38 rows)
SLOW
Hash Left Join (cost=32314.14..36206.23 rows=20788 width=1300)
Hash Cond: (app_store_territory_1.territory = (alternative_country_codes.alternative_country_name)::text)
CTE app_store_territory
-> Seq Scan on stg_apple_store__app_store_territory (cost=0.00..3122.45 rows=133045 width=70)
CTE country_codes
-> Seq Scan on apple_store_country_codes (cost=0.00..6.50 rows=250 width=71)
-> Hash Left Join (cost=29177.07..32300.01 rows=16630 width=1300)
Hash Cond: (app_store_territory_1.territory = (official_country_codes.country_name)::text)
-> Merge Left Join (cost=29168.94..31676.58 rows=13304 width=236)
Merge Cond: ((app_store_territory_1.app_id = stg_apple_store__usage_territory.app_id) AND (app_store_territory_1.source_relation = stg_apple_store__usage_territory.source_relation))
Join Filter: ((app_store_territory_1.date_day = stg_apple_store__usage_territory.date_day) AND (app_store_territory_1.source_type = stg_apple_store__usage_territory.source_type) AND (app_store_territory_1.territory = stg_apple_store__usage_territory.territory))
-> Merge Left Join (cost=28070.88..30356.40 rows=13304 width=196)
Merge Cond: ((app_store_territory_1.app_id = stg_apple_store__downloads_territory.app_id) AND (app_store_territory_1.source_relation = stg_apple_store__downloads_territory.source_relation))
Join Filter: ((app_store_territory_1.date_day = stg_apple_store__downloads_territory.date_day) AND (app_store_territory_1.source_type = stg_apple_store__downloads_territory.source_type) AND (app_store_territory_1.territory = stg_apple_store__downloads_territory.territory))
-> Merge Left Join (cost=10131.33..10239.77 rows=13304 width=172)
Merge Cond: ((app_store_territory_1.app_id = stg_apple_store__app.app_id) AND (app_store_territory_1.source_relation = stg_apple_store__app.source_relation))
-> Sort (cost=10074.75..10108.01 rows=13304 width=140)
Sort Key: app_store_territory_1.app_id, app_store_territory_1.source_relation
-> Hash Right Join (cost=4756.34..9163.46 rows=13304 width=140)
Hash Cond: ((app_store_territory.date_day = app_store_territory_1.date_day) AND (app_store_territory.source_relation = app_store_territory_1.source_relation) AND (app_store_territory.app_id = app_store_territory_1.app_id) AND (app_store_territory.source_type = app_store_territory_1.source_type) AND (app_store_territory.territory = app_store_territory_1.territory))
-> CTE Scan on app_store_territory (cost=0.00..2660.90 rows=133045 width=140)
-> Hash (cost=4457.00..4457.00 rows=13304 width=108)
-> HashAggregate (cost=4323.96..4457.00 rows=13304 width=108)
Group Key: app_store_territory_1.source_relation, app_store_territory_1.date_day, app_store_territory_1.app_id, app_store_territory_1.source_type, app_store_territory_1.territory
-> CTE Scan on app_store_territory app_store_territory_1 (cost=0.00..2660.90 rows=133045 width=108)
-> Sort (cost=56.58..58.58 rows=800 width=72)
Sort Key: stg_apple_store__app.app_id, stg_apple_store__app.source_relation
-> Seq Scan on stg_apple_store__app (cost=0.00..18.00 rows=800 width=72)
-> Materialize (cost=17939.56..18606.59 rows=133406 width=62)
-> Sort (cost=17939.56..18273.07 rows=133406 width=62)
Sort Key: stg_apple_store__downloads_territory.app_id, stg_apple_store__downloads_territory.source_relation
-> Seq Scan on stg_apple_store__downloads_territory (cost=0.00..2998.06 rows=133406 width=62)
-> Sort (cost=1098.06..1127.26 rows=11682 width=78)
Sort Key: stg_apple_store__usage_territory.app_id, stg_apple_store__usage_territory.source_relation
-> Seq Scan on stg_apple_store__usage_territory (cost=0.00..308.82 rows=11682 width=78)
-> Hash (cost=5.00..5.00 rows=250 width=1580)
-> CTE Scan on country_codes official_country_codes (cost=0.00..5.00 rows=250 width=1580)
-> Hash (cost=5.00..5.00 rows=250 width=1580)
-> CTE Scan on country_codes alternative_country_codes (cost=0.00..5.00 rows=250 width=1580)
(39 rows)
@johnf Oh wow ok something is going on there. Thank you for confirming that. The only change from v0.3.0 to v0.4.0 is to allow users to union schemas created by multiple connectors together, which is what the source_relation
field is used for. Since you don't need it, I would downgrade in the meantime. For now you can install the below app_reporting
range and remove the reference to the app_store and google_play packages, and it will automatically install the correct downgraded versions.
- package: fivetran/app_reporting
version: [">=0.3.0", "<0.4.0"]
As for the proper fix, let me talk to my team and get some suggestions on what to explore next. Thanks again for your help with this!
Hi @johnf The next thing we'd like to do is try to narrow down what aspect of the udpates is causing the hangup. I have created a test branch that removes the source_relation
join conditions added in v0.4.0 while keeping the other updates the same. Could you run this test branch on its own and see if you are still getting the same hangups? You can install this branch by using the below code in place of the apple_store code in your packages.yml
.
- git: https://github.com/fivetran/dbt_apple_store.git
revision: test/remove-source-relation-joins
warn-unpinned: false
Let me know if you have any questions!
Hi, @fivetran-catfritz, the branch works fine. It took about 4 seconds for the bad query
Very interesting. Thank you @johnf for running the test! It confirms there is definitely something going on with those joins. We'll look into this some more and keep you posted.
Hey @johnf 👋
We've done some investigating here and believe for Postgres the join conditions which result in joining on empty strings (''
) could have some adverse effects on Postgres which is what you're seeing.
We have a potential solution which we've applied in the following package version:
packages:
- git: https://github.com/fivetran/dbt_apple_store.git
revision: feature/union-data-performance-enhancement
warn-unpinned: false
You can see the changes coming from this branch here from the source package and here from the transform.
Please note if you end up testing this out you'll probably want to turn off your app reporting dependency since this will cause a dependency issue. Let me know if you have any problems.
If you want, you can install the above package version and run the following command since I only made the relevant changes to the apple_store__territory_report
upstream models.
dbt run -s +apple_store__territory_report
Let me know if this helps the performance!
@fivetran-joemarkiewicz
I'm just tested this and I think it has the same problem.
This is the query that ran is below. Still running after 10 minutes
FYI If it's possible on your end then happy for you to connect straight to the DB to test. I can give you SSH to my proxy server where I'm testing if that helps
query | /* {"app": "dbt", "dbt_version": "1.7.17", "profile_name": "gladly", "target_name": "prod", "node_id": "model.apple_store.apple_store__territory_report"} */+
| +
| +
| +
| +
| create table "dwh_prod"."dbt_apple_store"."apple_store__territory_report__dbt_tmp" +
| +
| +
| as +
| +
| ( +
| with app as ( +
| +
| select * +
| from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__app" +
| ), +
| +
| app_store_territory as ( +
| +
| select * +
| from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__app_store_territory" +
| ), +
| +
| country_codes as ( +
| +
| select * +
| from "dwh_prod"."dbt_apple_store_source"."apple_store_country_codes" +
| ), +
| +
| downloads_territory as ( +
| +
| select * +
| from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__downloads_territory" +
| ), +
| +
| usage_territory as ( +
| +
| select * +
| from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__usage_territory" +
| ), +
| +
| reporting_grain as ( +
| +
| select distinct +
| source_relation, +
| date_day, +
| app_id, +
| source_type, +
| territory +
| from app_store_territory +
| ), +
| +
| joined as ( +
| +
| select +
| reporting_grain.source_relation, +
| reporting_grain.date_day, +
| reporting_grain.app_id, +
| app.app_name, +
| reporting_grain.source_type, +
| reporting_grain.territory as territory_long, +
| coalesce(official_country_codes.country_code_alpha_2, alternative_country_codes.country_code_alpha_2) as territory_short, +
| coalesce(official_country_codes.region, alternative_country_codes.region) as region, +
| coalesce(official_country_codes.sub_region, alternative_country_codes.sub_region) as sub_region, +
| coalesce(app_store_territory.impressions, 0) as impressions, +
| coalesce(app_store_territory.impressions_unique_device, 0) as impressions_unique_device, +
| coalesce(app_store_territory.page_views, 0) as page_views, +
| coalesce(app_store_territory.page_views_unique_device, 0) as page_views_unique_device, +
| coalesce(downloads_territory.first_time_downloads, 0) as first_time_downloads, +
| coalesce(downloads_territory.redownloads, 0) as redownloads, +
| coalesce(downloads_territory.total_downloads, 0) as total_downloads, +
| coalesce(usage_territory.active_devices, 0) as active_devices, +
| coalesce(usage_territory.active_devices_last_30_days, 0) as active_devices_last_30_days, +
| coalesce(usage_territory.deletions, 0) as deletions, +
| coalesce(usage_territory.installations, 0) as installations, +
| coalesce(usage_territory.sessions, 0) as sessions +
| from reporting_grain +
| left join app +
| on reporting_grain.app_id = app.app_id +
| and reporting_grain.source_relation = app.source_relation +
| left join app_store_territory +
| on reporting_grain.date_day = app_store_territory.date_day +
| and reporting_grain.source_relation = app_store_territory.source_relation +
| and reporting_grain.app_id = app_store_territory.app_id +
| and reporting_grain.source_type = app_store_territory.source_type +
| and reporting_grain.territory = app_store_territory.territory +
| left join downloads_territory +
| on reporting_grain.date_day = downloads_territory.date_day +
| and reporting_grain.source_relation = downloads_territory.source_relation +
| and reporting_grain.app_id = downloads_territory.app_id +
| and reporting_grain.source_type = downloads_territory.source_type +
| and reporting_grain.territory = downloads_territory.territory +
| left join usage_territory +
| on reporting_grain.date_day = usage_territory.date_day +
| and reporting_grain.source_relation = usage_territory.source_relation +
| and reporting_grain.app_id = usage_territory.app_id +
| and reporting_grain.source_type = usage_territory.source_type +
| and reporting_grain.territory = usage_territory.territory +
| left join country_codes as official_country_codes +
| on reporting_grain.territory = official_country_codes.country_name +
| left join country_codes as alternative_country_codes +
| on reporting_grain.territory = alternative_country_codes.alternative_country_name +
| ) +
| +
| select * +
| from joined +
| ); +
|
Hey @johnf thanks for sharing and sorry that didn't seem to do the trick.
The above was actually my last ditch effort to see if we could avoid the issues we are seeing with constant expressions causing the query planner in Postgres to explode and take significantly longer than necessary. Since the above didn't work, we are going to plan to move forward with the approach where we remove the join condition only if the union schema feature is not being used. As @fivetran-catfritz shared in the branch above, we can confirm removing the join condition when not necessary will address the query performance issue.
We are going to move forward with making this update to the Apple Store and Google Play models. What we envision this update to be is something along the lines of the following whenever we join on source_relation
.
from reporting_grain
left join app
on reporting_grain.app_id = app.app_id
{{ "and reporting_grain.source_relation = app.source_relation" if var('apple_store_union_schemas',[]) or var('apple_store_union_databases',[]) }}
This way we are excluding the join if the defined variables are empty (not using the union data feature). Ideally, we can consolidate this into a single macro that can be referenced in each model so the code is dryer, but this is what we will likely apply to these models and others going forward.
I see you originally mentioned you were open to contributing a PR to address this issue. Let me know if that is still of interest to you and I'd be happy to help facilitate the updates. Otherwise, we will add this to our upcoming sprint to address. Thanks!
@fivetran-joemarkiewicz sounds great. Let me know if you need any more testing
Is there an existing issue for this?
Describe the issue
Around about the 2nd August I upgraded (all packages and dbt) but in particular this package from 0.3.0 to 0.4.0
Since then the load on my small PostgreSQL instance has gone from averaging at 11% to 90%
What I see is the query below running for hours
The only tables in this query with significant rows are the territory tables but they are circa 124000 rows only.
I'm running PostgreSQL in Azure. I'm no DB expert so haven't been really able to dig into this.
Happy to debug and help as needed or be told I'm doing it wrong :)
Relevant error log or model output
Expected behavior
Query completes within a reasonable time
dbt Project configurations
Package versions
What database are you using dbt with?
postgres
dbt Version
1.7.17 (Note I'm also using dbt transformations, same issue in both places)
Additional Context
No response
Are you willing to open a PR to help address this issue?