fivetran / dbt_apple_store

Fivetran's Apple App Store source dbt package
https://fivetran.github.io/dbt_apple_store/
Apache License 2.0
0 stars 2 forks source link

[Bug] Queries taking too long #23

Open johnf opened 3 weeks ago

johnf commented 3 weeks ago

Is there an existing issue for this?

Describe the issue

Around about the 2nd August I upgraded (all packages and dbt) but in particular this package from 0.3.0 to 0.4.0

Since then the load on my small PostgreSQL instance has gone from averaging at 11% to 90%

image

What I see is the query below running for hours

The only tables in this query with significant rows are the territory tables but they are circa 124000 rows only.

I'm running PostgreSQL in Azure. I'm no DB expert so haven't been really able to dig into this.

Happy to debug and help as needed or be told I'm doing it wrong :)

Relevant error log or model output

CREATE TABLE
  "dwh_prod"."dbt_apple_store"."apple_store__territory_report__dbt_tmp"
AS (
WITH
  app AS (SELECT * FROM "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__app"),
  app_store_territory AS (SELECT * FROM "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__app_store_territory"),
  country_codes AS (SELECT * FROM "dwh_prod"."dbt_apple_store_source"."apple_store_country_codes"),
  downloads_territory AS (SELECT * FROM "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__downloads_territory"),
  usage_territory AS (SELECT * FROM "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__usage_territory"),
  reporting_grain AS (SELECT DISTINCT source_relation, date_day, app_id, source_type, territory FROM app_store_territory),
  joined AS (
     SELECT
         reporting_grain.source_relation,
         reporting_grain.date_day,
         reporting_grain.app_id,
         app.app_name,
         reporting_grain.source_type,
         reporting_grain.territory AS territory_long,
         coalesce(official_country_codes.country_code_alpha_2, alternative_country_codes.country_code_alpha_2) AS territory_short,
         coalesce(official_country_codes.region, alternative_country_codes.region) AS region,
         coalesce(official_country_codes.sub_region, alternative_country_codes.sub_region) AS sub_region,
         coalesce(app_store_territory.impressions, 0) AS impressions,
         coalesce(app_store_territory.impressions_unique_device, 0) AS impressions_unique_device,
         coalesce(app_store_territory.page_views, 0) AS page_views,
         coalesce(app_store_territory.page_views_unique_device, 0) AS page_views_unique_device,
         coalesce(downloads_territory.first_time_downloads, 0) AS first_time_downloads,
         coalesce(downloads_territory.redownloads, 0) AS redownloads,
         coalesce(downloads_territory.total_downloads, 0) AS total_downloads,
         coalesce(usage_territory.active_devices, 0) AS active_devices,
         coalesce(usage_territory.active_devices_last_30_days, 0) AS active_devices_last_30_days,
         coalesce(usage_territory.deletions, 0) AS deletions,
         coalesce(usage_territory.installations, 0) AS installations,
         coalesce(usage_territory.sessions, 0) AS sessions
     FROM reporting_grain
     LEFT JOIN app
         on reporting_grain.app_id = app.app_id
         AND reporting_grain.source_relation = app.source_relation
     LEFT JOIN app_store_territory
         on reporting_grain.date_day = app_store_territory.date_day
         AND reporting_grain.source_relation = app_store_territory.source_relation
         AND reporting_grain.app_id = app_store_territory.app_id
         AND reporting_grain.source_type = app_store_territory.source_type
         AND reporting_grain.territory = app_store_territory.territory
     LEFT JOIN downloads_territory
         on reporting_grain.date_day = downloads_territory.date_day
         AND reporting_grain.source_relation = downloads_territory.source_relation
         AND reporting_grain.app_id = downloads_territory.app_id
         AND reporting_grain.source_type = downloads_territory.source_type
         AND reporting_grain.territory = downloads_territory.territory
     LEFT JOIN usage_territory
         on reporting_grain.date_day = usage_territory.date_day
         AND reporting_grain.source_relation = usage_territory.source_relation
         AND reporting_grain.app_id = usage_territory.app_id
         AND reporting_grain.source_type = usage_territory.source_type
         AND reporting_grain.territory = usage_territory.territory
     LEFT JOIN country_codes AS official_country_codes
         on reporting_grain.territory = official_country_codes.country_name
     LEFT JOIN country_codes AS alternative_country_codes
         on reporting_grain.territory = alternative_country_codes.alternative_country_name
 )
 SELECT * FROM joined;

Expected behavior

Query completes within a reasonable time

dbt Project configurations

name: 'gladly'
version: '1.0.0'
config-version: 2

profile: 'gladly'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

vars:

  # Stripe settings
  stripe_timezone: "Australia/Sydney"
  stripe_schema: stripe_load
  stripe__using_invoices: False
  stripe__using_subscriptions: False
  stripe__using_credit_notes: False
  stripe__using_subscription_history: True

  # Play Store
  google_play_schema: google_play_load

  # App Store
  apple_store_schema: itunes_connect_load

  # Amplitude
  amplitude_schema: amplitude_load

  # Xero
  xero_schema: xero_load

  # Twitter Ads
  twitter_ads_schema: twitter_ads_load
  twitter_ads__using_keywords: False # Our account doesn't use keywords

  # Facebook Ads
  facebook_ads_schema: facebook_ads_load

  # Ad Reporting
  ad_reporting__apple_search_ads_enabled: False
  ad_reporting__pinterest_ads_enabled: False
  ad_reporting__microsoft_ads_enabled: False
  ad_reporting__linkedin_ads_enabled: False
  ad_reporting__google_ads_enabled: False
  ad_reporting__snapchat_ads_enabled: False
  ad_reporting__tiktok_ads_enabled: False
  ad_reporting__amazon_ads_enabled: False
  ad_reporting__reddit_ads_enabled: False

  # Linkedin pages
  linkedin_pages_schema: linkedin_company_pages_load

  # Twitter organic
  twitter_organic_schema: twitter_organic_load

  # Youtube analytics
  youtube_analytics_schema: youtube_analytics_load
  youtube__using_video_metadata: true
  youtube_metadata_schema: youtube_analytics_load
  youtube__channel_basic_table: channel_basic_a_2
  youtube__channel_demographics_table: channel_demographics_a_1

  # Facebook Pages
  facebook_pages_schema: facebook_ads_load

  # Instagram Business
  instragram_business_schema: instragram_business_load

  # Dynamo DB Settings
  dynamodb_schema: dynamodb_load
  dynamodb_suffix: qry_4_ki_4_hvzhm_5_bnv_26_qtcqscaq_prod

models:
  stripe_source:
    +schema: stripe_stg

  google_play_source:
    +schema: google_play_stg

  apple_store_source:
    +schema: apple_store_stg

  amplitude_source:
    +schema: amplitude_stg

  xero_source:
    +schema: xero_stg

  twitter_ads_source:
    +schema: twitter_ads_stg

  twitter_organic_source:
    +schema: twitter_organic_stg

  facebook_ads_source:
    +schema: facebook_ads_stg

  linkedin_pages_source:
    +schema: linkedin_company_pages_stg

  youtube_analytics_source:
    +schema: youtube_analytics_stg

  facebook_pages_source:
    +schema: facebook_pages_stg

  instagram_business_source:
    +schema: instagram_business_stg

  gladly:
    dynamodb:
      +schema: gladly
      +materialized: view
    stats:
      +schema: stats
      +materialized: view

Package versions

packages:
  - package: fivetran/stripe
    version: [">=0.14.0", "<0.15.0"]

  - package: fivetran/google_play
    version: [">=0.4.0", "<0.5.0"]

  - package: fivetran/apple_store
    version: [">=0.4.0", "<0.5.0"]

  - package: fivetran/app_reporting
    version: [">=0.4.0", "<0.5.0"]

  - package: fivetran/amplitude
    version: [">=0.4.0", "<0.5.0"]

  - package: fivetran/xero
    version: [">=0.6.0", "<0.7.0"]

  - package: fivetran/twitter_ads
    version: [">=0.7.0", "<0.8.0"]

  - package: fivetran/facebook_ads
    version: [">=0.7.0", "<0.8.0"]

  - package: fivetran/ad_reporting
    version: [">=1.9.0", "<1.10.0"]

  - package: fivetran/linkedin_pages
    version: [">=0.3.0", "<0.4.0"]

  - package: fivetran/twitter_organic
    version: [">=0.2.0", "<0.3.0"]

  - package: fivetran/youtube_analytics
    version: [">=0.4.0", "<0.5.0"]

  - package: fivetran/facebook_pages
    version: [">=0.3.0", "<0.4.0"]

  - package: fivetran/instagram_business
    version: [">=0.2.0", "<0.3.0"]

  - package: fivetran/social_media_reporting
    version: [">=0.4.0", "<0.5.0"]

What database are you using dbt with?

postgres

dbt Version

1.7.17 (Note I'm also using dbt transformations, same issue in both places)

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-catfritz commented 3 weeks ago

Hi @johnf thanks for reaching out to the team! Could you confirm the following to help me get a sense of what might be going on:

johnf commented 3 weeks ago

Hi @fivetran-catfritz,

I narrowed it down to apple_store because the query I pasted above looks like the problematic one. Right now it's been running for 10 hours. If I kill the query, the load drops back down to what I saw pre upgrade.

I'm not sure if you have any internal stats on the transformations, but if you do I suspect you'll see the apple ones are running all the time on my account (johnf@gladlyapp.com).

I'll pause it now so we can confirm 100%

fivetran-catfritz commented 3 weeks ago

Hi @johnf I reviewed your run logs and noticed the issue you mentioned. Could you try running this package on its own to see if the problem persists?

We've seen similar hangups with Postgres in dbt projects involving multiple packages, often alongside minor updates—like adding a single text field in this case. More puzzling, I've also encountered this issue in another Postgres instance where it occurred in a *_tmp model that was simply a select *.

The issue, however, hasn't appeared consistently enough for us to determine a root cause. I suspect Postgres locks might be a factor, though that’s just a theory for now. Running the dbt_apple_store package in isolation could help clarify if that’s the case. If you're able to access your run logs, you can see the time each model takes. Just let me know if you have any questions!

johnf commented 3 weeks ago

@fivetran-catfritz I can confirm that if I disable the apple_store the load stays at 10% for hours. I'll try enabling only it to see what happens. Also worth noting that when that query is running it is the only one running

johnf commented 3 weeks ago

I've paused all transformations in the dashboard and then commented out all other packages and run them locally.

It's currently hanging on

22:53:15  39 of 44 START sql table model dbt_apple_store.apple_store__territory_report ... [RUN]

I suspect it will sit there for hours again.

FYI, I am happy to jump on a call if someone wants to guide me on running queries manually. This is very reproducible. Also, happy for you to connect to my DB from your end directly.

I kicked it off at 9:30am image

johnf commented 3 weeks ago

With no other queries running, it does finish eventually but takes 80 minutes.

23:29:28  39 of 44 START sql table model dbt_apple_store.apple_store__territory_report ... [RUN]
00:49:08  39 of 44 OK created sql table model dbt_apple_store.apple_store__territory_report  [SELECT 130595 in 4780.70s]
00:49:08  40 of 44 START sql table model dbt_apple_store.apple_store__app_version_report . [RUN]
01:21:40  40 of 44 OK created sql table model dbt_apple_store.apple_store__app_version_report  [SELECT 261984 in 1951.45s]
01:21:40  41 of 44 START sql table model dbt_apple_store.apple_store__device_report ...... [RUN]
01:22:41  41 of 44 OK created sql table model dbt_apple_store.apple_store__device_report . [SELECT 18174 in 61.07s]
01:22:41  42 of 44 START sql table model dbt_apple_store.apple_store__platform_version_report  [RUN]

It isn't a super powerful instance, it is an Azure Burstable, B1ms, 1 vCores, 2 GiB RAM, 32 GiB storage but this seems like a long time for not that much data.

fivetran-catfritz commented 2 weeks ago

Hi @johnf Thank you for running that! 80 mins does seem a bit long for the size. Before we dive deeper, could you check one more thing? If you downgrade to the prior version of dbt_apple_store and run that independently, what is the run time like? I think it would help to have a couple baselines.

johnf commented 2 weeks ago

@fivetran-catfritz OK good news, it's super fast on the old version, only 3 seconds.

21:50:16  39 of 44 START sql table model dbt_apple_store.apple_store__territory_report ... [RUN]
21:50:19  39 of 44 OK created sql table model dbt_apple_store.apple_store__territory_report  [SELECT 133045 in 3.33s]

For reference this is the diff between the queries

--- before  2024-08-28 07:54:58.434667925 +1000
+++ after   2024-08-28 07:54:59.798667716 +1000
@@ -23,6 +23,7 @@
  ),
  reporting_grain as (
      select distinct
+         source_relation,
          date_day,
          app_id,
          source_type,
@@ -31,6 +32,7 @@
  ),
  joined as (
      select
+         reporting_grain.source_relation,
          reporting_grain.date_day,
          reporting_grain.app_id,
          app.app_name,
@@ -54,18 +56,22 @@
      from reporting_grain
      left join app
          on reporting_grain.app_id = app.app_id
+         and reporting_grain.source_relation = app.source_relation
      left join app_store_territory
          on reporting_grain.date_day = app_store_territory.date_day
+         and reporting_grain.source_relation = app_store_territory.source_relation
          and reporting_grain.app_id = app_store_territory.app_id
          and reporting_grain.source_type = app_store_territory.source_type
          and reporting_grain.territory = app_store_territory.territory
      left join downloads_territory
          on reporting_grain.date_day = downloads_territory.date_day
+         and reporting_grain.source_relation = downloads_territory.source_relation
          and reporting_grain.app_id = downloads_territory.app_id
          and reporting_grain.source_type = downloads_territory.source_type
          and reporting_grain.territory = downloads_territory.territory
      left join usage_territory
          on reporting_grain.date_day = usage_territory.date_day
+         and reporting_grain.source_relation = usage_territory.source_relation
          and reporting_grain.app_id = usage_territory.app_id
          and reporting_grain.source_type = usage_territory.source_type
          and reporting_grain.territory = usage_territory.territory

I also ran explain on both querys, but t's greek to me :grin:

FAST

 Hash Left Join  (cost=31584.11..37422.28 rows=83150 width=1268)
   Hash Cond: (app_store_territory_1.app_id = stg_apple_store__app.app_id)
   CTE app_store_territory
     ->  Seq Scan on stg_apple_store__app_store_territory  (cost=0.00..3122.45 rows=133045 width=70)
   CTE country_codes
     ->  Seq Scan on apple_store_country_codes  (cost=0.00..6.50 rows=250 width=71)
   ->  Merge Right Join  (cost=28427.16..31303.04 rows=20788 width=2300)
         Merge Cond: ((stg_apple_store__downloads_territory.territory = app_store_territory_1.territory) AND (stg_apple_store__downloads_territory.date_day = app_store_territory_1.date_day) AND (stg_apple_store__downloads_territory.app_id = app_store_territory_1.app_id) AND (stg_apple_store__downloads_territory.source_type = app_store_territory_1.source_type))
         ->  Sort  (cost=17939.56..18273.07 rows=133406 width=61)
               Sort Key: stg_apple_store__downloads_territory.territory, stg_apple_store__downloads_territory.date_day, stg_apple_store__downloads_territory.app_id, stg_apple_store__downloads_territory.source_type
               ->  Seq Scan on stg_apple_store__downloads_territory  (cost=0.00..2998.06 rows=133406 width=61)
         ->  Materialize  (cost=10487.61..11488.02 rows=20788 width=2276)
               ->  Merge Left Join  (cost=10487.61..11436.05 rows=20788 width=2276)
                     Merge Cond: (app_store_territory_1.territory = (alternative_country_codes.alternative_country_name)::text)
                     ->  Merge Left Join  (cost=10472.65..11067.70 rows=16630 width=1212)
                           Merge Cond: (app_store_territory_1.territory = (official_country_codes.country_name)::text)
                           ->  Merge Left Join  (cost=10457.69..10770.03 rows=13304 width=148)
                                 Merge Cond: ((app_store_territory_1.territory = stg_apple_store__usage_territory.territory) AND (app_store_territory_1.date_day = stg_apple_store__usage_territory.date_day) AND (app_store_territory_1.app_id = stg_apple_store__usage_territory.app_id) AND (app_store_territory_1.source_type = stg_apple_store__usage_territory.source_type))
                                 ->  Sort  (cost=9359.64..9392.90 rows=13304 width=108)
                                       Sort Key: app_store_territory_1.territory, app_store_territory_1.date_day, app_store_territory_1.app_id, app_store_territory_1.source_type
                                       ->  Hash Right Join  (cost=4390.47..8448.34 rows=13304 width=108)
                                             Hash Cond: ((app_store_territory.date_day = app_store_territory_1.date_day) AND (app_store_territory.app_id = app_store_territory_1.app_id) AND (app_store_territory.source_type = app_store_territory_1.source_type) AND (app_store_territory.territory = app_store_territory_1.territory))
                                             ->  CTE Scan on app_store_territory  (cost=0.00..2660.90 rows=133045 width=108)
                                             ->  Hash  (cost=4124.39..4124.39 rows=13304 width=76)
                                                   ->  HashAggregate  (cost=3991.35..4124.39 rows=13304 width=76)
                                                         Group Key: app_store_territory_1.date_day, app_store_territory_1.app_id, app_store_territory_1.source_type, app_store_territory_1.territory
                                                         ->  CTE Scan on app_store_territory app_store_territory_1  (cost=0.00..2660.90 rows=133045 width=76)
                                 ->  Sort  (cost=1098.06..1127.26 rows=11682 width=77)
                                       Sort Key: stg_apple_store__usage_territory.territory, stg_apple_store__usage_territory.date_day, stg_apple_store__usage_territory.app_id, stg_apple_store__usage_territory.source_type
                                       ->  Seq Scan on stg_apple_store__usage_territory  (cost=0.00..308.82 rows=11682 width=77)
                           ->  Sort  (cost=14.96..15.58 rows=250 width=1580)
                                 Sort Key: official_country_codes.country_name
                                 ->  CTE Scan on country_codes official_country_codes  (cost=0.00..5.00 rows=250 width=1580)
                     ->  Sort  (cost=14.96..15.58 rows=250 width=1580)
                           Sort Key: alternative_country_codes.alternative_country_name
                           ->  CTE Scan on country_codes alternative_country_codes  (cost=0.00..5.00 rows=250 width=1580)
   ->  Hash  (cost=18.00..18.00 rows=800 width=40)
         ->  Seq Scan on stg_apple_store__app  (cost=0.00..18.00 rows=800 width=40)
(38 rows)

SLOW

 Hash Left Join  (cost=32314.14..36206.23 rows=20788 width=1300)
   Hash Cond: (app_store_territory_1.territory = (alternative_country_codes.alternative_country_name)::text)
   CTE app_store_territory
     ->  Seq Scan on stg_apple_store__app_store_territory  (cost=0.00..3122.45 rows=133045 width=70)
   CTE country_codes
     ->  Seq Scan on apple_store_country_codes  (cost=0.00..6.50 rows=250 width=71)
   ->  Hash Left Join  (cost=29177.07..32300.01 rows=16630 width=1300)
         Hash Cond: (app_store_territory_1.territory = (official_country_codes.country_name)::text)
         ->  Merge Left Join  (cost=29168.94..31676.58 rows=13304 width=236)
               Merge Cond: ((app_store_territory_1.app_id = stg_apple_store__usage_territory.app_id) AND (app_store_territory_1.source_relation = stg_apple_store__usage_territory.source_relation))
               Join Filter: ((app_store_territory_1.date_day = stg_apple_store__usage_territory.date_day) AND (app_store_territory_1.source_type = stg_apple_store__usage_territory.source_type) AND (app_store_territory_1.territory = stg_apple_store__usage_territory.territory))
               ->  Merge Left Join  (cost=28070.88..30356.40 rows=13304 width=196)
                     Merge Cond: ((app_store_territory_1.app_id = stg_apple_store__downloads_territory.app_id) AND (app_store_territory_1.source_relation = stg_apple_store__downloads_territory.source_relation))
                     Join Filter: ((app_store_territory_1.date_day = stg_apple_store__downloads_territory.date_day) AND (app_store_territory_1.source_type = stg_apple_store__downloads_territory.source_type) AND (app_store_territory_1.territory = stg_apple_store__downloads_territory.territory))
                     ->  Merge Left Join  (cost=10131.33..10239.77 rows=13304 width=172)
                           Merge Cond: ((app_store_territory_1.app_id = stg_apple_store__app.app_id) AND (app_store_territory_1.source_relation = stg_apple_store__app.source_relation))
                           ->  Sort  (cost=10074.75..10108.01 rows=13304 width=140)
                                 Sort Key: app_store_territory_1.app_id, app_store_territory_1.source_relation
                                 ->  Hash Right Join  (cost=4756.34..9163.46 rows=13304 width=140)
                                       Hash Cond: ((app_store_territory.date_day = app_store_territory_1.date_day) AND (app_store_territory.source_relation = app_store_territory_1.source_relation) AND (app_store_territory.app_id = app_store_territory_1.app_id) AND (app_store_territory.source_type = app_store_territory_1.source_type) AND (app_store_territory.territory = app_store_territory_1.territory))
                                       ->  CTE Scan on app_store_territory  (cost=0.00..2660.90 rows=133045 width=140)
                                       ->  Hash  (cost=4457.00..4457.00 rows=13304 width=108)
                                             ->  HashAggregate  (cost=4323.96..4457.00 rows=13304 width=108)
                                                   Group Key: app_store_territory_1.source_relation, app_store_territory_1.date_day, app_store_territory_1.app_id, app_store_territory_1.source_type, app_store_territory_1.territory
                                                   ->  CTE Scan on app_store_territory app_store_territory_1  (cost=0.00..2660.90 rows=133045 width=108)
                           ->  Sort  (cost=56.58..58.58 rows=800 width=72)
                                 Sort Key: stg_apple_store__app.app_id, stg_apple_store__app.source_relation
                                 ->  Seq Scan on stg_apple_store__app  (cost=0.00..18.00 rows=800 width=72)
                     ->  Materialize  (cost=17939.56..18606.59 rows=133406 width=62)
                           ->  Sort  (cost=17939.56..18273.07 rows=133406 width=62)
                                 Sort Key: stg_apple_store__downloads_territory.app_id, stg_apple_store__downloads_territory.source_relation
                                 ->  Seq Scan on stg_apple_store__downloads_territory  (cost=0.00..2998.06 rows=133406 width=62)
               ->  Sort  (cost=1098.06..1127.26 rows=11682 width=78)
                     Sort Key: stg_apple_store__usage_territory.app_id, stg_apple_store__usage_territory.source_relation
                     ->  Seq Scan on stg_apple_store__usage_territory  (cost=0.00..308.82 rows=11682 width=78)
         ->  Hash  (cost=5.00..5.00 rows=250 width=1580)
               ->  CTE Scan on country_codes official_country_codes  (cost=0.00..5.00 rows=250 width=1580)
   ->  Hash  (cost=5.00..5.00 rows=250 width=1580)
         ->  CTE Scan on country_codes alternative_country_codes  (cost=0.00..5.00 rows=250 width=1580)
(39 rows)
fivetran-catfritz commented 2 weeks ago

@johnf Oh wow ok something is going on there. Thank you for confirming that. The only change from v0.3.0 to v0.4.0 is to allow users to union schemas created by multiple connectors together, which is what the source_relation field is used for. Since you don't need it, I would downgrade in the meantime. For now you can install the below app_reporting range and remove the reference to the app_store and google_play packages, and it will automatically install the correct downgraded versions.

  - package: fivetran/app_reporting
    version: [">=0.3.0", "<0.4.0"]

As for the proper fix, let me talk to my team and get some suggestions on what to explore next. Thanks again for your help with this!

fivetran-catfritz commented 2 weeks ago

Hi @johnf The next thing we'd like to do is try to narrow down what aspect of the udpates is causing the hangup. I have created a test branch that removes the source_relation join conditions added in v0.4.0 while keeping the other updates the same. Could you run this test branch on its own and see if you are still getting the same hangups? You can install this branch by using the below code in place of the apple_store code in your packages.yml.

- git: https://github.com/fivetran/dbt_apple_store.git
  revision: test/remove-source-relation-joins
  warn-unpinned: false

Let me know if you have any questions!

johnf commented 2 weeks ago

Hi, @fivetran-catfritz, the branch works fine. It took about 4 seconds for the bad query

fivetran-catfritz commented 2 weeks ago

Very interesting. Thank you @johnf for running the test! It confirms there is definitely something going on with those joins. We'll look into this some more and keep you posted.

fivetran-joemarkiewicz commented 1 week ago

Hey @johnf 👋

We've done some investigating here and believe for Postgres the join conditions which result in joining on empty strings ('') could have some adverse effects on Postgres which is what you're seeing.

We have a potential solution which we've applied in the following package version:

packages:
  - git: https://github.com/fivetran/dbt_apple_store.git
    revision: feature/union-data-performance-enhancement
    warn-unpinned: false 

You can see the changes coming from this branch here from the source package and here from the transform.

Please note if you end up testing this out you'll probably want to turn off your app reporting dependency since this will cause a dependency issue. Let me know if you have any problems.

If you want, you can install the above package version and run the following command since I only made the relevant changes to the apple_store__territory_report upstream models.

dbt run -s +apple_store__territory_report

Let me know if this helps the performance!

johnf commented 1 week ago

@fivetran-joemarkiewicz

I'm just tested this and I think it has the same problem.

This is the query that ran is below. Still running after 10 minutes

FYI If it's possible on your end then happy for you to connect straight to the DB to test. I can give you SSH to my proxy server where I'm testing if that helps

query         | /* {"app": "dbt", "dbt_version": "1.7.17", "profile_name": "gladly", "target_name": "prod", "node_id": "model.apple_store.apple_store__territory_report"} */+
              |                                                                                                                                                             +
              |                                                                                                                                                             +
              |                                                                                                                                                             +
              |                                                                                                                                                             +
              |   create  table "dwh_prod"."dbt_apple_store"."apple_store__territory_report__dbt_tmp"                                                                       +
              |                                                                                                                                                             +
              |                                                                                                                                                             +
              |     as                                                                                                                                                      +
              |                                                                                                                                                             +
              |   (                                                                                                                                                         +
              |     with app as (                                                                                                                                           +
              |                                                                                                                                                             +
              |     select *                                                                                                                                                +
              |     from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__app"                                                                                            +
              | ),                                                                                                                                                          +
              |                                                                                                                                                             +
              | app_store_territory as (                                                                                                                                    +
              |                                                                                                                                                             +
              |     select *                                                                                                                                                +
              |     from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__app_store_territory"                                                                            +
              | ),                                                                                                                                                          +
              |                                                                                                                                                             +
              | country_codes as (                                                                                                                                          +
              |                                                                                                                                                             +
              |     select *                                                                                                                                                +
              |     from "dwh_prod"."dbt_apple_store_source"."apple_store_country_codes"                                                                                    +
              | ),                                                                                                                                                          +
              |                                                                                                                                                             +
              | downloads_territory as (                                                                                                                                    +
              |                                                                                                                                                             +
              |     select *                                                                                                                                                +
              |     from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__downloads_territory"                                                                            +
              | ),                                                                                                                                                          +
              |                                                                                                                                                             +
              | usage_territory as (                                                                                                                                        +
              |                                                                                                                                                             +
              |     select *                                                                                                                                                +
              |     from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__usage_territory"                                                                                +
              | ),                                                                                                                                                          +
              |                                                                                                                                                             +
              | reporting_grain as (                                                                                                                                        +
              |                                                                                                                                                             +
              |     select distinct                                                                                                                                         +
              |         source_relation,                                                                                                                                    +
              |         date_day,                                                                                                                                           +
              |         app_id,                                                                                                                                             +
              |         source_type,                                                                                                                                        +
              |         territory                                                                                                                                           +
              |     from app_store_territory                                                                                                                                +
              | ),                                                                                                                                                          +
              |                                                                                                                                                             +
              | joined as (                                                                                                                                                 +
              |                                                                                                                                                             +
              |     select                                                                                                                                                  +
              |         reporting_grain.source_relation,                                                                                                                    +
              |         reporting_grain.date_day,                                                                                                                           +
              |         reporting_grain.app_id,                                                                                                                             +
              |         app.app_name,                                                                                                                                       +
              |         reporting_grain.source_type,                                                                                                                        +
              |         reporting_grain.territory as territory_long,                                                                                                        +
              |         coalesce(official_country_codes.country_code_alpha_2, alternative_country_codes.country_code_alpha_2) as territory_short,                           +
              |         coalesce(official_country_codes.region, alternative_country_codes.region) as region,                                                                +
              |         coalesce(official_country_codes.sub_region, alternative_country_codes.sub_region) as sub_region,                                                    +
              |         coalesce(app_store_territory.impressions, 0) as impressions,                                                                                        +
              |         coalesce(app_store_territory.impressions_unique_device, 0) as impressions_unique_device,                                                            +
              |         coalesce(app_store_territory.page_views, 0) as page_views,                                                                                          +
              |         coalesce(app_store_territory.page_views_unique_device, 0) as page_views_unique_device,                                                              +
              |         coalesce(downloads_territory.first_time_downloads, 0) as first_time_downloads,                                                                      +
              |         coalesce(downloads_territory.redownloads, 0) as redownloads,                                                                                        +
              |         coalesce(downloads_territory.total_downloads, 0) as total_downloads,                                                                                +
              |         coalesce(usage_territory.active_devices, 0) as active_devices,                                                                                      +
              |         coalesce(usage_territory.active_devices_last_30_days, 0) as active_devices_last_30_days,                                                            +
              |         coalesce(usage_territory.deletions, 0) as deletions,                                                                                                +
              |         coalesce(usage_territory.installations, 0) as installations,                                                                                        +
              |         coalesce(usage_territory.sessions, 0) as sessions                                                                                                   +
              |     from reporting_grain                                                                                                                                    +
              |     left join app                                                                                                                                           +
              |         on reporting_grain.app_id = app.app_id                                                                                                              +
              |         and reporting_grain.source_relation = app.source_relation                                                                                           +
              |     left join app_store_territory                                                                                                                           +
              |         on reporting_grain.date_day = app_store_territory.date_day                                                                                          +
              |         and reporting_grain.source_relation = app_store_territory.source_relation                                                                           +
              |         and reporting_grain.app_id = app_store_territory.app_id                                                                                             +
              |         and reporting_grain.source_type = app_store_territory.source_type                                                                                   +
              |         and reporting_grain.territory = app_store_territory.territory                                                                                       +
              |     left join downloads_territory                                                                                                                           +
              |         on reporting_grain.date_day = downloads_territory.date_day                                                                                          +
              |         and reporting_grain.source_relation = downloads_territory.source_relation                                                                           +
              |         and reporting_grain.app_id = downloads_territory.app_id                                                                                             +
              |         and reporting_grain.source_type = downloads_territory.source_type                                                                                   +
              |         and reporting_grain.territory = downloads_territory.territory                                                                                       +
              |     left join usage_territory                                                                                                                               +
              |         on reporting_grain.date_day = usage_territory.date_day                                                                                              +
              |         and reporting_grain.source_relation = usage_territory.source_relation                                                                               +
              |         and reporting_grain.app_id = usage_territory.app_id                                                                                                 +
              |         and reporting_grain.source_type = usage_territory.source_type                                                                                       +
              |         and reporting_grain.territory = usage_territory.territory                                                                                           +
              |     left join country_codes as official_country_codes                                                                                                       +
              |         on reporting_grain.territory = official_country_codes.country_name                                                                                  +
              |     left join country_codes as alternative_country_codes                                                                                                    +
              |         on reporting_grain.territory = alternative_country_codes.alternative_country_name                                                                   +
              | )                                                                                                                                                           +
              |                                                                                                                                                             +
              | select *                                                                                                                                                    +
              | from joined                                                                                                                                                 +
              |   );                                                                                                                                                        +
              |   
fivetran-joemarkiewicz commented 5 days ago

Hey @johnf thanks for sharing and sorry that didn't seem to do the trick.

The above was actually my last ditch effort to see if we could avoid the issues we are seeing with constant expressions causing the query planner in Postgres to explode and take significantly longer than necessary. Since the above didn't work, we are going to plan to move forward with the approach where we remove the join condition only if the union schema feature is not being used. As @fivetran-catfritz shared in the branch above, we can confirm removing the join condition when not necessary will address the query performance issue.

We are going to move forward with making this update to the Apple Store and Google Play models. What we envision this update to be is something along the lines of the following whenever we join on source_relation.

from reporting_grain 
left join app                
      on reporting_grain.app_id = app.app_id               
      {{ "and reporting_grain.source_relation = app.source_relation" if var('apple_store_union_schemas',[]) or var('apple_store_union_databases',[]) }}

This way we are excluding the join if the defined variables are empty (not using the union data feature). Ideally, we can consolidate this into a single macro that can be referenced in each model so the code is dryer, but this is what we will likely apply to these models and others going forward.

I see you originally mentioned you were open to contributing a PR to address this issue. Let me know if that is still of interest to you and I'd be happy to help facilitate the updates. Otherwise, we will add this to our upcoming sprint to address. Thanks!

johnf commented 3 days ago

@fivetran-joemarkiewicz sounds great. Let me know if you need any more testing