fivetran / dbt_hubspot

Data models for Hubspot built using dbt.
https://fivetran.github.io/dbt_hubspot/
Apache License 2.0

[Bug] Poor performance on int_hubspot__contact_merge_adjust #109

Closed kcraig-ats closed 1 year ago

kcraig-ats commented 1 year ago

Is there an existing issue for this?

Describe the issue

Since the changes that came in v0.9.0, our project has seen an increase in run times with the HubSpot package. The slow model appears to be int_hubspot__contact_merge_adjust, and specifically the merge_contacts macro seems to be driving the decrease in performance. We've seen run times for it range between 15 and 35 minutes.
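For anyone who wants to check whether the same model dominates their own run, here is a minimal sketch that ranks models by execution time. It assumes the standard dbt artifact target/run_results.json, whose results entries carry unique_id and execution_time (in seconds). The helper name slowest_models is hypothetical and not part of dbt or the HubSpot package.

```python
import json
from pathlib import Path


def slowest_models(run_results: dict, top_n: int = 5) -> list[tuple[str, float]]:
    """Return up to top_n (unique_id, seconds) pairs, slowest first.

    slowest_models is a hypothetical helper written for this sketch.
    """
    timings = [
        (result["unique_id"], float(result["execution_time"]))
        for result in run_results.get("results", [])
    ]
    return sorted(timings, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    # Assumption: the script runs from the dbt project root after `dbt run`,
    # so the artifact sits at its default location.
    artifact = Path("target/run_results.json")
    if artifact.exists():
        run_results = json.loads(artifact.read_text())
        for unique_id, seconds in slowest_models(run_results):
            print(f"{seconds / 60:6.1f} min  {unique_id}")
```

Running this before and after a change to the macro gives a rough per-model comparison without digging through the warehouse query history.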

Relevant error log or model output

No response

Expected behavior

I don't expect the query to run as long as it does.

dbt Project configurations

vars:
  hubspot_source:
    hubspot_schema: hubspot_fivetran 
  hubspot_email_event_forward_enabled: false
  hubspot_email_event_print_enabled: false
  hubspot_email_event_spam_report_enabled: false
  hubspot_service_enabled: false
  hubspot_contact_property_enabled: false
  hubspot__pass_through_all_columns: true

Package versions

  - package: fivetran/hubspot
    version: [">=0.9.0", "<0.10.0"]

What database are you using dbt with?

redshift

dbt Version

1.3

Additional Context

My guess is that the query generated by the merge_contacts macro is suboptimal on Redshift. I played around with a few solutions that sped up my run:

With these changes, the query time averaged a little over a minute.

Are you willing to open a PR to help address this issue?