fivetran / dbt_hubspot_source

Data models for Hubspot built using dbt.
https://fivetran.github.io/dbt_hubspot_source/
Apache License 2.0
31 stars 30 forks source link

UNSUBSCRIBES does not exist #60

Closed ascendmgt closed 2 years ago

ascendmgt commented 2 years ago

Is there an existing issue for this?

Describe the issue

I have 3 HubSpot accounts and using Fivetran to load the data into Snowflake (everything working fine) and then installed this package to transform the data so I can use it in Tableau.

Of the three accounts, two work the same (perfectly) but one has errors.

If I look at the created views there is a difference which probably means that the accounts are created/setup differently in HubSpot but I won't be able to change that I have to change dbt to cope.

Working view definition: create or replace TRANSIENT TABLE DEV.CLOUD_DEV_SCHEMA_HUBSPOT_IST.HUBSPOT__EMAIL_SENDS ( EMAIL_SEND_ID VARCHAR(256), EVENT_ID VARCHAR(256), _FIVETRAN_SYNCED TIMESTAMP_TZ(9), BCC_EMAILS VARIANT, CC_EMAILS VARIANT, FROM_EMAIL VARCHAR(256), REPLY_TO_EMAIL VARIANT, EMAIL_SUBJECT VARCHAR(256), CREATED_TIMESTAMP TIMESTAMP_TZ(9), EMAIL_CAMPAIGN_ID NUMBER(38,0), RECIPIENT_EMAIL_ADDRESS VARCHAR(256), EMAIL_SEND_TIMESTAMP TIMESTAMP_TZ(9), CONTACT_ID NUMBER(38,0), BOUNCES NUMBER(18,0), CLICKS NUMBER(18,0), DEFERRALS NUMBER(18,0), DELIVERIES NUMBER(18,0), DROPS NUMBER(18,0), FORWARDS NUMBER(18,0), OPENS NUMBER(18,0), PRINTS NUMBER(18,0), SPAM_REPORTS NUMBER(18,0), WAS_BOUNCED BOOLEAN, WAS_CLICKED BOOLEAN, WAS_DEFERRED BOOLEAN, WAS_DELIVERED BOOLEAN, WAS_FORWARDED BOOLEAN, WAS_OPENED BOOLEAN, WAS_PRINTED BOOLEAN, WAS_SPAM_REPORTED BOOLEAN, UNSUBSCRIBES NUMBER(18,0), WAS_UNSUBCRIBED BOOLEAN );

NOT working view definition:

create or replace TRANSIENT TABLE DEV.DBT_CMCCLELLAN_HUBSPOT.HUBSPOT__EMAIL_SENDS ( EMAIL_SEND_ID VARCHAR(256), EVENT_ID VARCHAR(256), _FIVETRAN_SYNCED TIMESTAMP_TZ(9), BCC_EMAILS VARIANT, CC_EMAILS VARIANT, FROM_EMAIL VARCHAR(256), REPLY_TO_EMAIL VARIANT, EMAIL_SUBJECT VARCHAR(256), CREATED_TIMESTAMP TIMESTAMP_TZ(9), EMAIL_CAMPAIGN_ID NUMBER(38,0), RECIPIENT_EMAIL_ADDRESS VARCHAR(256), EMAIL_SEND_TIMESTAMP TIMESTAMP_TZ(9), CONTACT_ID NUMBER(38,0), BOUNCES NUMBER(18,0), CLICKS NUMBER(18,0), DEFERRALS NUMBER(18,0), DELIVERIES NUMBER(18,0), DROPS NUMBER(18,0), FORWARDS NUMBER(18,0), OPENS NUMBER(18,0), PRINTS NUMBER(18,0), SPAM_REPORTS NUMBER(18,0), WAS_BOUNCED BOOLEAN, WAS_CLICKED BOOLEAN, WAS_DEFERRED BOOLEAN, WAS_DELIVERED BOOLEAN, WAS_FORWARDED BOOLEAN, WAS_OPENED BOOLEAN, WAS_PRINTED BOOLEAN, WAS_SPAM_REPORTED BOOLEAN );

The working view has the last 2 columns (UNSUBSCBRIBES and WAS_UNSUBSCRIBED) and the NOT working one doesn't

Relevant error log or model output

Working view definition:

2022-04-17T02:08:50.684915Z: 02:08:50  Opening a new connection, currently in state closed
2022-04-17T02:08:52.129232Z: 02:08:52  Snowflake adapter: Snowflake query id: 01a3a860-3200-db59-0000-b28d0003a51a
2022-04-17T02:08:52.129359Z: 02:08:52  Snowflake adapter: Snowflake error: 000904 (42000): SQL compilation error: error line 56 at position 12
invalid identifier 'EMAIL_SENDS.UNSUBSCRIBES'
2022-04-17T02:08:52.129509Z: 02:08:52  finished collecting timing info
2022-04-17T02:08:52.129625Z: 02:08:52  On model.hubspot.hubspot__email_campaigns: Close
2022-04-17T02:08:52.596406Z: 02:08:52  Database Error in model hubspot__email_campaigns (models/marketing/hubspot__email_campaigns.sql)
  000904 (42000): SQL compilation error: error line 56 at position 12
  invalid identifier 'EMAIL_SENDS.UNSUBSCRIBES'
  compiled SQL at target/run/hubspot/models/marketing/hubspot__email_campaigns.sql
2022-04-17T02:08:52.596623Z: 02:08:52  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '9e6c0a19-5e4a-42b0-a4ed-ace800c91180', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f5858410550>]}
2022-04-17T02:08:52.596923Z: 02:08:52  77 of 78 ERROR creating table model dbt_cmcclellan_hubspot.hubspot__email_campaigns [ERROR in 1.97s]
2022-04-17T02:08:52.597039Z: 02:08:52  Finished running node model.hubspot.hubspot__email_campaigns
Database Error in model hubspot__email_campaigns (models/marketing/hubspot__email_campaigns.sql)
  000904 (42000): SQL compilation error: error line 56 at position 12
  invalid identifier 'EMAIL_SENDS.UNSUBSCRIBES'
  compiled SQL at target/run/hubspot/models/marketing/hubspot__email_campaigns.sql

Expected behavior

I expect this package to handle the data when the unsubscribe columns are missing

dbt Project configurations

I'm using dbt Cloud, my dbt_project_yml in the root folder is :

`

Name your project! Project names should contain only lowercase characters

and underscores. A good package name should reflect your organization's

name or the intended use of these models

name: 'hubspot_CHANGETHIS' version: '1.0.0' config-version: 2

This setting configures which "profile" dbt uses for this project.

profile: 'default'

These configurations specify where dbt should look for different types of files.

The source-paths config, for example, states that models in this project can be

found in the "models/" directory. You probably won't need to change these!

model-paths: ["models"] analysis-paths: ["analyses"] test-paths: ["tests"] seed-paths: ["seeds"] macro-paths: ["macros"] snapshot-paths: ["snapshots"]

target-path: "target" # directory which will store compiled SQL files clean-targets: # directories to be removed by dbt clean

Configuring models

Full documentation: https://docs.getdbt.com/docs/configuring-models

In this example config, we tell dbt to build all models in the example/ directory

as tables. These settings can be overridden in the individual model files

using the {{ config(...) }} macro.

models: hubspot_CHANGETHIS:

Applies to all files under models/example/

example:
  materialized: view

vars: hubspot_source: hubspot_database: CHANGETHIS hubspot_schema: hubspot_CHANGETHIS

Marketing

hubspot_contact_list_enabled: false # Disables contact list models hubspot_contact_list_member_enabled: false # Disables contact list member models hubspot_email_event_forward_enabled: false hubspot_email_event_print_enabled: false hubspot_email_event_spam_report_enabled: false hubspot_email_event_status_change_enabled: false hubspot_engagement_deal_enabled: false`

Package versions

`packages:

What database are you using dbt with?

snowflake

dbt Version

dbt Cloud dbt --version doesn't work but I can see on the menu it's 1.1.45.258

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 2 years ago

Hi @ascendmgt thank you for opening this issue and for raising this with our team!

Taking a look at the error you are providing, I wonder if the variables you defined need to be adjusted. It looks like the yml formatting was removed in the creation of this issue, but I can see you are setting the variable hubspot_email_event_status_change_enabled as false in your hubspot_source variable hierarchy.

This in fact disables the relevant models in the hubspot_source package that handle unsubscribe and other email change events. However, while this disables the models properly in the source package, it is not disabling the proper downstream models. You can see the in the below snippet where this variable is called in the downstream hubspot package.

https://github.com/fivetran/dbt_hubspot/blob/main/models/marketing/hubspot__email_sends.sql#L43-L44

{% if fivetran_utils.enabled_vars(['hubspot_email_event_status_change_enabled']) %}

), unsubscribes as (

    select *
    from {{ ref('int_hubspot__email_aggregate_status_change') }}

This cte is where the unsubscribes field is being generated. You can see we are checking the hubspot_email_event_status_change_enabled variable to see if this part of the query should be executed or not. Since you have the variables set to false, I am wondering why this is being executed.

My thought to account for this is to change the hierarchy of your hubspot package variables. Currently, you have them listed under the hubspot_source hierarchy. This in fact limits the variables to only the scope of the hubspot_source package and not the hubspot package. Therefore, you can make your variables global instead and see that this issue should no longer persist. I would recommend using the below format for your variables in your dbt_project.yml:

vars:
  hubspot_database: CHANGETHIS
  hubspot_schema: hubspot_CHANGETHIS

  hubspot_contact_list_enabled: false # Disables contact list models
  hubspot_contact_list_member_enabled: false # Disables contact list member models
  hubspot_email_event_forward_enabled: false
  hubspot_email_event_print_enabled: false
  hubspot_email_event_spam_report_enabled: false
  hubspot_email_event_status_change_enabled: false
  hubspot_engagement_deal_enabled: false`

You will notice I removed the hubspot_source: hierarchy and adjusted the variables to be globally defined. Let me know if this helps or if you have any other questions!

ascendmgt commented 2 years ago

Thanks for the tips and I've applied those changes, it's actually failing here:

https://github.com/fivetran/dbt_hubspot/blob/main/models/marketing/hubspot__email_campaigns.sql#L18-L22

because the unsubscribes field doesn't exist in email_sends

sum(email_sends.unsubscribes) as total_unsubscribes,

For the time being I've added

hubspot_email_event_enabled: false

to the dbt_project.yml and gives me a full clean/green run through all the models

ascendmgt commented 2 years ago

I'm happy to close this issue, but I just wanted to check if you wanted to confirm anything before closing?

fivetran-joemarkiewicz commented 2 years ago

Hi @ascendmgt thanks for the clarification! If that is the case, then you should just be able to adjust the email_events variable to not include unsubscribes. You can see how to do that here.

You can see that Unsubscribes is included by default. Therefore, your model is failing since that field doesn't exist in your base tables. However, you can simply remove the metrics that don't exist in your base tables and you should see the model succeed without having to disable it!

vars:
  hubspot_database: CHANGETHIS
  hubspot_schema: hubspot_CHANGETHIS

  hubspot_contact_list_enabled: false # Disables contact list models
  hubspot_contact_list_member_enabled: false # Disables contact list member models
  hubspot_email_event_forward_enabled: false
  hubspot_email_event_print_enabled: false
  hubspot_email_event_spam_report_enabled: false
  hubspot_email_event_status_change_enabled: false
  hubspot_engagement_deal_enabled: false

  email_metrics: ['bounces',      #Remove if you do not want metrics in final model.
                  'clicks',       #Remove if you do not want metrics in final model.
                  'deferrals',    #Remove if you do not want metrics in final model.
                  'deliveries',   #Remove if you do not want metrics in final model.
                  'drops',        #Remove if you do not want metrics in final model.
                  'forwards',     #Remove if you do not want metrics in final model.
                  'opens',        #Remove if you do not want metrics in final model.
                  'prints',       #Remove if you do not want metrics in final model.
                  'spam_reports' #Remove if you do not want metrics in final model.
                  ]
fivetran-joemarkiewicz commented 2 years ago

@ascendmgt I am going to close this issue since I believe the above comment will help resolve the issue you are seeing. Please feel free to re-open the issue if you believe it is not resolved. Likewise, feel free to open a new issue or feature request to discuss any other topics on this package 😄