fivetran / dbt_hubspot_source

Data models for Hubspot built using dbt.
https://fivetran.github.io/dbt_hubspot_source/
Apache License 2.0
31 stars 30 forks source link

[Bug] Hubspot Deal Transformation Data Type Error #107

Closed Brideau closed 1 year ago

Brideau commented 1 year ago

Is there an existing issue for this?

Describe the issue

The transformation logic for hubspot__deals expects deal_pipeline_id and deal_pipeline_stage_id to be strings, but sometimes the source has them as integers. This causes these joins in the transformation:

left join pipelines 
    on deals.deal_pipeline_id = pipelines.deal_pipeline_id
left join pipeline_stages 
    on deals.deal_pipeline_stage_id = pipeline_stages.deal_pipeline_stage_id

to fail with the error:

No matching signature for operator = for argument types: INT64, STRING. Supported signature: ANY = ANY

It's not clear to be what caused these columns to be created as integers in our BigQuery instance, but that is the case for us.

Relevant error log or model output

`No matching signature for operator = for argument types: INT64, STRING. Supported signature: ANY = ANY`

Expected behavior

Successful joins between deal IDs of the same data type.

dbt Project configurations

vars: hubspot__pass_through_all_columns: true

Package versions

packages:

What database are you using dbt with?

bigquery

dbt Version

1.4.6

Additional Context

Case #142434

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 1 year ago

Hi @Brideau thanks for raising this issue. Did this issue just appear for you? Was this working previously and then it now randomly started hitting this error, or is this your first time using the package and you are seeing this error?

Overall, this should be a fairly easy fix to cast these fields as strings to ensure the join succeeds. I would prefer we apply this cast in the source package (and I assume you do as well since the issue is created in this repo). As such, we could apply the cast to the staging models and that should do the trick. I will try and get a working version available to you to test out.

That being said, I am curious if this just came out of the blue, or if this is your first time using the package. Thanks!

Brideau commented 1 year ago

Hi @fivetran-joemarkiewicz, this is our first time using this transformation within this package. Other transformations work without a problem, but once we tried to use the Hubspot Deals transformation we ran into this.

fivetran-reneeli commented 1 year ago

Hi @Brideau, we're currently working on this ticket! Could you please help me confirm a few things?

The snippet you posted

left join pipelines 
    on deals.deal_pipeline_id = pipelines.deal_pipeline_id
left join pipeline_stages 
    on deals.deal_pipeline_stage_id = pipeline_stages.deal_pipeline_stage_id

is in our int_hubspot__deals_enhanced model. You mentioned this is where it fails, though it seems that deal_pipeline_id and deal_pipeline_stage_id are already casted as strings in the source package(see deal model, pipeline model, and pipeline stages model . I'm wondering if there are any other areas or fields it could have failed-- perhaps in the actual hubspot__deals end model? Would you mind copying over the entire error message?

fivetran-reneeli commented 1 year ago

Hi @Brideau, just following up!

Brideau commented 1 year ago

Hi @fivetran-reneeli, apologies for the delay. We are actually running this model through Fivetran, and don't have access to the full logs so I can't share anything more with you. Is it possible for us to have you work directly with the Fivetran team and access our logs through them?

Brideau commented 1 year ago

Here is the full log that we have:

dbt run --models +hubspot__deals
19:14:57  Running with dbt=1.4.5
19:14:57  Unable to do partial parsing because saved manifest not found. Starting full parse.
19:15:40  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 2 unused configuration paths:
- models.WPIC_Warehouse.example
- seeds.WPIC_Warehouse.hubspot__contacts_seeds
19:15:40  Found 106 models, 115 tests, 0 snapshots, 0 analyses, 838 macros, 0 operations, 0 seed files, 43 sources, 0 exposures, 0 metrics
19:15:40  
19:15:50  Concurrency: 1 threads (target='prod')
19:15:50  
19:15:50  1 of 18 START sql view model wpic_warehouse_stg_hubspot.stg_hubspot__deal_pipeline_stage_tmp  [RUN]
19:15:52  1 of 18 OK created sql view model wpic_warehouse_stg_hubspot.stg_hubspot__deal_pipeline_stage_tmp  [CREATE VIEW (0 processed) in 2.30s]
19:15:52  2 of 18 START sql view model wpic_warehouse_stg_hubspot.stg_hubspot__deal_pipeline_tmp  [RUN]
19:15:54  2 of 18 OK created sql view model wpic_warehouse_stg_hubspot.stg_hubspot__deal_pipeline_tmp  [CREATE VIEW (0 processed) in 2.11s]
19:15:54  3 of 18 START sql view model wpic_warehouse_stg_hubspot.stg_hubspot__deal_tmp .. [RUN]
19:15:57  3 of 18 OK created sql view model wpic_warehouse_stg_hubspot.stg_hubspot__deal_tmp  [CREATE VIEW (0 processed) in 2.60s]
19:15:57  4 of 18 START sql view model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_company_tmp  [RUN]
19:15:59  4 of 18 OK created sql view model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_company_tmp  [CREATE VIEW (0 processed) in 2.32s]
19:15:59  5 of 18 START sql view model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_contact_tmp  [RUN]
19:16:02  5 of 18 OK created sql view model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_contact_tmp  [CREATE VIEW (0 processed) in 2.45s]
19:16:02  6 of 18 START sql view model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_deal_tmp  [RUN]
19:16:04  6 of 18 OK created sql view model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_deal_tmp  [CREATE VIEW (0 processed) in 2.01s]
19:16:04  7 of 18 START sql view model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_tmp  [RUN]
19:16:06  7 of 18 OK created sql view model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_tmp  [CREATE VIEW (0 processed) in 2.52s]
19:16:06  8 of 18 START sql view model wpic_warehouse_stg_hubspot.stg_hubspot__owner_tmp . [RUN]
19:16:08  8 of 18 OK created sql view model wpic_warehouse_stg_hubspot.stg_hubspot__owner_tmp  [CREATE VIEW (0 processed) in 2.07s]
19:16:08  9 of 18 START sql table model wpic_warehouse_stg_hubspot.stg_hubspot__deal_pipeline_stage  [RUN]
19:16:12  9 of 18 OK created sql table model wpic_warehouse_stg_hubspot.stg_hubspot__deal_pipeline_stage  [CREATE TABLE (19.0 rows, 1.1 KB processed) in 4.20s]
19:16:12  10 of 18 START sql table model wpic_warehouse_stg_hubspot.stg_hubspot__deal_pipeline  [RUN]
19:16:17  10 of 18 OK created sql table model wpic_warehouse_stg_hubspot.stg_hubspot__deal_pipeline  [CREATE TABLE (3.0 rows, 128.0 Bytes processed) in 4.41s]
19:16:17  11 of 18 START sql table model wpic_warehouse_stg_hubspot.stg_hubspot__deal .... [RUN]
19:16:23  11 of 18 OK created sql table model wpic_warehouse_stg_hubspot.stg_hubspot__deal  [CREATE TABLE (2.1k rows, 1.0 MB processed) in 6.17s]
19:16:23  12 of 18 START sql table model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_company  [RUN]
19:16:29  12 of 18 OK created sql table model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_company  [CREATE TABLE (405.7k rows, 9.3 MB processed) in 5.48s]
19:16:29  13 of 18 START sql table model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_contact  [RUN]
19:16:34  13 of 18 OK created sql table model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_contact  [CREATE TABLE (226.4k rows, 5.2 MB processed) in 5.73s]
19:16:34  14 of 18 START sql table model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_deal  [RUN]
19:16:41  14 of 18 OK created sql table model wpic_warehouse_stg_hubspot.stg_hubspot__engagement_deal  [CREATE TABLE (87.5k rows, 2.0 MB processed) in 6.92s]
19:16:41  15 of 18 START sql table model wpic_warehouse_stg_hubspot.stg_hubspot__engagement  [RUN]
19:16:46  15 of 18 OK created sql table model wpic_warehouse_stg_hubspot.stg_hubspot__engagement  [CREATE TABLE (1.5m rows, 45.2 MB processed) in 5.23s]
19:16:46  16 of 18 START sql table model wpic_warehouse_stg_hubspot.stg_hubspot__owner ... [RUN]
19:16:50  16 of 18 OK created sql table model wpic_warehouse_stg_hubspot.stg_hubspot__owner  [CREATE TABLE (65.0 rows, 4.3 KB processed) in 3.68s]
19:16:50  17 of 18 START sql table model wpic_warehouse_hubspot.hubspot__engagements ..... [RUN]
19:16:57  17 of 18 OK created sql table model wpic_warehouse_hubspot.hubspot__engagements  [CREATE TABLE (1.5m rows, 56.2 MB processed) in 6.89s]
19:16:57  18 of 18 START sql table model wpic_warehouse_hubspot.hubspot__deals ........... [RUN]
19:17:00  BigQuery adapter: https://console.cloud.google.com/bigquery?project=fivetran-wpic-2-warehouse-qtrl&j=bq:US:baa7871a-9a5c-47b6-9bf7-b0f0c652afc9&page=queryresults
19:17:01  18 of 18 ERROR creating sql table model wpic_warehouse_hubspot.hubspot__deals .. [ERROR in 3.31s]
19:17:01  
19:17:01  Finished running 8 view models, 10 table models in 0 hours 1 minutes and 20.41 seconds (80.41s).
19:17:02  
19:17:02  Completed with 1 error and 0 warnings:
19:17:02  
19:17:02  Database Error in model hubspot__deals (models/sales/hubspot__deals.sql)
19:17:02    No matching signature for operator = for argument types: INT64, STRING. Supported signature: ANY = ANY at [49:12]
19:17:02    compiled Code at target/run/hubspot/models/sales/hubspot__deals.sql
19:17:02  
19:17:02  Done. PASS=17 WARN=0 ERROR=1 SKIP=0 TOTAL=18
fivetran-reneeli commented 1 year ago

Great, thank you @Brideau! This helps a lot. Note that the following is more for leaving some context for our internal team to tackle, but feel free to follow along.

I now can see the error is resulting from the hubspot__deals model, from either of the following snippets:

    from engagements
    inner join engagement_deals
        using (engagement_id)

(link)

    from deals_enhanced
    left join engagement_deal_agg
        using (deal_id)

(link)

On those respective join fields. Backtracking to the staging models, I can see that these fields have not been cast into specific data types, so what's happening is across different source tables they must getting ingested as different types. Relevant models that will need the specific type casting are:

stg_hubspotengagement_contact stg_hubspotengagement_deal stg_hubspot__deal

And doesn't hurt to finish the type casting for remaining fields & staging models that don't have it.

Btw, relevant transforms models are:

hubspot__engagements int_hubspot__deals_enhanced

We'll plan to add this fix to our upcoming sprints!

fivetran-reneeli commented 1 year ago

Just to leave some context on new findings since the last comment:

So I think I misinterpreted the error. It's not that these keys should be strings, rather they should be integers (looking at get column macros and our internal live data). So I think out of the 2 joins happening in hubspot_deals on engagement_id and deal_id, one or both are getting misrepresented as strings. P

deal_pipeline_id and deal_pipeline_stage_id are supposed to be strings. But deal_id and engagement_id are intended to be integers, and this is where the join is erroring out.

As a solution, we've explicitly cast those join fields as integers (see PR here)

fivetran-reneeli commented 1 year ago

Hi @Brideau!

Would you mind trying to run this package with the following updates in this branch? You would need to switch your packages.yml to below. I noticed you're doing a dbt run +hubspot__deals, but it'd be great if you could do a full dbt run if possible, so we can see if this issue shows up elsewhere.


packages:
  - git: https://github.com/fivetran/dbt_hubspot.git
    revision: bugfix/datatype-cast
    warn-unpinned: false
fivetran-reneeli commented 1 year ago

Hi @Brideau , we've merged this change to the hubspot main branch. Feel free to give it a go! Closing this issue for the time being but feel free to open this back up if you have any difficulties.