Closed moseleyi closed 1 year ago
Hi @moseleyi thanks so much for opening this Issue!
This definitely seems like a behavior that is not intended and should be addressed. I did some querying on my side as well and found the same results you are mentioning where the same stage is applied to each stage in the final model for the deal.
I looked a bit closer and found that we seem to filter out inactive stages in our staging model. https://github.com/fivetran/dbt_hubspot_source/blob/3888cde8d1d87ef65e406a9511509bb2af5128d1/models/stg_hubspot__deal_pipeline_stage.sql#L37
This in turn ensures that each stage in the hubspot__deal_stages
only references the active stage (which I imagine is not the desired outcome). It seems that removing that where clause in the staging model could resolve this issue. Do you see any issue on your end with including inactive stages in the downstream transformations?
Since it's a historical table, the active flag is completely redundant, as we want to see all past stages regardless of their current status.
I ran the models without the WHERE
clause you mentioned. Suddenly the deal ID has disappeared from hubspot__deal_stages
AND hubspot__deals
, even though it's still there in stg_hubspot__deal_stage
.
I'm not sure this is it because when I look at stg_hubspot__deal_pipeline_stage_tmp
all the stages have FALSE
in _fivetran_deleted
:
Any news? It's critical bug as it stops being able to analyse deal stages
Hi @moseleyi thanks for getting back with these details. I was able to look into this some more and found that removing the filter didn't have the same effect you are mentioning where the deal_id
seems to be removed. Would you be able to swap your packages.yml
dependency on the hubspot package for the one below:
packages:
- git: https://github.com/fivetran/dbt_hubspot.git
revision: bugfix/deleted-stages
warn-unpinned: false
Let me know if this resolves the issue on your end!
Still seeing the same results:
hubspot.deal_stages
stg_hubspot__deal_stage
This code:
select
deal_stage.deal_id || '-' || row_number() over(partition by deal_stage.deal_id order by deal_stage.date_entered asc) as deal_stage_id,
deal_stage.deal_stage_name,
deal_stage._fivetran_start as date_stage_entered,
deal_stage._fivetran_end as date_stage_exited,
deal_stage._fivetran_active as is_stage_active,
deals_enhanced.*
from deal_stage
left join deals_enhanced
on deal_stage.deal_id = deals_enhanced.deal_id
Is a little bit weird. Why do we bring all the deals' data to deal stages? That's why my query returns the label only of the current one. I expected the pipeline_stage_label
to be related to deal_stage_name
not to the current deal's one
I added this:
SELECT
deal_stage_id,
deal_stage_name,
date_stage_entered,
date_stage_exited,
deal_id,
dps.pipeline_stage_label,
dps.deal_pipeline_id,
dp.pipeline_label
FROM final
LEFT JOIN analytics.base_hubspot.stg_hubspot__deal_pipeline_stage dps ON dps.deal_pipeline_stage_id = final.deal_stage_name
LEFT JOIN analytics.base_hubspot.stg_hubspot__deal_pipeline dp ON dp.deal_pipeline_id = dps.deal_pipeline_id
WHERE
deal_id = 9631290514
To get the name of each stage, and here are the results for the same deal:
hey @moseleyi - so yes you are correct in that we are only grabbing pipeline stage information for the deal's current stage, which doesn't make sense for the hubspot__deal_stages
model.
i've adapted your logic from the query above and joined in information for each stage if you wanna try out this branch again!
packages:
- git: https://github.com/fivetran/dbt_hubspot.git
revision: bugfix/deleted-stages
warn-unpinned: false
Thanks @fivetran-jamie for the update. Because we bring all the data from Deals to Deal Stages, I'm still confused as to which field corresponds to the stages. pipeline_stage_label
is stil a current stage because it's being taken from deals. Bringing all the data from parent to child table causes a lot of confusion especially since it's a historical table. There's no differentiation in hubspot__deal_stages
as to which fields are historical (deal_stage_name
, deal_stage_id
) and which ones are current (pipeline_stage_label
, deal_pipeline_stage_id
).
Two things:
deal_stage_id
That's because Deal Stage Name is still a numerical way of hubspot naming the stages. What we need is a label:
@moseleyi I completely agree -- having the parent deal
data in there is confusing (and unnecessary, as one could just join with the hubspot__deals
model)
i've removed the deal
info (with the exceptions of deal_id
and deal_name
) from the bugfix/deleted-stages
branch if you could re-run and check if things seem clearer/correct
Do I need to update the package information?
- git: https://github.com/fivetran/dbt_hubspot.git
revision: bugfix/deleted-stages
warn-unpinned: false
When I do dbt build
, I can't even see the latest code
oh hm, if you're in dbt cloud i don't think you can dbt clean
, but did you dbt deps
after switching the hubspot package reference?
You need to change reference to the deal name field, it's dealname
not deal_name
from deals_enhanced
Apart from that, things are working as expected :)
great! i'm a little confused about the dealname issue though...i believe we rename the column to deal_name
in the staging model -- are you seeing an error somewhere?
Indeed I do see an error (after doing dbt deps and dbt build):
When I look at Fivetran source hubspot.deal table it's property_dealname
Then it's renamed to to deal_name
in stg_hubspot__deal
but only in one of the IF statements
{% if var('hubspot__pass_through_all_columns', false) %}
-- just pass everything through
{{
fivetran_utils.remove_prefix_from_columns(
columns=adapter.get_columns_in_relation(ref('stg_hubspot__deal_tmp')),
prefix='property_')
}}
from base
{% else %}
-- just default columns + explicitly configured passthrough columns
_fivetran_synced,
is_deleted,
deal_id,
cast(deal_pipeline_id as {{ dbt_utils.type_string() }}) as deal_pipeline_id,
cast(deal_pipeline_stage_id as {{ dbt_utils.type_string() }}) as deal_pipeline_stage_id,
owner_id,
portal_id,
property_dealname as deal_name,
property_description as description,
property_amount as amount,
property_closedate as closed_at,
property_createdate as created_at
--The below macro adds the fields defined within your hubspot__deal_pass_through_columns variable into the staging model
{{ fivetran_utils.fill_pass_through_columns('hubspot__deal_pass_through_columns') }}
-- The below macro add the ability to create calculated fields using the hubspot__deal_calculated_fields variable.
{{ fivetran_utils.calculated_fields('hubspot__deal_calculated_fields') }}
from macro
{% endif %}
But my value for this variable is actually true
(so it should be renamded) but if I look into stg_hubspot__deal
the name is dealname
.
Shouldn't the statement in the IF say {% if var('hubspot__pass_through_all_columns', true) %}
?
Ohhhh great catch ... yeah we're renaming property_dealname
(and property_closedate
and property_createdate
) inconsistently, depending on the value of hubspot__pass_through_all_columns
I've just pushed some code that renames these columns to deal_name
, closed_at
, and created_at
regardless of your value for hubspot__pass_through_all_columns
if you could re-install and try it out! thanks
Works now : )
@fivetran-jamie Do I need to wait for the release?
ah yeah the release should come out tomorrow -- we were rolling out the source package release but uncovered a bug with dbt core that impacts some new functionality in the packages
Thanks!
release is live so closing this one out!
Is there an existing issue for this?
Describe the issue
Here is an extract for a deal that has two different stages:
The stage_ids are properly mapped in
stg_hubspot__deal_pipeline_stage
But then,
hubspot__deal_stages
shows the same id for both stagesRelevant error log or model output
No response
Expected behavior
The first row should have Pipeline_stage_label = New and Deal_pipeline_stage_id = 6309924
dbt Project configurations
Package versions
packages:
What database are you using dbt with?
snowflake
dbt Version
1.1.1
Additional Context
No response
Are you willing to open a PR to help address this issue?