Closed efernandez1-2u closed 1 year ago
Hi @efernandez1-2u! Apologies for the delay here. We're materializing our models as views, and to your point that model has the incremental config at the model-level which would override anything from dbt_project.yml. I tried running something like:
select
session_id, count(*)
from
segment_web_sessions__stitched
group by 1
having count(2) > 1
I didn't get any results. How are you identifying the presence of duplicates?
Also, we're using a Snowflake warehouse and don't have access to a Redshift environment. I don't believe we'd be able to write a fix if there is an issue and it's specific to Redshift. This is the first issue that's been submitted since we adopted this package from dbt, and I'm not 100% familiar with all of the ins and outs of this package but I'll do my best to help! 😄
I used a query very similar to the one you posted to prove there were duplicates. the duplicated records were those being pulled in from the 3 hour trailing window, new records did not egt duplicated but those that were already brought in that were being pulled again from the next runs 3 hour trailing window were duplicated if that makes sense. Also every time I ran the model the record count would increase by the number of records in the trailing window
I had a similar issue that was fixed by running
dbt clean
and dbt deps
. Maybe it will work for you too
appreciate the suggestion, I will give it a try
@efernandez1-2u Were you able to figure this one out?
@efernandez1-2u Going to close this one out soon if it's no longer an issue :)
Describe the bug
I was experiencing a duplication of records when running the segment models, specifically the segment_web_sessions__stitched model which then affected the others downstream. I was able to resolve this by removing a configuration I had made for the segment models in my dbt_project.yml but I dont quite understand how that resolved it.
Steps to reproduce
in my dbt_project.yml file I added this block to define where I wanted the models target for delivery.
base: +tags: [] +schema: ngwt_cmbd
sessionization: materialization: incremental +tags: [] +schema: ngwt_cmbd`
the materialization: incremental line seemed to be the problem because when I removed it, no duplicates occurred in either full refreshes or regular incremental runs. I just fail to understand how that had the ability to cause an issue. The materialization is already in the model definition so would it matter if its defined in either place or both, doesnt only one superceded the other? And in this case it was the same materialization so how would it have changed things.
Expected results
expected that models would upsert/merge any records to avoid duplication
Actual results
on full refreshes, no duplicates, any incremental runs after that would duplicate records it already had found. seems it would append instead of merge
Screenshots and log output
System information
The contents of your
packages.yml
file: `packages:package: dbt-labs/dbt_utils version: 1.1.0
package: Fleetio/dbt_segment version: 0.1.1 ` Which database are you using dbt with?
The output of
dbt --version
:The operating system you're using: Mac OS The output of
python --version
: Python 3.9.12Additional context
Are you interested in contributing the fix?