fivetran / dbt_iterable

Fivetran transformation models for Iterable using dbt.
https://fivetran.github.io/dbt_iterable/
Apache License 2.0
3 stars 5 forks source link

iterable__user_unsubscriptions - bad updated_date #41

Closed rick1290 closed 5 months ago

rick1290 commented 6 months ago

Is there an existing issue for this?

Describe the issue

Note on iterable dbt package.

iterable__user_unsubscriptions:

updated_at Timestamp of when the user unsubscribed from the channel or message type

The above definition for that column is wrong in that table - it seems that the update_at - is the last date of any unsubscription - not the actual date of the unsubscription. So if I unsubsubcribed from X 2 years ago - and the I unsubscribe from Y yesterday. All those rows will be updated to the yesterday date. Its also mixing subscriptions to lists vs. channels/messagetypes when it comes to dates. Makes this table difficult to work with.

https://fivetran.github.io/dbt_iterable/#!/model/model.iterable.iterable__user_unsubscriptions

Relevant error log or model output

Note on iterable dbt package.

iterable__user_unsubscriptions: 

updated_at Timestamp of when the user unsubscribed from the channel or message type

The above definition for that column is wrong in that table - it seems that the update_at - is the last date of any unsubscription - not the actual date of the unsubscription. So if I unsubsubcribed from X 2 years ago - and the I unsubscribe from Y yesterday. All those rows will be updated to the yesterday date. Its also mixing subscriptions to lists vs. channels/messagetypes when it comes to dates. Makes this table difficult to work with. 

https://fivetran.github.io/dbt_iterable/#!/model/model.iterable.iterable__user_unsubscriptions

Expected behavior

The updated_date should be the date of the actual unsubscription date

dbt Project configurations

Default package

Package versions

What database are you using dbt with?

bigquery

dbt Version

Most recent

Additional Context

I love these iterable packages - keep them coming :)

These queries are difficult to write so these come in handy. I'd love to see this table fixed tho - this one is quite important to us:)

Are you willing to open a PR to help address this issue?

fivetran-catfritz commented 6 months ago

Hi @rick1290 thank you for opening this issue. We will review this in the next day or so and let you know if we have any questions.

rick1290 commented 6 months ago

Thanks! @fivetran-catfritz Will stay posted :)

fivetran-catfritz commented 6 months ago

Hi @rick1290 I am reviewing this and have a couple questions. Could you confirm if you have the source table USER_UNSUBSCRIBED_CHANNEL with the column _fivetran_id? There was a schema change on August 10, 2023, so it would be helpful to know if you're on the old schema or the new schema.

Also I see that you are on v0.10.0 of the package. I don't think upgrading to the current version v0.11.0 will solve this particular issue, but I'm also curious if you have tried it out/had issues with it?

rick1290 commented 6 months ago

@fivetran-catfritz I'm on the old schema

Agreed - don't think upgrading as well will help. I've been upgrading over time and havn't seen any changes. I can try when I get some time next week.

fivetran-jamie commented 6 months ago

Hi there @rick1290 -- this latest_batch_index window function is likely the culprit, as it will return the latest unsubscribed timestamp for each user across all channels/message types, and we use it to filter out records here.

I believe adding channel_id or message_type_id to the partition clause of the window functions would resolve the issue, mostly. This would then return the latest unsubscription timestamp per user per channel/message type. I am wondering if it makes sense to return the latest timestamp (in case a user has unsubscribed, re-subscribed, and unsubscribed again from the same channel) or instead return the first unsubscription timestamp?

As for your later point

Its also mixing subscriptions to lists vs. channels/messagetypes when it comes to dates. Makes this table difficult to work with

Would you mind clarifying a bit here?

rick1290 commented 6 months ago

@fivetran-jamie Great!

latest unsubscription timestamp per user per channel/message type

The above is the logic we want for this.

In regards to clarifying - we just want to make sure that the timestamps are for channel/message types - and not when a user drops from a list.

Really glad you are working on a fix for this!

fivetran-jamie commented 6 months ago

Gotcha cool! Would you be willing to test out a working branch when we have a fix drafted?

rick1290 commented 6 months ago

Yep :)

rick1290 commented 5 months ago

@fivetran-jamie Any update or progress on this issue :)

Thanks1

fivetran-avinash commented 5 months ago

Hi @rick1290 ! I've currently taken on the work on this bug from @fivetran-jamie and am hoping to have a fix ready by the end of this sprint.

I've deployed the proposed changes in the dbt_iterable_source package that should bring in the latest unsubscription behavior for an email based on channel_id and message_type_id, and should bring in the latest unsubscribes on both the message type and channel level.

You can test the changes before we go ahead and deploy them alive. Would you be able to update the packages.yml file in your dbt_iterable package to:

packages:  
  - git: https://github.com/fivetran/dbt_iterable_source.git
    revision: bugfix/catch-all-recent-unsubscribes 
    warn-unpinned: false

Then run your package and see if everything looks as proposed? This will provide the final level of validation needed for us to merge this into a new release!

Thanks for your patience.

fivetran-avinash commented 5 months ago

Hi @rick1290 ! This new version of dbt_iterable_source is now live in our latest v0.8.1 release with the changes that should bring in all the latest unsubscribes in the iterable__user_unsubscriptions model. You can run dbt deps to pick up the latest version of the package.

Let us know if you experience any issues here!

rick1290 commented 5 months ago

@fivetran-avinash Thanks! Do I need to run a full refresh or anything?

rick1290 commented 5 months ago

@fivetran-avinash

I get the following errors when I run a ful-refresh. Mind you - our data is from 2021 - so we are on legacy schema. Have the new updates been tested on the original schema? I read the breaking changes - but not sure it should be causing errors.

Thanks!!

02:45:43    Database Error in model int_iterable__latest_user (models/intermediate/int_iterable__latest_user.sql)
  Unrecognized name: unique_user_key at [13:38]
  compiled Code at target/run/iterable/models/intermediate/int_iterable__latest_user.sql
02:45:43  
02:45:43
02:45:43    Database Error in model int_iterable__list_user_history (models/intermediate/int_iterable__list_user_history.sql)
  Unrecognized name: unique_user_key at [23:47]
  compiled Code at target/run/iterable/models/intermediate/int_iterable__list_user_history.sql
02:45:43  
02:45:43
02:45:43    Runtime Error in model iterable__user_unsubscriptions (models/iterable__user_unsubscriptions.sql)
  404 Not found: Table tsn-marketing-operations:dbt_models_stg_iterable.stg_iterable__user_unsubscribed_channel was not found in location US; reason: notFound, message: Not found: Table tsn-marketing-operations:dbt_models_stg_iterable.stg_iterable__user_unsubscribed_channel was not found in location US
rick1290 commented 5 months ago

Just reverted back to v10 and its working. Would love to see how I can get v11 working with these awesome new changes :)

fivetran-avinash commented 5 months ago

Morning @rick1290 ! Are you able to see what version of fivetran/dbt_iterable_source the v0.11.1 version of dbt_iterable is picking up when you run dbt deps? I'm wondering if it's looking at an older version of the package without unique_user_key.

rick1290 commented 5 months ago

Morning @fivetran-avinash Please see the below:

15:52:29 Running with dbt=1.6.14
15:52:30 Installing fivetran/iterable
15:52:31 Installed from version 0.11.0
15:52:31 Up to date!
fivetran-joemarkiewicz commented 5 months ago

Hey @rick1290, @fivetran-avinash is OOO for the rest of the week, so I'll jump in to help for the time being.

It looks like the only contents we see there are for the fivetran/iterable dbt package, but not the fivetran/iterable_source package (where Avinash's changes were made). 🤔

Would you be able to share the contents of your packages.yml?

rick1290 commented 5 months ago

@fivetran-joemarkiewicz See below.

fivetran-joemarkiewicz commented 5 months ago

Hey @rick1290 it looks like you shared the contents of the dbt_project.yml. Would you be able to share the packages.yml contents? Additionally, do you by chance have a package-lock.yml in your project? You are correct that the source package should be installed automatically, but it doesn't seem to be the case as we don't see the source installation in your above response.

rick1290 commented 5 months ago

@fivetran-joemarkiewicz Oops.

`packages:

fivetran-joemarkiewicz commented 5 months ago

Thanks @rick1290, and do you by chance see a package-lock.yml file in your project?

rick1290 commented 5 months ago

@fivetran-joemarkiewicz No lock.yml in the project

Any other ideas?

fivetran-joemarkiewicz commented 5 months ago

Thanks @rick1290, and just to confirm you do not see fivetran/iterable_source anywhere in your dbt deps log?

Also, when you full refreshed, did you run dbt run -s +iterable --full-refresh? It doesn't seem like the upstream changes were captured in the run where you see the error. I can mainly see this due to the error message that stg_iterable__user_unsubscribed_channel does not exist. This is a renamed model from the v0.8.0 dbt_iterable_source release. On a full refresh (while also running the upstream models) this error shouldn't occur.

One last attempt here would be to try the following steps:

Ideally after these steps you should see success. Let me know if this works!

rick1290 commented 5 months ago

@fivetran-avinash @fivetran-joemarkiewicz

The above worked - thanks!

That said - take a look at the following screenshot. I'm still seeing the same original bug on the new tables. The updated_at date is being attached to ALL channels and messages. The updated_at date should be the date the user unsubscribed from the channel or message_type. Should be a unique date_time for each channel if they were unsubscribed individually.

Screenshot 2024-06-05 at 3 28 16 PM
rick1290 commented 5 months ago

Below is an example of a user actually 'subscribing' to the channel/type and not unsubscribing on 5-10-24 but the channel is included in the unsubscribe table along with the date subscribed.

Screenshot 2024-06-05 at 3 55 02 PM

I'm wondering if we need another column in here - that is boolean - that lets us know if they are current unsubscribed from the message_type_name? Any other ideas?

rick1290 commented 5 months ago
Screenshot 2024-06-05 at 4 02 28 PM

An example of a user who unsubscribed 2-27-24 then resubscribed on 4-10-24. The 6-05-24 was attached to no subscription action.

I'm thinking subscribe may be getting mixed in here somehow - and something else I can't figure out for the 6-05 date stamp.

fivetran-avinash commented 5 months ago

Hi @rick1290 , I'm wondering if your run is still utilizing the old version of the iterable_source package. If this data you provided is all reflective of one user, the new partitions should be creating unique indices based off of the email/channel_id and email/message_type_id mode and taking those unique updated_at times.

Are you able to view the compiled code on the new tables? It should be in your target/compiled folder. Can you check and see if the new compiled code for stg_iterable__user_unsub_message_type and stg_iterable__user_unsubscribed_channel matches the new partition logic on the v0.8.1 version?

fivetran-avinash commented 5 months ago

Below is an example of a user actually 'subscribing' to the channel/type and not unsubscribing on 5-10-24 but the channel is included in the unsubscribe table along with the date subscribed.

Screenshot 2024-06-05 at 3 55 02 PM

I'm wondering if we need another column in here - that is boolean - that lets us know if they are current unsubscribed from the message_type_name? Any other ideas?

Interesting findings @rick1290 ! These last two messages feels like a separate issue though--why is any subscription data appearing in the unsubscribe tables? That feels like a connector issue. This is worth filing a support ticket with our Fivetran team to see why subscription data is appearing at all.

rick1290 commented 5 months ago

@fivetran-avinash

Hi @rick1290 , I'm wondering if your run is still utilizing the old version of the iterable_source package. If this data you provided is all reflective of one user, the new partitions should be creating unique indices based off of the email/channel_id and email/message_type_id mode and taking those unique updated_at times.

Are you able to view the compiled code on the new tables? It should be in your target/compiled folder. Can you check and see if the new compiled code for stg_iterable__user_unsub_message_type and stg_iterable__user_unsubscribed_channel matches the new partition logic on the v0.8.1 version?

It matches - confirmed.

rick1290 commented 5 months ago

Below is an example of a user actually 'subscribing' to the channel/type and not unsubscribing on 5-10-24 but the channel is included in the unsubscribe table along with the date subscribed.

Screenshot 2024-06-05 at 3 55 02 PM

I'm wondering if we need another column in here - that is boolean - that lets us know if they are current unsubscribed from the message_type_name? Any other ideas?

Interesting findings @rick1290 ! These last two messages feels like a separate issue though--why is any subscription data appearing in the unsubscribe tables? That feels like a connector issue. This is worth filing a support ticket with our Fivetran team to see why subscription data is appearing at all.

I don't believe this is a sync issue as I initially filed a ticket which led me to filing the issue here as I believe this is tied to the underlying data models being run in this package. When I look at custom events in the iterable_event table - all those unsubscribe and subscribe events are correct. For some reason though - the subscribe event date/time is being added to the updated_at field in this table.

Happy to hop on a quick call to discuss this more :) I'm going to be thrilled once this use-case can be solved by the package.

rick1290 commented 5 months ago
Screenshot 2024-06-05 at 4 48 55 PM

I'm now spot checking the user_unsubscribed_channel_history table... could be a sync issue....

These dates don't make any sense :( :(

Really unsure how this data is gathered - i'm thinking it might make more sense to use the iterable_event table --- and utilize those unsubscribe and subscribe events to identify the dates - and if they are activiely unsubscribed.

rick1290 commented 5 months ago
Screenshot 2024-06-05 at 5 21 06 PM

I was thinking maybe the new connector could solve this - but i'm not seeing any date stamps on the unsubscribe tables. Filed a ticket in the meantime - thanks for the help on this!

fivetran-avinash commented 5 months ago

Hi @rick1290, thanks for the thorough investigation! I believe this is still a Fivetran connector issue. The updated_at value is not properly processing the unsubscribe time events in the way we would expect. That is not something our data models can control for, as we expect the updated_at value to be consistent for the time of an unsubscribe event.

So to summarize,

Some examples of incorrect updated_at behavior in these tables in the examples you've shown above:

I'm seeing you created an additional ticket. I would relay all this information in the last few comments you provided in this thread to Fivetran support, particularly the screenshots regarding the edge cases you see. That way they can investigate what's happening in each case thoroughly. Hopefully, it's a bug, and this can be resolved internally.

If the behavior of the connector has changed for the updated_at values in these history models, then we can schedule a call, reopen this ticket, and take next steps.

Let me know if this all makes sense!

UPDATE: Just saw your latest attempt to try the new connector. Unfortunately the new connector tables are no longer history tables, and will just have the most recent unsubscribe value. I don't think a date value comes with that.

rick1290 commented 5 months ago

I see - thanks everything.. .very much appreciated!!!

rick1290 commented 4 months ago

@fivetran-avinash

Still no progress with support on this request unfortunately.

I thought I'd note one thing. I think it would be quite awesome if we can take event_name = 'emailUnSubscribe' and use that event to build an unsubscribe table and then compares it to the user table. Somehow I think this is the only way to get accurate unsubscribe data as the current methods are not accurate.

fivetran-avinash commented 4 months ago

Hi @rick1290 , good investigation on your side! It'd be interesting to see if we could incorporate this idea to create a more sustainable unsubscription model.

Would you be able to share your data of how using the event_name= emailUnSubscribe would be able to fill the same role for rebuilding the iterable__user_unsubscriptions model for your purpose, particularly with the additional grains on channel and message type? Also, is the proper timestamp coming through in all cases in your data?

(You can do it in the support ticket too if it's PII, but just let me know here so I can take a look!)

rick1290 commented 4 months ago

@fivetran-avinash Yep - this would be awesome to see.

event_name= emailUnSubscribe and emailSubscribe match our data perfectly - so it makes sense to use.

Essentially, you'd use the iterable__event table - to get emailUnSubscribe and emailSubscribe data/events.

UNEST channel_id's and message_types in additional_properties ---> which would create a new row for each email, id + unsubscribe event.

Write a rule to check if emailUnSubscribe date > emailSubscribe date for each email, id combo. This will provide accurate data on IF the user is current unsubscribed or subscribes to a channel or message type.

You can then join this back on the channel group - and find out if they are unsubscribed channel wide.

Happy to do a workshop and discuss with you more. I've gone down a rabbit hole on this and build something for this today --- pretty raw tho - but it works.

rick1290 commented 4 months ago

@fivetran-avinash Were you able to see the query that was created by the support team in my ticket? If you have time - would love to get your thoughts and see how it can be reflected in the dbt package to make things accurate.

The query they provides works - but i'm still not sure its the best approach - but want to see what you think. Thanks!!