Closed marclipoff-curative closed 1 year ago
@marclipoff-curative thanks so much for opening this issue. I am just starting to look into the error and noticed the LISTAGG issue is occurring within our mixpanel__sessions
model at the following line.
The output of this event frequencies field will create a JSON object of the frequency of each event_type
during this user session. As Redshift has a limit of 65535, it seems your sessions are having more frequency than the allotted amount. Before digging into the solution I wanted to ask a few quick questions:
event_frequencies
field important to you? You could analyze it further by exploding the JSON object, but curious if you find this field valuable for your current analysis. If you do not, then we can easily set a variable that will disable this field if you want.event_frequencies
field valuable, would it defeat the integrity of the field if we set a limit on the LISTAGG? For example, if we limited the LISTAGG to 50000 would you find the integrity of the field to be compromised since we are artificially stopping the event_type
frequency?For arguments sake, I actually ended up quickly removing this field if you wanted to see the package models succeed and you can evaluate the outcome and consider the importance of this field. You can leverage this version of the package by adding the below package configuration to your packages.yml
packages:
- git: https://github.com/fivetran/dbt_mixpanel.git
revision: hotfix/frequency-variable
warn-unpinned: false
Let me know your thoughts!
I dont need it at the moment. So I'd be fine excluding it. When running that hotfix, I get:
16:09:04 Running with dbt=1.0.1
16:09:04 [WARNING]: Deprecated functionality
The `source-paths` config has been renamed to `model-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
16:09:04 [WARNING]: Deprecated functionality
The `data-paths` config has been renamed to `seed-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
16:09:05 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- sources.dwh.landing.src_helpscount
16:09:05 Found 443 models, 505 tests, 0 snapshots, 4 analyses, 832 macros, 2 operations, 4 seed files, 633 sources, 0 exposures, 0 metrics
16:09:05
16:09:09 Concurrency: 10 threads (target='prod')
16:09:09
16:09:09 1 of 3 START view model base.stg_mixpanel__event_tmp............................ [RUN]
16:09:12 1 of 3 OK created view model base.stg_mixpanel__event_tmp....................... [CREATE VIEW in 3.27s]
16:09:13 2 of 3 START incremental model base.mixpanel__event............................. [RUN]
16:12:06 2 of 3 OK created incremental model base.mixpanel__event........................ [INSERT 0 2800820 in 172.16s]
16:12:06 3 of 3 START incremental model base.mixpanel__sessions.......................... [RUN]
16:12:06 3 of 3 ERROR creating incremental model base.mixpanel__sessions................. [ERROR in 0.81s]
16:12:07
16:12:07 Running 2 on-run-end hooks
16:12:07 Beginning drop of orphaned relations. dry_run=True
16:12:07 This command would be run: drop view if exists base.redshift_admin_running_queries cascade;
16:12:07 1 of 2 START hook: dwh.on-run-end.0............................................. [RUN]
16:12:07 1 of 2 OK hook: dwh.on-run-end.0................................................ [OK in 0.00s]
16:12:07 2 of 2 START hook: dwh.on-run-end.1............................................. [RUN]
16:12:07 2 of 2 OK hook: dwh.on-run-end.1................................................ [OK in 0.00s]
16:12:07
16:12:07
16:12:07 Finished running 1 view model, 2 incremental models, 2 hooks in 182.13s.
16:12:08
16:12:08 Completed with 1 error and 0 warnings:
16:12:08
16:12:08 Database Error in model mixpanel__sessions (models/mixpanel__sessions.sql)
16:12:08 syntax error at or near "("
16:12:08 LINE 128: listagg((event_type || ': ' || number_of_events), ', ')
16:12:08 ^
16:12:08
16:12:08 Done. PASS=2 WARN=0 ERROR=1 SKIP=0 TOTAL=3```
Sounds good, thanks @marclipoff-curative!
Were you able to run dbt clean
before installing the new package? Also, would you be able to run dbt run --full-refresh
to make sure we skip over any holdover incremental logic from past runs?
I think the problem is the stray (
. right?
Where do you see the stray (
? The error is a bit strange since I completely removed the listagg
function from the branch.
It compiles to this:
agg_event_types as (
select
session_id
-- turn into json
-- '{' ||
listagg((event_type || ': ' || number_of_events), ', ')
|| '}' as event_frequencies
from (
select
session_id,
event_type,
count(unique_event_id) as number_of_events
from session_ids
group by session_id, event_type
) as sub group by session_id
),
I think the fivetran_utils.string_agg
macro creates a new line. So when commenting out using a --
, the new line doesn't get commented out.
Can you attempt and re-run dbt deps
as I think that may be a previous commit since I can see it is commented out. Below is the current version of that CTE in the hotfix branch:
Actually, I was just messing around and think I found a way for users to leverage this field even if there are a few fields that exceed the limit. See my code below
Would you be able to test the new version of the branch as well and let me know if it succeeds for you? If it doesn't, you can try and adjust the new mixpanel__event_frequency_limit
variable to be smaller than the default 50,000 I set (50,000 may be too high of a default haha).
vars:
mixpanel__event_frequency_limit: 1000
HI @marclipoff-curative I just wanted to reach back out and see if you were able to give the above branch a test?
We are planning to release this early next week and I wanted to make sure it worked on your end before pushing the changes to our next release.
Let me know if you have any questions!
I think it worked. My cluster is having a tough time churning on the huge amount of data we have... so cant be 100% yet
Thanks! I will also limit the default to maybe 1000 to hopefully help with the string agg explosion as well
Closing this issue as PR #27 addressed the initial issue identified within this bug report.
Is there an existing issue for this?
Describe the issue
When running
mixpanel__sessions
model, redshift throws the following error:Relevant error log or model output
No response
Expected behavior
The model should run without error
dbt Project configurations
Package versions
What database are you using dbt with?
redshift
dbt Version
1.0
Additional Context
No response
Are you willing to open a PR to help address this issue?