Closed aspicer closed 1 day ago
Hey @aspicer! 👋 This pull request seems to contain no description. Please add useful context, rationale, and/or any other information that will help make sense of this change now and in the distant Mars-based future.
Your pull request is modifying functions with the following pre-existing issues:
📄 File: posthog/hogql_queries/insights/funnels/base.py
Function | Unhandled Issue |
---|---|
_get_step_counts_query | ValidationError: ["Funnels require at least two steps before calculating."] posthog.tas... Event Count: 40 |
@webjunkie
Would you mind taking a look at this failing test? I don't think I understand it. It's test_funnel_step_breakdown_empty
https://github.com/PostHog/posthog/actions/runs/9720856763/job/26832823920?pr=23348
This PR seems to have changed the behavior of the test. If I alter the HogQL statement to something like if(distinct_id = 'user-two', None, 'foo'), it returns a breakdown with an empty string and 'foo'.
Thanks!
The idea was to have a query that returns some breakdown values as None, hence my idea to simulate this somehow. Maybe the ClickHouse function works differently now with the adjusted analyzer? Any other ideas for getting None as well as other legitimate breakdown values out?
@webjunkie
This change causes the query to return '' instead of None. Looking at the issue, the empty string seems to actually be consistent with what we want, so this seems okay.
Problem
tl;dr: we should enable the experimental analyzer in ClickHouse.
Why? Because we're getting bad data: the old analyzer allows bad queries, and who knows what it returns. Here is a slightly modified version of a query we're running. It's a funnel actors query. It runs, but it returns 5 rows. This isn't right; we were losing people, which is why I'm investigating it.
If you then enable the setting
allow_experimental_analyzer=1
at the end, it throws an error:
Code: 215. DB::Exception: Column max_steps is not under aggregate function and not in GROUP BY keys.
Hmmmm. So when I look at the code, I realize that the new analyzer is correct. We have a window function and we're trying to use it in a HAVING clause after a GROUP BY without a proper aggregation.
So I change
HAVING ifNull(equals(steps, max_steps), isNull(steps) and isNull(max_steps)))
to
HAVING ifNull(equals(steps, max(max_steps)), isNull(steps) and isNull(max(max_steps))))
and boom goes the dynamite - now it returns the proper amount, 20 rows.
Changes
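To make the HAVING fix concrete, here is a minimal sketch of the same query shape. It uses SQLite through Python's sqlite3 module as a stand-in for ClickHouse (an assumption made so the example is self-contained; window functions need SQLite 3.25 or newer). The attempts table, its columns, and the sample rows are hypothetical, and the null-safe ifNull(equals(...), ...) wrapper is reduced to a plain = because the toy data has no NULLs.

```python
import sqlite3

# Toy stand-in for funnel data: one row per (actor, steps reached).
# Table name, column names, and rows are hypothetical, not PostHog's schema.
ROWS = [
    ("a", 1), ("a", 2), ("a", 2),
    ("b", 1),
    ("c", 1), ("c", 3),
]

# Inner query: max_steps comes from a window function.
# Outer query: after GROUP BY, max_steps is wrapped in max() so that it is
# under an aggregate function, which is what the stricter analyzer asks for.
QUERY = """
SELECT actor, steps
FROM (
    SELECT actor, steps,
           max(steps) OVER (PARTITION BY actor) AS max_steps
    FROM attempts
)
GROUP BY actor, steps
HAVING steps = max(max_steps)
ORDER BY actor
"""


def furthest_step_per_actor(rows):
    """Return one (actor, steps) row per actor, at the furthest step reached."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attempts (actor TEXT, steps INTEGER)")
    conn.executemany("INSERT INTO attempts VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(furthest_step_per_actor(ROWS))
```

In this sketch max_steps is constant within each (actor, steps) group because the window is partitioned by actor, so max(max_steps) returns that same value and only makes the aggregation explicit.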
Enable experimental analyzer for funnels and fix query. Also add "hash" join as a fallback join type because this exception gets thrown otherwise.
E DB::Exception: Only `hash` join supports multiple ORs for keys in JOIN ON section. Stack trace:
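As a rough illustration of how the two settings could be attached to a query, the sketch below renders a ClickHouse SETTINGS clause from a dict. The render_settings helper and the FUNNEL_SETTINGS name are hypothetical and are not PostHog's code. The 'direct,parallel_hash' prefix of the join_algorithm list is also an assumption, since the description above only says that hash was added as a fallback.

```python
def render_settings(settings):
    """Render a dict as a ClickHouse-style SETTINGS clause (hypothetical helper)."""
    parts = []
    for name, value in settings.items():
        if isinstance(value, bool):
            # Check bool before int: bool is a subclass of int in Python.
            rendered = "1" if value else "0"
        elif isinstance(value, int):
            rendered = str(value)
        else:
            # Quote strings, escaping backslashes and single quotes.
            escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
            rendered = f"'{escaped}'"
        parts.append(f"{name}={rendered}")
    return "SETTINGS " + ", ".join(parts)


# Assumed values: the analyzer flag from the description, plus a join
# algorithm list that ends in 'hash' as the fallback.
FUNNEL_SETTINGS = {
    "allow_experimental_analyzer": True,
    "join_algorithm": "direct,parallel_hash,hash",
}

if __name__ == "__main__":
    print(render_settings(FUNNEL_SETTINGS))
```

join_algorithm accepts a comma-separated list, and ClickHouse picks an algorithm from it that is available for the given join, so listing hash last leaves the other choices in place for joins that can use them.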
Does this work well for both Cloud and self-hosted?
Yes
How did you test this code?
Passed tests and rigor of new analyzer.