PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com
Other
20.38k stars 1.21k forks source link

Cumulative aggregation on trends when using average count for user math #15816

Open mariusandra opened 1 year ago

mariusandra commented 1 year ago

Is your feature request related to a problem?

Yes, see this zendesk ticket or internal slack thread.

Describe the solution you'd like

When creating a trend insight that 1) uses the math "average count per user" 2) uses a breakdown

Then the math used is basically count() / count(distinct person_id).

When breaking down, each breakdown value is evaluated like a completely separate insight. Effectively, this equation is used.

countIf(breakdown_value = 'Value') / countIf(distinct person_id, breakdown_value = 'Value')

Instead, a customer has requested that we offer an option to use

countIf(breakdown_value = 'Value') / count(distinct person_id)

Describe alternatives you've considered

A HogQL insight like this is inefficient, but gets the job done:

with (
   select distinct_id, properties."breakdown_prop" as breakdown_value 
   from events 
   where timestamp > now() - interval 1 day and event='my custom event'
) as all_events,
(
   select count(distinct distinct_id) from all_events
) as unique_user_count,
(
   select breakdown_value
   from all_events 
   group by breakdown_value
) as breakdown_values,
(
   select breakdown_values.breakdown_value, 
          divide(count(all_events.distinct_id), unique_user_count) as count_per_user
   from breakdown_values
   left join all_events 
   on all_events.breakdown_value = breakdown_values.breakdown_value 
   group by breakdown_values.breakdown_value
) as final
select * from final order by count_per_user desc

Additional context

From the customer:

When doing a breakdown by property in which the metric is an average, I should have the option as a user to determine the calculation behaviour. Currently, breakdown basically splits the events that are analysed into independent series, which are then analysed individually. In many cases the appropriate thing to do would be to consider the entire user population rather than the populations per proprerty.

Thank you for your feature request – we love each and every one!

slshults commented 1 month ago

+1 https://posthoghelp.zendesk.com/agent/tickets/15703