PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com

Inconsistency between breakdown and roll-up of funnel #5341

Closed marcushyett-ph closed 3 years ago

marcushyett-ph commented 3 years ago

Bug description

It's debatable whether this is a bug, but I think it could cause a lot of confusion and a loss of trust in our product.

I create a funnel and can see that 1,875 users were successful: [screenshot]

Then I break it down by browser, and the total is now only 1,746: [screenshot]

Expected behavior

I would expect a funnel step to show the same total whether or not it is broken down.

Are we perhaps omitting an "Other" category that catches the smaller values we didn't break down by?

I'm also curious what limits the number of breakdown values. Could we show 10 rather than what appears to be a limit of 5 today?

How to reproduce

  1. Take any funnel step with a significant number of events and break it down by browser
  2. https://app.posthog.com/insights?insight=FUNNELS&properties=%5B%5D&filter_test_accounts=true&events=%5B%7B%22id%22%3A%22%24pageview%22%2C%22name%22%3A%22%24pageview%22%2C%22type%22%3A%22events%22%2C%22order%22%3A0%7D%2C%7B%22id%22%3A%22%24pageview%22%2C%22name%22%3A%22%24pageview%22%2C%22type%22%3A%22events%22%2C%22order%22%3A1%7D%2C%7B%22id%22%3A%22%24pageview%22%2C%22name%22%3A%22%24pageview%22%2C%22type%22%3A%22events%22%2C%22order%22%3A2%7D%5D&actions=%5B%5D&interval=day&new_entity=%5B%5D&funnel_viz_type=steps&display=FunnelViz&date_from=-1d&date_to=dStart&breakdown=%24browser&breakdown_type=event

Environment

Additional context

cc: @EDsCODE @alexkim205 @macobo @neilkakkar, as this is likely to require some core experience / core analytics collaboration.

cc: @clarkus for any design considerations here

Thank you for your bug report – we love squashing them!

clarkus commented 3 years ago

This sounds like either a bug or some needed fine-tuning of how breakdowns work. Per the spec:

It’s important to consider that not all steps could have the breakdown property or could have an empty breakdown value, so we need to be able to display a nil/not applicable value. This means that the overall bar and conversion rate would be the same as if no breakdown was applied.

If the breakdown property has high cardinality, we’ll only show the top 15 breakdown properties and bucket all the rest in an “Other” category.
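The spec's rule (top values, an "Other" bucket, and a nil bucket, so the overall bar matches the un-broken-down funnel) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PostHog's implementation: the function name bucket_breakdown and the labels "Other" and "(none)" are hypothetical, None stands for an unset or empty breakdown value, and the counts are toy numbers.

```python
from collections import Counter
from typing import Dict, Optional

OTHER_LABEL = "Other"   # hypothetical label for the long-tail bucket
NULL_LABEL = "(none)"   # hypothetical label for missing/empty breakdown values


def bucket_breakdown(counts: Dict[Optional[str], int], top_n: int = 15) -> Dict[str, int]:
    """Collapse per-value counts into the top_n values, an "Other" bucket and a null bucket.

    Every input count ends up in exactly one output bucket, so the buckets
    always sum to the same total as the un-broken-down funnel step.
    Assumes no real breakdown value is literally OTHER_LABEL or NULL_LABEL.
    """
    null_count = counts.get(None, 0)
    non_null = Counter({value: n for value, n in counts.items() if value is not None})
    top = non_null.most_common(top_n)  # highest counts first
    result: Dict[str, int] = dict(top)
    rest = sum(non_null.values()) - sum(n for _, n in top)
    if rest:
        result[OTHER_LABEL] = rest
    if null_count:
        result[NULL_LABEL] = null_count
    return result


# Toy counts (made up): 92 conversions, 7 of them with no $browser value (None).
toy = {"Chrome": 50, "Safari": 20, "Firefox": 10, "Edge": 5, None: 7}
buckets = bucket_breakdown(toy, top_n=2)
print(buckets)  # {'Chrome': 50, 'Safari': 20, 'Other': 15, '(none)': 7}
print(sum(buckets.values()) == sum(toy.values()))  # True
```

The key property is that nothing is dropped: values beyond the limit are folded into "Other" rather than discarded, so changing top_n changes how the bar is split but never its total.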

That said, 15–20 breakdown values in a stacked bar chart are not going to make for a great experience. In the most recent funnels work, I'm trying to simplify visualizations to illustrate only 1–2 metrics. The more descriptive text we put in the funnels, the less they will scale. Instead, I have been relying on the table below to itemize every breakdown value in detail. This is somewhat outside the scope of this bug, but I wanted to give some direction on where I see this improving in subsequent iterations. You can see that work at https://github.com/PostHog/posthog/issues/5230

alexkim205 commented 3 years ago

I wonder if it's because the breakdown doesn't show any events where the $browser property is unset or null. We've seen this happen when a user triggers an event from a <webview>, and it may account for the 1,875 - 1,746 = 129 missing users. Double-checking whether this is the case.

Update

It looks like we're only getting back breakdown counts for non-null breakdown values (5 in total here).
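One way to confirm this kind of gap is to check that the per-value counts sum back to the roll-up total. The sketch below is a hypothetical Python helper (breakdown_gap is not a PostHog function) run on made-up counts. It shows that if the query drops the null bucket, the gap equals exactly the null count, which is what the 129-user discrepancy above would look like if nulls were the whole explanation.

```python
from typing import Dict, Optional


def breakdown_gap(rollup_total: int, breakdown: Dict[Optional[str], int],
                  include_null: bool) -> int:
    """Return the roll-up total minus the sum of the breakdown counts that are shown.

    None stands for events whose breakdown property is unset or null. When
    include_null is False the None bucket is skipped, mimicking a query that
    returns counts for non-null breakdown values only.
    """
    shown = sum(count for value, count in breakdown.items()
                if value is not None or include_null)
    return rollup_total - shown


# Toy per-browser counts (made up for illustration): 13 conversions in total.
toy = {"Chrome": 8, "Safari": 3, None: 2}
print(breakdown_gap(13, toy, include_null=False))  # 2: the null bucket is missing
print(breakdown_gap(13, toy, include_null=True))   # 0: totals reconcile
```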

marcushyett-ph commented 3 years ago

Great, so if the Core Analytics folks can return a count for the nulls, we should be good?

I feel we must also be missing an "Other" category (I only get 5 countries in a breakdown, and there must be people from more than 5 countries using our product/website).

neilkakkar commented 3 years ago

Correct: there's a default limit of 5 breakdown values. I'll make this customisable and include NULLs. I'm unsure, though, how we should decide on this limit.

marcushyett-ph commented 3 years ago

Amazing thanks @neilkakkar

Yeah, coming up with a limit here is hard. I imagine the queries will be more expensive, or the UI slower to load, if we don't have a limit, so we probably need something arbitrary.

From using the product, 5 feels too few; to me, 10 feels like a good arbitrary limit to start with. But I'm not sure we have the color palette today to support that many, @clarkus?

neilkakkar commented 3 years ago

A question: does it make sense to follow this same behaviour (showing NULLs) across all our breakdowns, i.e. in trends as well? If so, I can go for a more general fix.

clarkus commented 3 years ago

From using the product, 5 feels too few; to me, 10 feels like a good arbitrary limit to start with. But I'm not sure we have the color palette today to support that many, @clarkus?

We have 10 data viz palette options now. I have been working on a side issue to expand this palette, but haven't completed the work yet. I am working on breakdowns and comparisons now. Here's my take:

Unless there is a technical constraint, breakdowns are limited by the available space in the chart (based on chart type and layout) and the query composition. For example:

[Screenshot: stacked bars]

In this example, we are using stacked bars. You can see that this extreme example does not scale very well. If the visualization type were adjusted to show a distinct bar for each metric, it would scale much better, but you can still see that there is an upper bound on complexity.

[Screenshot: distinct bars per metric]

So, all that said, we will need to identify reasonable defaults for each insight analysis type and any visualization options within that insight. Secondary to that, we can give users the controls they need to configure a visualization that's reasonable for their needs.

clarkus commented 3 years ago

I was testing our palette and some real scenarios for using bars to visualize categorical data. The chart here is at our target desktop support size (1280px). This is showing 11 bars per point on the x-axis. There are 10 points and the area for each point is capped around 160px. I think this could scale to include a few more bars, but it's going to be difficult to distinguish each series of bars without at least a bar's worth of spacing between each. This also illustrates the limit of the data viz palette currently defined for the product. We can add options to the palette, but at some point this is going to be really hard to visually parse. A legend, or some corresponding table could help make it more understandable. A tooltip that annotates specific values can also help.

[Screenshot: bars]

Here I am representing comparison ranges. We can expect the category count to be reduced by half in this case, as we'll see two bars (one for each range) for each category of data, for each point on the axis.

[Screenshot: bars with comparisons]
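The layout numbers above (a slot of about 160px per x-axis point, 11 bars per point, and roughly one bar's worth of spacing between series) lend themselves to a quick back-of-the-envelope calculation, including the halving effect of comparison ranges. This is a hedged sketch: the function names are hypothetical, and the 10px minimum readable bar width is an assumption, not a value from the design work.

```python
import math


def bar_width(slot_px: float, bars: int) -> float:
    """Width of each bar when `bars` bars plus one bar-width of spacing fill a slot."""
    return slot_px / (bars + 1)


def max_categories(slot_px: float, min_bar_px: float, bars_per_category: int = 1) -> int:
    """How many categories fit in one x-axis slot.

    One bar-width is reserved as spacing between neighbouring groups, and each
    category draws bars_per_category bars (2 when comparing two date ranges).
    """
    max_bars = math.floor(slot_px / min_bar_px) - 1
    return max(max_bars // bars_per_category, 0)


# ~160px per x-axis point and 11 bars per point, as in the test chart above.
print(round(bar_width(160, 11), 1))                  # 13.3 (px per bar)
# Assuming (hypothetically) a 10px minimum readable bar width:
print(max_categories(160, 10))                       # 15 categories, one bar each
print(max_categories(160, 10, bars_per_category=2))  # 7 categories with comparisons
```

Doubling bars_per_category roughly halves the number of categories that fit, which matches the expectation that comparisons cut the category count in half.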

neilkakkar commented 3 years ago

This works well now, except that the default limit is still 5. Since there have been no objections so far, I'm now switching the default to 10.

If we ever want to change this, the frontend can pass the breakdown_limit parameter to set the maximum number of breakdown values. (cc: @paolodamico @alexkim205 )
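As a rough sketch of what passing breakdown_limit could look like, the hypothetical helper below sets that parameter on a URL while keeping every other query parameter intact, using only Python's standard urllib.parse. The thread only says the frontend can pass breakdown_limit; whether the insights page URL itself (as opposed to the API request the frontend builds) honours it is an assumption here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_breakdown_limit(url: str, limit: int) -> str:
    """Return `url` with breakdown_limit=<limit>, keeping every other query parameter."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)  # dict of name -> [values]
    params["breakdown_limit"] = [str(limit)]  # overwrite any existing value
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


# A trimmed-down version of the reproduction URL from this issue.
url = "https://app.posthog.com/insights?insight=FUNNELS&breakdown=%24browser&breakdown_type=event"
print(with_breakdown_limit(url, 10))
# -> https://app.posthog.com/insights?insight=FUNNELS&breakdown=%24browser&breakdown_type=event&breakdown_limit=10
```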

macobo commented 3 years ago

This works well now, except that the default limit is still 5. Since there have been no objections so far, I'm now switching the default to 10.

What was the change?

Testing it in production (e.g. breaking down by country code), IMO it still has the same fundamental issue as laid out originally: any data point beyond $LIMIT (5 or 10) is not visible, so the totals change.

neilkakkar commented 3 years ago

The change: https://github.com/PostHog/posthog/pull/5357

Correct, the totals change (if there are more than LIMIT breakdown values), but they now change consistently, in a way we can explain in the UI!

macobo commented 3 years ago

Ack. Given that we don't explain this yet, WDYT about either leaving this issue open (since the core issue isn't solved) or creating a new one? :)

neilkakkar commented 3 years ago

Since there's more than just this to explain about inconsistencies, I'll create a separate issue: https://github.com/PostHog/posthog/issues/5427

marcushyett-ph commented 3 years ago

Awesome thanks @neilkakkar

marcushyett-ph commented 3 years ago

@alexkim205, is there anything we need to change in the UI beyond #5427?