
Paths - Diagnosing Causes #5545

Closed paolodamico closed 3 years ago

paolodamico commented 3 years ago

As discussed in today's sync, opening this issue so we can take the next ~24 hours exploring the problems the Paths product could solve toward our Diagnosing Causes goal, and to see whether it makes sense as the theme for the next sprint (1.28 2/2). @marcushyett-ph summarized it best when comparing against the incumbent option to work on (session recording #4884): "we seem to feel that session recording will not be a waste of our time but it might also not be the best thing to be working on".

CC @marcushyett-ph @fuziontech @macobo @EDsCODE @kpthatsme @jamesefhawkins @timgl

marcushyett-ph commented 3 years ago

Some client context here:

"I want to view the user sessions through the lens of user paths" For example: There are 79 users who drop off in my funnel at a certain stage, to diagnose the cause I need to

Other feedback

clarkus commented 3 years ago

Related https://github.com/PostHog/posthog/issues/5342

EDsCODE commented 3 years ago

Additional points mentioned elsewhere and from exploration (will keep adding as I think of more):

How to connect to funnels

kpthatsme commented 3 years ago

Top of mind:

paolodamico commented 3 years ago

Been giving this some thought; pardon the verbosity, it's quite late and it's been a very long day. Before offering an opinion on whether we should focus on this, I wanted to take us through the exercise of what problem we're solving, whether it's a problem worth solving, and whether this is the right solution for that problem. Would love to hear thoughts on this before we decide whether to work on it right now.

## Paths @ Diagnosing Causes

Taking a step back and thinking from first principles about what we're trying to solve: if I'm working on optimizing my conversion, and particularly on understanding why users didn't convert, knowing what users did after their last completed step seems like a natural way to start (in most contexts, but not all: for instance, if they bounced from your product, you won't get any valuable insights from this [quant correlation analysis or session recording may provide better insights]). Knowing what they did before can also provide some interesting insights.

Knowing what they did before/instead of the funnel can give me some hints as to why they didn't convert, but it's only a partial picture and requires some assumptions/judgement on my part (e.g. if in my purchase funnel I see that a significant number of users went to the pricing page instead of purchasing, I could assume that pricing is unclear on my purchase page, just as I could assume it was perceived as expensive and users were looking for a cheaper option).

Is paths the right approach to answering "what did users do instead of converting?" I'm thinking it's not the best approach from a user's perspective, but it may be the best feasible one. If I asked this question of another human, I would expect a concrete, parsed answer (e.g. instead of purchasing, users are moving their cart to "save for later", or browsing for alternative products). The reason we jump to paths as the solution might be that it's the best feasible way to translate raw data into conclusions (all the features we've been thinking about, from error monitoring to n-deep paths, are attempts to answer this question), but this might also be a great opportunity for a disruptive solution. The traditional paths feature requires trial-and-error and is susceptible to the biases of whoever interprets it. I think the same applies to the "before" question.

We have a huge opportunity in answering these questions (with or without paths), mainly due to autocapture. Presumably we have a larger sea of data that could help answer them better (vs. alternatives that rely only on custom events or page views), but it'll also be a huge challenge. We'll have to figure out a way to group meaningful sets of actions together so this question can be properly answered.
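
To make "grouping" a bit more concrete, here's a minimal sketch; the selector-to-action mapping and names are purely hypothetical, not an agreed design. The idea is to collapse raw autocapture events into coarser, human-meaningful actions before running path analysis:

```python
# Hypothetical mapping from raw autocapture selectors to meaningful actions.
EVENT_GROUPS = {
    "click button#add-to-cart": "added to cart",
    "click a.save-for-later": "saved for later",
    "submit form#checkout": "started checkout",
}

def group_events(raw_events: list[str]) -> list[str]:
    """Collapse a user's raw autocapture events into named action groups,
    dropping consecutive duplicates so the resulting paths stay readable."""
    grouped: list[str] = []
    for event in raw_events:
        action = EVENT_GROUPS.get(event, "other")
        if not grouped or grouped[-1] != action:
            grouped.append(action)
    return grouped

print(group_events([
    "click button#add-to-cart",
    "click a.save-for-later",
    "click a.save-for-later",
]))  # ['added to cart', 'saved for later']
```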

Now let's say we become amazing at answering these questions. Does this help us answer "why didn't they convert?" It helps, but it's not quite there. Adding a quant layer next (or maybe even before) could help narrow the scope of potential hypotheses (correlation analysis, for instance). With all this, I can probably use judgement in some cases to build hypotheses, but in others (if not most) I'll want to layer in some qualitative knowledge (e.g. through session recording) to better inform them.

## Should we actually work on this?

I think it's a matter of answering:

marcushyett-ph commented 3 years ago

Thanks for putting so much effort into this and sharing the well-thought-out context here @paolodamico.

I feel there's a major issue which we need to accept with every solution we build in this space: it's almost impossible to know for certain why something happened (even if you speak to the person, they may not remember).

So I believe our role here should be to provide the most likely clues for people to piece together in order to build a hypothesis for why something happened to the highest degree of confidence.

As such, I'd like to quickly evaluate our options through a rough heuristic: ability to provide high-confidence clues.

How do we increase people's ability to find clues?

How do we increase confidence in clues found?

How does each of our approaches weigh up against these heuristics? Based on the general approaches we've discussed, I've rated them below.

| Approach | Clarity | Sample Size | Inferred Priority |
|---|---|---|---|
| Session Recording | High | Low | P2 |
| Paths | Mid | High | P1 |
| Correlation Analysis | High | High | P0 |
| Customer Surveys | Mid | Low | P3 |

This rough exercise highlights the benefits of taking multiple approaches, and illustrates that the greatest impact we're likely to get from a single approach is around correlation; the next highest is probably from paths.

## Answering the "should we" questions

- **Is knowing what users did instead of converting / after their last completed step useful to knowing why they didn't convert?**
  - Yes. Because paths aggregate data, they can give high confidence when many users take a similar path, and the results are fairly clear for a user to interpret and act on.
  - We would also expect this to regularly produce trivial results (such as `$pageleave`) in addition to meaningful ones.
- **Is paths the best way to answer the question above?**
  - On its own, no; correlation analysis is likely the best here. However, combined with session recording or surveys, paths could potentially be better than correlation analysis. Either way, I don't see a reason not to build both.
- **Is knowing what users did before reaching a funnel useful to knowing why they didn't convert?**
  - This is not ultra-clear to me. I would hypothesize that most people convert by following the "path of least resistance", so being able to find "the path of most resistance" is going to be an actionable clue as to why people are dropping off. But seeing 20 different unrelated paths to failure is unlikely to be actionable.
- **Is paths the best way to answer the question above?**
  - Probably. This depends on how well we can solve the correlation analysis approach; correlations to single events and properties might not be as easy to string together and interpret as seeing a "path of most resistance" laid out.
- **Does answering the questions above provide more immediate value than nailing session recording?**
  - Yes. On its own, I believe people can get clearer and more confident clues from investing in paths than in session recording. However, to increase confidence in those clues further, it makes a ton of sense to implement multiple approaches; we might get to incremental value quicker through focus on paths than session recordings.

paolodamico commented 3 years ago

Thanks @marcushyett-ph! Your reasoning above really helps put the prioritization decision into context. I think we can discuss tomorrow in the actual sprint planning, but IMO it makes sense to work on this for the next sprint. It'll have a higher short-term impact than session recording.

Before going all-in on Paths, I'm still wondering whether we can figure out a better way to solve these problems. Mainly: is there a way we can give users the answers already distilled? Imagine asking a question on Google and just getting the big bold answer. In a traditional Paths feature, for instance, the user has to explore many branches until something catches their attention (couldn't we figure this out for them automatically?) before they can draw a conclusion. Thoughts @neilkakkar @macobo @EDsCODE ?

Re paths before a funnel -> I think this can provide some useful insights too. For instance, users who visited our deployment docs, feature pages, and blog posts could show different signals for activation conversion than users who navigated directly to the signup page.

kpthatsme commented 3 years ago

> Mainly: is there a way we can give users the answers already distilled? Imagine asking a question on Google and just getting the big bold answer. In a traditional Paths feature, for instance, the user has to explore many branches until something catches their attention (couldn't we figure this out for them automatically?) before they can draw a conclusion.

So I definitely think this is the dream state: where we want to go and how we want to get there.

In practice, I think it's going to be really tough for us to know what the most important and relevant things are for analysts. Even they themselves often don't know, even with troves of clean data and PMs/analysts spending hours thinking through the problem. I worry that by trying to surface the right answer we may end up surfacing the wrong things or cutting out important bits.

I think another approach to this problem is to work backwards. It's tough for us to determine what is important, but it's easier to determine what isn't. Through noise reduction and strong tools around those kinds of controls, we end up helping expose the right paths and the more relevant things (a rough sketch of one such control is below). This is a bit of what I was getting at around noise reduction and controls in my comment above on Paths features.
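
As a minimal sketch of that kind of control (the event names are hypothetical, not a concrete proposal): let the analyst maintain an exclusion list and drop those events before computing path branches.

```python
# Hypothetical noise-reduction control: events the analyst has marked
# as unimportant are dropped before paths are computed.
EXCLUDED_EVENTS = {"$pageleave", "heartbeat", "toast dismissed"}

def denoise_path(path: list[str]) -> list[str]:
    """Remove excluded events from a user's path so the remaining
    branches surface the steps that might actually matter."""
    return [step for step in path if step not in EXCLUDED_EVENTS]

print(denoise_path(["viewed pricing", "heartbeat", "$pageleave", "started checkout"]))
# ['viewed pricing', 'started checkout']
```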

I do think Paths is a great interface to interact with this kind of chart and do this type of analysis, but to your point this doesn't necessarily need to be paths-specific.

paolodamico commented 3 years ago

That makes a ton of sense @kpthatsme, and I think it's starting to get into solution specifics. Perhaps the answer is that Paths is a great place to start, not only for our users but also so that we can understand how this works (with maybe the goal of reaching this dream state in the future). What do the other engineers think? I would hate to discard this idea without giving it a fair shot.

neilkakkar commented 3 years ago

Tying the past few comments into the framework I mentioned here:

Automating things (smart insights) is great, but when you don't know how to be "smart", instead of giving a shitty experience, the next best thing is letting users ask all sorts of questions they want (diagnoser-driven analysis + event-driven analysis).

This allows us to (1) figure out which questions lead to insightful answers, and (2) build smart stuff on the back of all this data.

This implies that a necessary step toward good "smart" analysis is a foundation that allows users to explore and ask as many questions as they want themselves, i.e. diagnoser-driven analysis opens up the path (pun intended) to better smart analysis.

Paths, I think, fit very well into this exploration.

Specific to your question about distilling insights: I don't think we need to start 100% "smart", discovering insights automatically (because, well, it's a hard problem). But we can get a small quick win by highlighting unusually small/large conversion rates on a path/breakdown: we've already generated the graph for the user and know the conversion rate for each segment, so highlighting the aberrations goes a long way toward ensuring users don't miss them.
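
A minimal sketch of that quick win, assuming we already have per-segment conversion rates in hand (all names hypothetical): flag the segments whose conversion rate deviates strongly from the median, and highlight those in the UI.

```python
from statistics import median

def flag_aberrations(segment_rates: dict[str, float], threshold: float = 0.25) -> list[str]:
    """Return path segments whose conversion rate differs from the
    median rate by more than `threshold` (absolute difference).

    `segment_rates` maps a segment label to its conversion rate in [0, 1].
    """
    mid = median(segment_rates.values())
    return [seg for seg, rate in segment_rates.items() if abs(rate - mid) > threshold]

# The checkout step converts far worse than its siblings, so it gets highlighted.
rates = {"/pricing": 0.42, "/docs": 0.38, "/checkout": 0.04, "/blog": 0.35}
print(flag_aberrations(rates))  # ['/checkout']
```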

Edit: very, very side note: I don't think we'd need ML at all to go "smart"; we just need to calculate all the things™. And knowing what to calculate comes thanks to smart users.

marcushyett-ph commented 3 years ago

Makes sense to me.

+1 on not needing ML for this problem - we will probably want to do some kind of regression analysis to identify possible correlations - but trying to use neural nets to identify why people weren't successful feels like overkill and would likely lead to conclusions that are hard to interpret.
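
For illustration only (not a committed design): a simple non-ML starting point is the phi coefficient, i.e. the Pearson correlation between two binary variables, "did the user perform event X" and "did the user convert". A strongly negative value for an event would be exactly the kind of clue worth surfacing.

```python
from math import sqrt

def phi_coefficient(did_event: list[bool], converted: list[bool]) -> float:
    """Correlation between performing an event and converting
    (phi coefficient: Pearson correlation for two binary variables)."""
    n = len(did_event)
    n11 = sum(e and c for e, c in zip(did_event, converted))      # did event, converted
    n10 = sum(e and not c for e, c in zip(did_event, converted))  # did event, didn't convert
    n01 = sum(c and not e for e, c in zip(did_event, converted))  # no event, converted
    n00 = n - n11 - n10 - n01                                     # neither
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom

# Hypothetical per-user flags: in this toy data, visiting /pricing is strongly
# anti-correlated with converting, so it would surface as a clue.
visited_pricing = [True, True, False, True, False, False, True, False]
converted = [False, False, True, False, True, True, False, True]
print(phi_coefficient(visited_pricing, converted))  # -1.0
```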

paolodamico commented 3 years ago

Alright, closing this down as implementation details are being discussed in #5665.
