PostHog / posthog

🦔 PostHog provides open-source web & product analytics, session recording, feature flagging and A/B testing that you can self-host. Get started - free.
https://posthog.com

What is sessions page? #4884

Closed macobo closed 3 years ago

macobo commented 3 years ago

Conclusion: We should rewrite or create a new sessions page. Changes include removing "sessions" and replacing them with one recording = one row, no list of events feature under the table.

More details under https://github.com/PostHog/posthog/issues/4884#issuecomment-895911672


I think it is worthwhile to revisit the concept of the sessions page from the ground up to support session recording and our users better.

Problem statement

Our sessions page currently serves two very different use cases:

  1. Users can see dynamically constructed sessions on this page and see what actions a user took on the page
  2. Users can search and view interesting session recordings there (if enabled)

I'm not very confident in usecase (1) since I haven't heard of users having great success with it.

In the current implementation, the two are often at odds with each other.

  1. "Sessions" are constructed dynamically from events in the backend, while session recordings are tab-based.
    1. This may cause 0 or 2 or 5 different "session recordings" to show up under one "session". Or one session recording to show up under multiple sessions.
    2. Session durations may not match session recording lengths since the duration calculation is (or at least used to be) based on the difference between the first and last event.
    3. This can lead to a general feeling of "session recordings are missing"
    4. If autocapture is off users may end up with "invisible" session recordings if no normal events were captured.
  2. Performance - "sessions" are expensive to calculate, while "session recordings" are relatively inexpensive
    1. Due to this, we currently only show one day of sessions at a time on the page. This in turn hurts session recording badly - if you're searching for a rare event you need to look through many pages.
  3. "Sessions" discriminate based on distinct_id (and not person_id)
    1. So if a user logs in and their distinct_id changes as a result, events prior to login would be counted as a separate session
    2. It's currently unclear where the session recording would end up - under one of the sessions?
  4. "Sessions" include backend-event only sessions which would not include any session recordings.
    1. If the backend uses a different distinct_id from clients javascript setup, the sessions are
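
To make the mismatch above concrete, here is a minimal Python sketch. It assumes computed "sessions" are built by grouping events per distinct_id and splitting on an inactivity gap, while recordings are keyed by a tab-specific id. The 30-minute gap, the `Event` shape and the `tab_session_id` field name are illustrative assumptions, not PostHog's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby
from typing import Optional

# Assumption: the inactivity gap that splits computed sessions (illustrative value).
INACTIVITY_GAP = timedelta(minutes=30)


@dataclass
class Event:
    distinct_id: str
    timestamp: datetime
    # Hypothetical field: the tab-specific id set by the JS client.
    # Backend events would not carry one.
    tab_session_id: Optional[str] = None


def computed_sessions(events):
    """Group events per distinct_id, starting a new session after a long gap."""
    sessions = []
    ordered = sorted(events, key=lambda e: (e.distinct_id, e.timestamp))
    for _, group in groupby(ordered, key=lambda e: e.distinct_id):
        current = []
        for event in group:
            if current and event.timestamp - current[-1].timestamp > INACTIVITY_GAP:
                sessions.append(current)
                current = []
            current.append(event)
        sessions.append(current)
    return sessions


def recordings_in(session):
    """Tab ids (i.e. recordings) that show up under one computed session."""
    return {e.tab_session_id for e in session if e.tab_session_id is not None}


t0 = datetime(2021, 7, 1, 12, 0)
events = [
    Event("user-1", t0, "tab-a"),
    Event("user-1", t0 + timedelta(minutes=1), "tab-b"),   # second tab, same session
    Event("user-1", t0 + timedelta(minutes=50), "tab-a"),  # same tab, after a long gap
    Event("backend-id", t0 + timedelta(minutes=2)),        # backend event, no recording
]

for session in computed_sessions(events):
    print(session[0].distinct_id, len(session), sorted(recordings_in(session)))
```

In this toy data one session ends up with no recording, one with two, and the `tab-a` recording appears under two different sessions.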

What can we do about it?

I don't have a good answer. Some thoughts:

I think the trouble begins with the definition of "session" and "session recording" being different. One groups all events (including backend/mobile/others) into one, the other uses an internal tab-specific id generated on the javascript client. We can't really use the tab-specific id for sessions without losing support for events outside posthog-js to be displayed.

Some sort of a "sessions" <-> "session recording" mapping is really useful, since it lets us search for recordings where some specific event happened and look up the specific time, and we don't have that context from the recording itself.

Perhaps the solution lies in flipping the script - a separate page for session recordings which is inversely linked to sessions?

However more context for usecase (1) would be needed here.

Additional context

@paolodamico @marcushyett-ph @kpthatsme Food for thought product-wise.

This is a follow-up to https://github.com/PostHog/product-internal/issues/86.

mariusandra commented 3 years ago

Thanks for writing this! Having just implemented the "use the session recording duration if it's longer than the duration between the first and last events" feature (#4853), I felt this pain as well.
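
As a rough illustration of that rule (a sketch with hypothetical names, not the actual code from #4853), the displayed duration could be picked like this:

```python
from datetime import datetime, timedelta
from typing import Optional


def session_duration(
    first_event: datetime,
    last_event: datetime,
    recording_duration: Optional[timedelta] = None,
) -> timedelta:
    """Return the event-based duration, or the recording's if that is longer."""
    event_duration = last_event - first_event
    if recording_duration is not None and recording_duration > event_duration:
        return recording_duration
    return event_duration


start = datetime(2021, 7, 1, 12, 0)
# Two events five minutes apart, but the recording ran for twelve minutes.
print(session_duration(start, start + timedelta(minutes=5), timedelta(minutes=12)))
```

This only papers over the mismatch for display purposes. The session and the recording are still two differently defined things.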

We can't really use the tab-specific id for sessions without losing support for.

Missing the bit after "for...?" :). I expect you were referring to losing support for filtering?

I think from the user's perspective, the "right solution" is clear. Just like the events page has multiple tabs (events, actions, event properties, etc), we have a similar system here. Once you have enabled session recordings, the first/default tab under "sessions" is "session recordings", queried from the session recordings table. The second tab is "computed sessions" or whatever we call what we currently have.

Applying various filters (give me rageclicks for pro plan users), the session <-> session recording mapping, etc are implementation details that we can sort out once we prioritise this.

Usecase 1 still has some value IMO, and is clearly the only thing you can rely on if you disable session recordings (e.g. handling medical data requires you to), so I wouldn't nuke it. Just put it behind an extra click when possible.

macobo commented 3 years ago

Missing the bit after "for...?" :). I expect you were referring to losing support for filtering?

Oops. Thanks, added missing bit to "We can't really use the tab-specific id for sessions without losing support for events outside posthog-js to be displayed."

I think from the user's perspective, the "right solution" is clear.

I kind of agree, but if we're rethinking this anyways it might make sense to talk with customers and revisit this anyways. The devil is in the details and our concept of a session might need tweaking!

marcushyett-ph commented 3 years ago

I don't have any strong gut opinions here - other than from my own experience of using the product. I found the difference between sessions and session recordings is not clear - especially if your main focus is using session recording. The concept of navigating session recordings through session events was uniquely valuable whilst trying to understand why users were or were not doing something.

I kind of agree, but if we're rethinking this anyways it might make sense to talk with customers and revisit this anyways. The devil is in the details and our concept of a session might need tweaking!

I agree with this - we should spend some time with our users getting to know why they use sessions / session recordings and the root problem they're trying to solve by doing it, this should give us a clearer idea of the direction we should go in. @paolodamico thoughts?

kpthatsme commented 3 years ago

@macobo awesome write up, I feel like sessions has been a bit of an elephant in the room :)

Unfortunately I really don't have any strong ideas here yet. I don't think any product today solves the session analysis use case out of the box for the 80%, because measuring sessions seems to be one of the more arbitrary kinds of analysis.

It's tough because I think a lot of what would be measured in sessions can be measured in trends, funnels, session recordings, and retention – which arguably are better kinds of analyses today.

Let's say for example – we had a really flexible systematic way for implementors to start and end sessions – would it be more beneficial to roll that type of meaning into our other kinds of analyses (i.e. insights can now look at $session events and do time based breakdowns)?

I'm not sure what the answer is to all this yet but those are some of my thoughts, excited to explore more about what the right thing to build here is.

marcushyett-ph commented 3 years ago

What is the purpose of Sessions?

Context

Based on the question Karl posed, I wanted to spend some time taking a step back and thinking about why people need to use sessions or session recordings, to see if this can help us prioritize and build a better product. Warning: this is a bit of a long read and goes in quite a few different directions to help explore the purpose of sessions.

Analogy

I’m going to use the analogy of investigating a crime to help frame up the purpose of sessions.

For product analytics, we need to invert this analogy, we actually want as many people to be as successful as possible - so we’re trying to do the opposite to a detective, we already know who committed the crime (successful users) and we want to work out why everyone else didn’t commit the crime.

When investigating a crime, detectives are looking to identify 3 things about any potential suspect:

Types of Evidence

When investigating a crime there are a ton of different ways evidence can be broken down, for simplicity I’m going to stick to the following:

How does Secondary Evidence help us understand Means, Motive and Opportunity?

As highlighted above, sessions and session recordings are secondary evidence, so we cannot rely on them to give us the full picture of what happened and must make some interpretations as to what they might mean.

Using the framework for committing a crime above we can consider how sessions help us validate these dimensions:

Can you summarise this into something snappier?

Sure, I believe the main reasons people want to use sessions are as follows:

Is there an alternative abstraction we could use other than a session?

Since users do a lot on a product, it’s hard to navigate a long continuous list of events, actions or recordings. Sessions are a simple way to break down their actions into chunks that are likely related to each other and make it easier to navigate a potentially large amount of data.

Potentially removing the session abstraction altogether, we could group events and recordings by 4 dimensions to help people navigate to the exact event information they need without exposing the session concept at all:

What's interesting about this abstraction (and food for thought) is that it is very similar to how filters in trends work today, and if we integrate session information and recordings well into trends - we might be able to do without a separate sessions tool.

This is just some early thinking - would appreciate any further thoughts and feedback.

clarkus commented 3 years ago

I like the inverted crime-solving analogy and I think it makes a lot of sense. While this is targeted at "catching the perpetrator" there could be a secondary goal of identifying other possible perpetrators through their common attributes and behaviors. If we can identify an archetypal user, someone who really matched our desired behaviors, that archetype could be used as a model for matching against other users. Imagine finding this archetype, and then a simple action that said something to the effect of "show me more users like this". Secondary to that kind of powerful action, we could roll up common attributes for a given set of sessions and show distinct user lists and a weighted list of attributes based on their recurrence across sessions. This is weird to say, but take FBI profiling and apply that to session exploration as it relates to persons. It could be pretty powerful.

This is all very focused on the who aspect of sessions, but similar solutions could be applied to describe the when, where, and type aspects of sessions.

There is some related discussion at https://github.com/PostHog/posthog/issues/4960#issuecomment-873133339 and https://github.com/PostHog/product-internal/issues/92#issuecomment-872602869.

paolodamico commented 3 years ago

Love that you brought this up @macobo! (and apologies for the delay).

Proposal (TLDR)

Rationale / My thoughts:

macobo commented 3 years ago

Given this will come into focus next sprint - any new thoughts and developments here, input from customers? cc @paolodamico

paolodamico commented 3 years ago

I don't have any new context from users at this point but do hope to be able to provide more soon. I do think the proposed experiment approach could be a solid way to start. We could in the meantime consider alternative approaches / use cases for this view.

FYI we have this doc in which we're starting to discuss the potential sub-focus (theme) for the next sprint (e.g. session recording), and other potential approaches to sessions.

macobo commented 3 years ago

So I spent today looking at a bunch of session recording tools and I think I have an approach similar to @paolodamico's in mind.

Read https://github.com/PostHog/product-internal/issues/127#issuecomment-895908203 before this!

From first principles:

Key reasons to use sessions / session recording

Adapted from https://www.fullstory.com/resources/the-definitive-guide-to-session-replay

  1. Reproduce and solve bugs
  2. Supporting customers via context
  3. Conversion rate optimization
    • Special case: Understand and improve onboarding [UX]
  4. Improve user experience:
    • Generally, looking at how users are using $feature
    • Understand and improve onboarding [UX]

We're not currently excelling at any of these.

What to focus on

Strategically I'd focus on 3 and 4 for now. 2 will become good on its own when 3-4 are great and 1 requires extra tooling built on top (e.g. network/console level capture) which is a distraction right now.

In conversion rate optimization/nailing diagnosis, step 3 is analyzing things qualitatively, that's where session recording comes in.

What role does sessions play here

Where are we going wrong right now

I think the big issue is that there’s sessions and session replays. Users don’t care about the difference!

Minor issues are:

How to improve

Note this list is not logistical, just some musings on how I'd do this in isolation. I'd prefer to solve step 2 - diagnose causes qualitatively first :)

paolodamico commented 3 years ago

Very aligned with you @macobo, and I think this context will be extremely relevant for everyone working on this. On the specifics of what to actually work on for the next sprint, here's what I'm proposing:

Specifically I've been thinking about the sessions / recordings page and I would like to challenge it. My strongest argument against working on this right now is that this page will not actually be solving for our Diagnosing Causes goal. It's certainly helpful (particularly for the other use cases of session recording) and will definitely be needed to avoid confusion even with Diagnosing Causes, but it seems tangential. This being said I'm making a proposal anyways to justify getting rid of the sessions page (with numbers & user feedback) and creating an A/B test for this, but it might not be the right time to ship this yet. Thoughts?

macobo commented 3 years ago

My strongest argument against working on this right now is that this page will not actually be solving for our Diagnosing Causes goal.

I agree - the goal of my previous post was to outline a longer-term solution (and how it fits into the larger scheme), not to outline the plan for the next sprint :) However let's take a look now!

Thoughts on logistics/tactics

So the improvements you listed fall into three camps:

1. Improving the player / play page

Improving this is in no way blocked by the improvements listed above.

However the improvements here also IMO only softly align with "Diagnosing causes" since they're too far down the hierarchy of things a user needs to do.

2. Improving sessions page (via ordering, etc)

This is closer to the "Diagnosing causes" root (kind of the 3rd step). However there is one reason why I think it's better to make these improvements after we do the steps outlined in my previous post:

Performance is tricky in the current sessions page.

  1. The way we're dynamically grouping sessions together would make e.g. improving ordering really tricky :)
  2. One large issue with the current sessions page is the date filter (you only see recordings from a single day at a time) - this hurts investigation really badly at e.g. the scale we work at.

Both of these become much more trivial by making the page more bare-bones and stripping out the "sessions" -> "recordings" dichotomy.
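
A minimal sketch of why this gets simpler, assuming one flat row per recording (the `RecordingRow` fields and the `list_recordings` helper are hypothetical, not an existing PostHog API): ordering and multi-day date ranges reduce to a plain filter and sort, with no dynamic session grouping in between.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecordingRow:
    # All field names are hypothetical.
    recording_id: str
    person_id: str
    start: datetime
    duration: timedelta


def list_recordings(rows, date_from, date_to, order_by="start", descending=True):
    """Filter rows to a date range (inclusive start, exclusive end) and sort them."""
    in_range = [r for r in rows if date_from <= r.start < date_to]
    return sorted(in_range, key=lambda r: getattr(r, order_by), reverse=descending)


rows = [
    RecordingRow("rec-1", "person-1", datetime(2021, 7, 1, 9, 0), timedelta(minutes=3)),
    RecordingRow("rec-2", "person-2", datetime(2021, 7, 3, 14, 30), timedelta(minutes=12)),
    RecordingRow("rec-3", "person-1", datetime(2021, 7, 9, 8, 15), timedelta(seconds=40)),
]

# A whole week in one query, longest recordings first.
week = list_recordings(rows, datetime(2021, 7, 1), datetime(2021, 7, 8), order_by="duration")
print([r.recording_id for r in week])
```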

3. Improving reliability

I don't think this is really a priority if we create a new page because:

  1. Most of the "recording is missing" reasons which are outlined in the first post (person_id vs distinct_id, mismatching times, etc) just don't exist with this page.
  2. Users also won't feel the pain as acutely

There's also a fourth hidden improvement:

4. Improving "linking" pages together

The more functionality we add here (e.g. user dropped out at Xth step in funnel) the more work we create to refactor later. See also the argument on "performance" under 2.


That said, all is not perhaps as rosy as I portrayed in my original post. Replacing the page requires figuring out some "tricky" questions like:

  1. What happens to the users page?
  2. How do we want to handle linking features together
  3. If we keep the two pages alive at the same time, do we add functionality to both?

etc.

That said, I think it's important to get this ball rolling and what you propose here makes sense imo as a first step:

This being said I'm making a proposal anyways to justify getting rid of the sessions page (with numbers & user feedback) and creating an A/B test for this, but it might not be the right time to ship this yet.

marcushyett-ph commented 3 years ago

This sounds like a reasonable focus to me:

  • Capturing other useful events/properties & allowing filtering and ordering (will help with prioritization), like “frustration”, “dead clicks”, requests load time, page performance / resource usage, bounces, etc.
  • Focus on improving the player, e.g. skipping inactive, improving the playback bar, better “affordances” on the bar around when events and inactivity occurs, clarify redacted inputs, …
  • Capturing and reliability issues (Sentry errors, compression errors, ….)
  • Improving the play page experience ?, clear and useful list of events, additional useful context ** unsure if we have enough context for this.

However I'd be keen to explicitly call out solving the link between funnels (or persons modal more generally) and a specific (part of a) session recording - I think this is essential for anyone to diagnose a cause using session recordings.

Would be great to work with the wider team(s) to break these down into clear projects individual team members can tackle or collaborate on during the next sprint?

marcushyett-ph commented 3 years ago

@macobo If (hypothetically) people are using the persons modal as the main entry point to session recordings, would we not come across the same issue with some people having missing recordings etc? (e.g. we cannot just filter out people if they don't have a recording)

macobo commented 3 years ago

Thing is I don't think the issue is "reliability" in most cases.

Here are some scenarios where the user is currently left with a feeling of "sessions are missing".

  1. Application sent some events from the backend (resulting in a separate "session" in the FE)
  2. User got "posthog.identify"d in the middle of the session (person_id vs distinct_id)
  3. User reads a TOC page for 40 minutes (falling asleep at the computer) but consistently scrolling. This will result in "2" sessions but 1 session recording
  4. User bounced on the page immediately (e.g. <5s session)
  5. Site is problematic for session recording - e.g. has deep nested HTML consistently changing, making the session recording events huge
  6. User is on a bad mobile network, we didn't manage to receive the events

Now, while 4-6 are real cases, every session recording tool struggles with them and it'll be hard to make more than incremental progress on this. However issues 1-3 are in my opinion more visible and completely caused by our own product decisions (which I'd hope to fix via solution laid out above).
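
Scenario 2 can be sketched in a few lines of Python. The mapping and event data below are made up for illustration, assuming an identify call links the anonymous and the logged-in distinct_id to the same person: grouping by distinct_id splits one visit in two, while grouping by person_id keeps it together.

```python
from collections import defaultdict

# Hypothetical mapping of distinct_ids to a person, as it would look after identify.
PERSON_OF = {"anon-123": "person-1", "user@example.com": "person-1"}

# (distinct_id, event name) pairs from a single browser visit.
events = [
    ("anon-123", "$pageview"),
    ("anon-123", "clicked login"),
    ("user@example.com", "$identify"),
    ("user@example.com", "$pageview"),
]


def group_events(events, key):
    """Group event names by whatever id `key` derives from the distinct_id."""
    groups = defaultdict(list)
    for distinct_id, name in events:
        groups[key(distinct_id)].append(name)
    return dict(groups)


by_distinct_id = group_events(events, key=lambda d: d)
by_person = group_events(events, key=lambda d: PERSON_OF[d])
print(len(by_distinct_id), len(by_person))
```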

For person modal - I think solving 1-3 will lead to more of an improvement there than 4-6 for e.g. funnels (which naturally take a longer time to complete)

paolodamico commented 3 years ago

Alright, created a PR (https://github.com/PostHog/posthog.com/pull/2028) to finish the discussion and reach a final conclusion as this issue has become too bloated. Keeping this issue around because there are a bunch of issues which will likely be solved by whatever ends up being the final solution, and we should update those.

paolodamico commented 3 years ago

We're still crossing a few ts (pun intended), but the fundamental issue has been addressed, closing.