Time conversion analysis - new view

paolodamico commented 3 years ago

This issue is to discuss the frontend implementation of the time conversion analysis view. Full product specs can be found here.

Initial wireframe draft below (from @corywatilo's new UI on #4535). Looking for feedback @corywatilo @marcushyett-ph @mariusandra @kpthatsme. The histogram is quite simple, which is why it's only mocked. On the y-axis we have a regular number scale as we have today. On the x-axis we have each bucket range.

paolodamico commented 3 years ago

Depending on how the discussion evolves on #4535, about the bold new UI, happy to update the wireframes but I think we can get a solid sense of functionality with the above.

marcushyett-ph commented 3 years ago

I think this generally makes sense - one thing that doesn't seem intuitive to me is:

Which steps is this histogram showing for? And how do you select the steps which the histogram is calculated between?

paolodamico commented 3 years ago

As discussed offline, I'll table this and hold off on adding any further details here so that whoever picks this up can work out the best implementation approach on their own.

paolodamico commented 3 years ago

Alex, the above raw wireframes, plus the product specs should provide useful context on how to approach this problem. The wireframes are just one idea on how to potentially solve this problem, but feel free to take it any direction you see best. Please let me or @clarkus know if you'd like to brainstorm, work on wireframes or mockups together. Regardless of how you decide to approach this, I'd encourage you to detail your approach (either written down, wireframes or mockups) so we can have some team feedback before building.

Adding some additional context here (CC @marcushyett-ph in case you want to provide additional context):

This is another tool to address the underlying problem: how do you improve your conversion rate? To improve it, you have to understand what drives it, and one potential driver (or at least signal) is time.
Some key questions that I may want to understand related to a funnel: Do users who convert within a certain time frame convert significantly better or worse? How much time does it take for my users to go between steps (average is useful, but I would like to understand the distribution a bit better as average can be myopic)? How much time does it take to complete the entire funnel?

marcushyett-ph commented 3 years ago

Thanks for this @paolodamico

One other reason might be worth considering as context:

Is there a relationship between the flow taking longer to complete and a user being unsuccessful? (This might indicate that the flow is difficult for some and easier for others.)

kpthatsme commented 3 years ago

Really like this direction @paolodamico.

I think you and @marcushyett-ph named the use cases pretty well. One way this has been helpful to me is in measuring improvements to things that impact the overall UX but not necessarily the overall conversion rate.

One concrete way I've used this is for captcha flows. For example, let's say we have an important flow (e.g. login) that takes 20% of users a lot of time. Since this is a really important form, this poor experience probably won't be obvious in conversion rates or median/avg step times. This type of analysis will make it easier to identify these types of issues and measure improvements against them.
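To make the captcha scenario concrete, here is a minimal sketch with made-up numbers (the step times, the 60-second threshold, and the `share_slower_than` helper are all hypothetical). It shows how a fix that only helps the slow 20% can leave the median untouched while the tail of the distribution changes a lot.

```python
from statistics import median

# Hypothetical conversion times in seconds for 10 users on a login step.
# 80% finish quickly; 20% get stuck on a slow captcha.
before = [5, 5, 6, 6, 6, 7, 7, 8, 120, 130]
# After a captcha fix, only the slow 20% get faster.
after = [5, 5, 6, 6, 6, 7, 7, 8, 30, 35]


def share_slower_than(times, threshold_seconds):
    """Fraction of runs that took longer than the threshold."""
    return sum(t > threshold_seconds for t in times) / len(times)


# The median is identical before and after the fix...
print(median(before), median(after))
# ...but the share of users stuck for over a minute is not.
print(share_slower_than(before, 60), share_slower_than(after, 60))
```

A histogram of these times would show the tail directly, which is the kind of signal a single summary number can hide.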

samwinslow commented 3 years ago

Would love to continue the conversation here with reference to the new steps view (screenshot of built UI):

Would both a horizontal and vertical UI be worth implementing? We can reuse much of what I already have made for the horizontal steps UI in order to iterate on the underlying functionality more quickly.

Also, when it says "Click on a step to view conversion time between steps", which element should be clicked?

I personally think distribution charts between each step are much more informative than a giant distribution chart of all steps – because the time between each step cannot always be assumed to be roughly equal. From a UI perspective, I think we should show many small distribution charts, one for each "edge" between step nodes.

The statistical reasoning here is sort of like what happens when you aggregate data on a scatter plot; correlations get averaged out. Take a look at graph (c) below.

If the red points and black points belonged to discrete groups, red would be slightly positively correlated, black negatively correlated. Aggregating those together makes it hard to draw any meaningful conclusions about the whole set.

Perhaps an even better example, imagine you collapsed the groups in graph (b) to a single set. It would be impossible to tell which group was responsible for which part of the curve.
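As a rough illustration of the aggregation point, the sketch below uses two tiny made-up groups (deliberately exaggerated: one perfectly positively correlated, one perfectly negatively correlated). The `pearson` helper and the data are assumptions for illustration only; pooling the groups makes the correlation vanish.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical "red" group: y rises with x.
red_x, red_y = [1, 2, 3, 4], [1, 2, 3, 4]
# Hypothetical "black" group: y falls as x rises.
black_x, black_y = [1, 2, 3, 4], [4, 3, 2, 1]

r_red = pearson(red_x, red_y)
r_black = pearson(black_x, black_y)
# Pooling both groups hides the per-group trends entirely.
r_pooled = pearson(red_x + black_x, red_y + black_y)
print(r_red, r_black, r_pooled)
```

The same concern applies to conversion times: pooling the times of several step transitions into one histogram can mask the shape of each individual transition.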

kpthatsme commented 3 years ago

> Also, when it says "Click on a step to view conversion time between steps", which element should be clicked?

Where would this CTA be? Not seeing it in any of the mocks we discussed.

> I personally think distribution charts between each step are much more informative than a giant distribution chart of all steps – because the time between each step cannot always be assumed to be roughly equal. From a UI perspective, I think we should show many small distribution charts, one for each "edge" between step nodes.

I think designing a UI to show the distribution graph between each step is going to get tough – for this scenario, I'd imagine the user to actually create a subset of the funnel and run time analysis on that.

Let's say we have a funnel of Sign up -> Data Ingest -> Discover Learning. From the updated funnel view, I'll be able to see median + avg step times between each step.

We notice that Sign up --> Data Ingest seems like it's taking a lot longer than we'd like. A team decides they want to improve this metric and really dig into what's going on around the edges.

The next step for them would be to create a tighter funnel of 'Sign Up --> Data Ingest' only and to use the timing distribution to measure progress in that area of the funnel.

Showing it all at once could definitely be a better solution (I'd need to see what that looks + feels like) but I think with the current proposal the MVP case is still going to be addressed adequately.

paolodamico commented 3 years ago

Great points @samwinslow! My thoughts,

> When it says "Click on a step to view conversion time between steps", which element should be clicked?

1. I think we can have two ways to access this view. You either click on the "Time to convert" analysis type in the right-hand graph control, or click on the avg. time to convert label in each step. If you click on the label, it shows you the time to convert between the two relevant steps; if not, it shows the global one.

2. 100% agree we should allow analyzing two specific steps as opposed to only the entire funnel. One way to support this is by clicking the relevant label (as on [1]), but we'll probably want to add an additional control somewhere to allow switching the focus of the graph without having to go back to the main funnel.

3. I'm a bit on the fence about showing all distribution graphs in a single view. On the one hand it can be confusing/overwhelming. On the other it could help me quickly get some insights on a step I wasn't considering. Perhaps something to try out?

alexkim205 commented 3 years ago

Thanks, there are a lot of solid points here.

I'm kind of with Kunal here with only having one histogram showing at a time. With multiple distributions, it might be difficult to interact with the data at a higher resolution and the mobile experience will struggle.

I like the max two-step histogram compromise, but I feel like navigating btwn the main stepped funnel and time analysis btwn steps needs a bit more fine-tuning. How do we provide a backToUrl to the main funnel after time analysis filters are changed? Which version of the filters is the source of truth across different funnel graphs?

Linking up w/ @paolodamico and @samwinslow later today to talk about the right next steps

sparse notes from chat

Decide on what we want time conversion view to look like
Navigation between steps and time conversion
Distribution between two steps and multiple steps
How to visualize breakdown in histogram view @clarkus

alexkim205 commented 3 years ago

Histograms seem pretty simple to implement, but I guess the biggest unknowns going into this were (1) how we would clearly indicate conversion time analyses across two steps vs. multiple steps, and (2) what linking between graph types would look like (going from steps to time conversion and vice versa).

Let's say we start from a multi-funnel step graph and click the "1 hour" time window to view the time conversion distribution between two steps. We're taken to a time conversion graph with two steps, but the question then becomes, should we change the global filters to only show those two action steps? (Imo it doesn't make sense to alter these global filters b/c we don't do this anywhere else). If we don't want to remove the other irrelevant action filters, how do we keep the data in the histogram consistent with the displayed filters? Since time conversion filters will always be some subset of funnel steps filters, I feel like it makes sense to visually highlight which filters are effectively being applied in the displayed graph.

Maybe color isn't the right approach here, but this kind of highlighting/marking makes the relationship between multi-step funnel steps and two-step time conversions derived from the former much clearer.

I think this also gives us flexibility with how we navigate from funnel to histogram and back. Clicking "time window" from a multi-step funnel will take you to a 2-step histogram. This 2-step histogram can have a "back to source funnel" button that will take you right back. If a user modifies the 2-step filters (either updates or adds a new step), the above highlights can be cleared, the back to source funnel button disappears, and time conversion can act as its own modular insight view. There are probably other edge cases that I'm forgetting about here, but I just wanted to put some thoughts down first and gather some feedback! I've word vomited a bit here, so let me know if you'd like me to clarify anything.

paolodamico commented 3 years ago

Great starting point!

Re navigation. I think switching back with the Graph Type selector is a good option. We could also consider adding a link in the main histogram visualization that takes you back to the conversion rates. Something like "View conversion rates".
Re setting the steps to compare. I think @clarkus could help us out here. Definitely an option to do some highlighting in the left-hand side components. Another option that comes to mind is something like this (just a very quick mockup), but would wait for @clarkus's input.

Twixes commented 3 years ago

Hey all! It just happens that I've written this feature's ClickHouse query (#4947).

It currently has a couple of constraints:

only works with ordered steps
can only calculate conversion to one step from the one immediately preceding it (but which one step is selectable as to_step)

These don't have to be lifted to get the feature into our users' hands though, and it's pretty obvious what lifting them would look like.

A question mark is binning though, specifically the number of bins and the bin interval. As it stands now, the query accepts how many bins should be returned (number_of_bins) and always returns that number of them. Each bin has an equal interval, which is calculated by finding the maximum time to convert of all funnel runs being considered (max(step_{to_step}_conversion_time)) and dividing that time by number_of_bins to get bin_base_seconds, which is the interval. The first bin always starts at 0 seconds to convert and includes all runs that reached step to_step from the preceding one in less than bin_base_seconds. Then the second bin includes all runs that reached the step in at least bin_base_seconds but less than 2 * bin_base_seconds. And so on. This way each time to convert graph ends up including all runs, in equal-interval bins.
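The binning described above can be sketched in plain Python. This is not the ClickHouse query from #4947, only an illustration of the same arithmetic; the function name `equal_width_bins` is hypothetical, and clamping the slowest run into the last bin is an assumption made here so that exactly number_of_bins bins come back.

```python
def equal_width_bins(conversion_times, number_of_bins):
    """Return (bin_base_seconds, counts) for equal-width bins starting at 0.

    Bin i covers [i * bin_base_seconds, (i + 1) * bin_base_seconds).
    """
    if not conversion_times or number_of_bins < 1:
        return 0.0, []
    # Interval = slowest conversion time divided by the requested bin count.
    bin_base_seconds = max(conversion_times) / number_of_bins
    counts = [0] * number_of_bins
    for t in conversion_times:
        if bin_base_seconds == 0:
            index = 0  # every run converted instantly
        else:
            # Assumption: the slowest run sits exactly on the upper edge,
            # so it is clamped into the last bin instead of opening a new one.
            index = min(int(t // bin_base_seconds), number_of_bins - 1)
        counts[index] += 1
    return bin_base_seconds, counts


# Six hypothetical runs (seconds to convert), split into 4 bins of 30 s each.
print(equal_width_bins([10, 20, 30, 45, 60, 120], 4))
```

Note how a run at exactly 30 seconds lands in the second bin, matching the "at least bin_base_seconds but less than 2 * bin_base_seconds" rule.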

However, what if we wanted to provide an "auto" number of bins, as in the screenshot in the top post – what algorithm would be the best to determine this?

alexkim205 commented 3 years ago

Thanks @Twixes for the backend context!

Some thoughts on those constraints

1. @paolodamico do you think we should keep the option for unordered steps? I remember in a user interview it caused some confusion so constraint 1 might not matter all that much for now.
2. I think we can work around this constraint easily. If we select a range of n > 2 steps, we may have to query the API n times. Not super optimal performance-wise, but something we can roll back once this backend constraint's lifted.

> However, what if we wanted to provide an "auto" number of bins, as in the screenshot in the top post – what algorithm would be the best to determine this?

It seems that the Freedman–Diaconis rule works well in practice to determine the optimal number of bins.
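For reference, the Freedman–Diaconis rule sets the bin width to 2 * IQR / n^(1/3), and the bin count follows from dividing the data range by that width. Below is a minimal sketch, assuming conversion times are available as a plain list; the function name and the fallback to a single bin for degenerate data are assumptions, not part of the existing query.

```python
import math
from statistics import quantiles


def freedman_diaconis_bin_count(times):
    """Suggest a number of histogram bins using the Freedman-Diaconis rule."""
    if len(times) < 2:
        return 1
    # Quartiles via linear interpolation over the observed data.
    q1, _, q3 = quantiles(times, n=4, method="inclusive")
    iqr = q3 - q1
    spread = max(times) - min(times)
    if iqr == 0 or spread == 0:
        # Assumption: fall back to one bin when the rule is undefined.
        return 1
    bin_width = 2 * iqr / len(times) ** (1 / 3)
    return max(1, math.ceil(spread / bin_width))


# 100 hypothetical conversion times, evenly spread from 1 to 100 seconds.
print(freedman_diaconis_bin_count(list(range(1, 101))))
```

Because the width depends on the interquartile range rather than the full range, a few extreme runs barely change the width but still add bins at the tail.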

Re nav - That sounds good, I like that!

> Re setting the steps to compare. I think @clarkus could help us out here. Definitely an option to do some highlighting in the left-hand side components. Another option that comes to mind is something like this (just a very quick mockup), but would wait for @clarkus's input.

Love these step selections @paolodamico. Reminded me of how much I like GitHub's PR commit selector. It's kind of a different use case, but I like being able to see a bit more detail about the steps I'm choosing between. @clarkus

clarkus commented 3 years ago

I'm working on some ideas for this. Thanks for setting the problem and providing some initial direction.

clarkus commented 3 years ago

Here's where I landed with the initial idea. I need to learn more about how we want to describe each bucket on hover, and what other information we could use to make step comparisons more identifiable. Note that this work adheres to the constraints in @Twixes's comment above:

> can only calculate conversion to one step from the one immediately preceding it (but which one step is selectable as to_step)

alexkim205 commented 3 years ago

This is really awesome, thank you for taking the time to create this!

paolodamico commented 3 years ago

> @paolodamico do you think we should keep the option for unordered steps? I remember in a user interview it caused some confusion so constraint 1 might not matter all that much for now.

I think it's definitely worth keeping this option in a general sense (i.e. build unordered funnels), as we also have feedback of this being valuable for other users. However, IMO it's fine to not support the option of doing this deep-dive time to convert analysis on unordered steps funnels for now. We can wait to get more input on this (both in terms of quant usage and feedback).

> I think we can work around this constraint easily. If we select a range of n > 2 steps, we may have to query the API n times. Not super optimal performance-wise, but something we can roll back once this backend constraint's lifted.

My 2c is that we can just offer the analysis for time to convert between consecutive steps (and ideally the overall conversion funnel). Analyzing time to convert between non-consecutive, non-holistic steps seems more of an edge case, and besides there's the workaround of removing steps from the funnel. I'm not sure if the backend constraint is that we couldn't do entire funnel analysis? I'm also not sure doing separate requests would work, as you can't simply sum each value.
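A small sketch of why per-step results cannot simply be summed into a whole-funnel distribution: two made-up funnels below have identical per-step histograms but different total-time histograms, because the total depends on how step times pair up within each run. The data and helper names are hypothetical.

```python
from collections import Counter


def per_step_histograms(runs):
    """One histogram of step times per step, ignoring which run they came from."""
    step_count = len(runs[0])
    return [Counter(run[i] for run in runs) for i in range(step_count)]


def total_time_histogram(runs):
    """Histogram of each run's total time across all steps."""
    return Counter(sum(run) for run in runs)


# Each tuple is one run: (seconds for step 1 -> 2, seconds for step 2 -> 3).
funnel_a = [(10, 10), (50, 50)]  # one fast user, one slow user
funnel_b = [(10, 50), (50, 10)]  # both users take 60 seconds in total

# Per-step distributions are indistinguishable between the two funnels...
print(per_step_histograms(funnel_a) == per_step_histograms(funnel_b))
# ...but the whole-funnel distributions are not.
print(total_time_histogram(funnel_a), total_time_histogram(funnel_b))
```

So a whole-funnel view would need per-run totals computed in one place rather than stitched together from separate per-step requests.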

Re @clarkus's design. Love it, looks amazing! Only feedback point.

I would look into (check if it makes sense, might be too crowded) moving the controls (screenshot below) outside of the graph area. The steps to compare is a full graph control, intuitively it should go on the LHS box. The group outliers and bin size are display options, intuitively (mental model) they would go next to the time range selector.

Twixes commented 3 years ago

Backend-wise implementing total conversion time would be (as far as I see) the same task as implementing conversion between non-consecutive steps, so maybe it will happen anyway (issue https://github.com/PostHog/posthog/issues/4992), will see.

One more question: what would "Group outliers" do? Currently this is not a feature of the query.

paolodamico commented 3 years ago

Makes sense @Twixes, hopefully we can get both! The group outliers option is intended to aid with visualisation when you have outliers, but obviously feel free to change it if you think it makes sense. The problem we're trying to solve: imagine that all but a couple of users convert within 1 day, yet there are a couple of users who convert in 4, 5, 6, 7 days. With normal binning, you would now get 4 extra bins just to represent 4 users (which in turn reduces granularity where the bulk of users lie). If you group outliers, you can have a >= 4 days bin that groups all those outliers, and you have more detail in the bulk of the users, instead of on the outliers. Makes sense? Not sure if the rule @alexkim proposed already accounts for this though.
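One way the outlier grouping could work is an extra overflow bin: equal-width bins up to a cutoff, plus a final bin for everything at or above it. The sketch below is only an illustration; the `cutoff` parameter, the function name, and the example data (in days) are assumptions, and how the cutoff would be chosen is left open.

```python
def bins_with_overflow(times, number_of_bins, cutoff):
    """Equal-width bins over [0, cutoff) plus one last bin for times >= cutoff.

    Returns (bin_width, counts) where counts has number_of_bins + 1 entries.
    """
    if number_of_bins < 1 or cutoff <= 0:
        raise ValueError("number_of_bins must be >= 1 and cutoff must be > 0")
    width = cutoff / number_of_bins
    counts = [0] * (number_of_bins + 1)
    for t in times:
        if t >= cutoff:
            index = number_of_bins  # overflow bin for outliers
        else:
            # Guard against float rounding pushing a value past the last bin.
            index = min(int(t // width), number_of_bins - 1)
        counts[index] += 1
    return width, counts


# Hypothetical times in days: most users convert within a day, four are outliers.
times_in_days = [0.1, 0.2, 0.3, 0.4, 0.6, 0.9, 4, 5, 6, 7]
print(bins_with_overflow(times_in_days, 4, 1))
```

With these numbers the bulk of users is spread over four quarter-day bins, and the four slow conversions share a single ">= 1 day" bin instead of stretching the axis out to 7 days.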

clarkus commented 3 years ago

> Re @clarkus's design. Love it, looks amazing! Only feedback point.
>
> 1. I would **look into** (check if it makes sense, might be too crowded) moving the controls (screenshot below) outside of the graph area. The steps to compare is a full graph control, intuitively it should go on the LHS box. The group outliers and bin size are display options, intuitively (mental model) they would go next to the time range selector.

Sure, I'll put more time into the placement of those items. I don't think they're all going to fit into the header. I was generally trying to group things based on how they impact the data - adding time would potentially increase the amount of data being displayed. Changing the number of buckets doesn't really change the data, just how you're opting to consume it. I'll post something here later today. 👍

clarkus commented 3 years ago

OK while I was able to make all those options fit into the same area, I do think this is a sub-optimal display of these options. We're at the point where we should consider expanding the size of that graph area or possibly de-couple the options somehow via an overflow menu or some secondary options panel. All that said, let me know what you think about these changes. While they're learnable, I don't think they're immediately obvious. Also, custom icon alert - let me know what you think about that group outliers icon I cobbled together.

clarkus commented 3 years ago

Here's an updated design that represents the full page and the placement of the steps control. https://www.figma.com/file/9yWtngNb1AIuf6KmXaEPJA/App-doodles?node-id=568%3A6

paolodamico commented 3 years ago

Really sorry to go back on this but I think the steps to convert might actually be better within the graph space, as seeing it on the side seems like it might just be too easy to miss. If you check the public prototype I actually updated it with a proposed design for this.

alexkim205 commented 3 years ago

I'm going to keep track of all related issues here

Checklist

Done

[x] A custom (and reusable ♻️) D3 histogram component that is used in our time conversion display. #5094
[x] Empty states #5138

In Review

[x] All steps #5142 (related to #5110)
[x] Need to move d3 logic into kea (done in #5142)

In Progress

[ ] Bin counts and autobinning (related to #4995)

Backlog

[ ] Conversion window

Blocked

[ ] Group outliers (In discussion #5126)
[x] Accuracy of data partially blocked on #5116.
[ ] Unordered steps (I think this is still blocked on backend dev)

Items

Testing
CH vs Postgres

paolodamico commented 3 years ago

I think this can be closed now, @alexkim205?

alexkim205 commented 3 years ago

Yessir, closing this now

PostHog / posthog