catchpoint / WebPageTest

Official repository for WebPageTest

Add a metric for interactivity #781

Closed pmeenan closed 2 years ago

pmeenan commented 7 years ago

There are an increasing number of cases where pages render reasonably quickly (yay Speed Index) but the page is unusable because the javascript framework is being hooked up and blocks the main thread for multiple seconds.

Lighthouse has a Time to Interactive audit that grades for it and WebPageTest needs some way to also account for it.

My current thinking is to use the visual progress and Speed Index as a base and modify it based on how interactive a page is.

Specifically:

Benefits:

Downsides:

Visual Examples

WebPageTest

This test is of the WebPageTest main page (mostly static) on a Moto G.

From the filmstrip we can see the visual progress used for the Speed Index calculation: wpt1

Speed Index is calculated as the area above the visual progress, resulting in a Speed Index of 2458: wpt2

If we look at the main thread browser activity (below the waterfall) and identify the periods where the main thread can be considered "interactive" (currently defined as a 500ms window with no tasks that block the main thread for more than 50ms) we get something that looks like this: wpt3
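A minimal sketch of how those "interactive" periods could be derived from a list of main-thread tasks, assuming each task is a `(start_ms, end_ms)` pair. The function name, the parameters, and the simplification that any gap of at least 500ms between long tasks counts as interactive in full are illustrative assumptions, not WebPageTest's actual code:

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_ms, end_ms)


def interactive_periods(
    tasks: List[Interval],
    test_end_ms: float,
    long_task_ms: float = 50.0,
    window_ms: float = 500.0,
) -> List[Interval]:
    """Return the gaps of at least window_ms that contain no long task."""
    # Only tasks longer than long_task_ms are treated as blocking the main thread.
    long_tasks = sorted(t for t in tasks if t[1] - t[0] > long_task_ms)

    periods: List[Interval] = []
    cursor = 0.0  # end of the most recent blocking task seen so far
    for start, end in long_tasks:
        if start - cursor >= window_ms:
            periods.append((cursor, start))
        cursor = max(cursor, end)

    # The tail of the test is interactive if it is quiet for long enough.
    if test_end_ms - cursor >= window_ms:
        periods.append((cursor, test_end_ms))
    return periods


# Hypothetical task list, not taken from the test above.
example_tasks = [(0, 200), (300, 320), (900, 1000), (1100, 1300)]
print(interactive_periods(example_tasks, test_end_ms=2000))
```

The 20ms task at 300ms is ignored because it is under the 50ms threshold, and the 100ms gap between the last two long tasks is too short to count as an interactive window.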

Then, if we take the visual progress and count only the areas of the graph where the main thread is also interactive (all other areas count as 0% progress) and re-calculate the area above the graph, we get something closer to 4450: wpt4
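The calculation can be sketched roughly as follows. This assumes visual progress is a step function given as `(time_ms, percent_complete)` samples and that interactive periods are `(start_ms, end_ms)` pairs. The function names and the choice to integrate up to an explicit `end_ms` are assumptions made for illustration, not the WebPageTest implementation:

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Sample = Tuple[float, float]    # (time_ms, visual completeness 0-100)
Interval = Tuple[float, float]  # (start_ms, end_ms)


def progress_at(samples: List[Sample], t: float) -> float:
    """Step function: last recorded completeness at or before time t."""
    times = [s[0] for s in samples]
    i = bisect_right(times, t)
    return samples[i - 1][1] if i else 0.0


def area_above_progress(
    samples: List[Sample],
    end_ms: float,
    interactive: Optional[List[Interval]] = None,
) -> float:
    """Area above the visual progress curve from 0 to end_ms.

    With interactive=None this is the plain Speed Index style area.
    Otherwise progress counts as 0% whenever the main thread is not
    inside one of the interactive periods.
    """
    samples = sorted(samples)
    # Break the timeline wherever progress or interactivity can change.
    points = {0.0, end_ms} | {t for t, _ in samples if 0 < t < end_ms}
    for start, end in interactive or []:
        points |= {p for p in (start, end) if 0 < p < end_ms}
    ordered = sorted(points)

    area = 0.0
    for a, b in zip(ordered, ordered[1:]):
        progress = progress_at(samples, a)
        if interactive is not None and not any(s <= a < e for s, e in interactive):
            progress = 0.0
        area += (1 - progress / 100) * (b - a)
    return area


# Hypothetical data, not the measurements from the tests above.
samples = [(1000, 50), (2000, 100)]
print(area_above_progress(samples, end_ms=4000))
print(area_above_progress(samples, end_ms=4000, interactive=[(0, 1500), (3000, 4000)]))
```

Because non-interactive stretches are scored as 0% progress, the adjusted value can never be lower than the plain area over the same time range.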

CNN

A much more extreme case where the content is ready pretty fast visually but there is a TON of script blocking the page activity is CNN.

In this case most of the visual content loads around the 12 second mark and produces a Speed Index of 12235: cnn1

If we look at the main thread activity, though, there are only ~6 seconds of interactivity spread out after the initial render until you get to the 50 second mark, where tasks finally stop blocking the main thread: cnn2

In this case, the modified area above the graph would be closer to 44000 and better represent how poor the experience is.

stevesouders commented 7 years ago

Thanks for sharing this. Fun!

=== 1. I think the goal is good:

There are an increasing number of cases where pages render reasonably quickly (yay Speed Index) but the page is unusable because the javascript framework is being hooked up and blocks the main thread for multiple seconds.

I get it. So you want a new metric that merges visual completion AND main thread availability. But regarding this quote: How did you determine that the page is unusable? Is it anecdotal?

=== 2. Interactive is hard to define:

interactive being defined as a time window of > 500ms where no task blocks the main thread for more than 50ms

Is "task" the main categories of scripting, painting, layout, loading? If so, that makes sense. How many total "tasks" are there? I assume it's less than 10 so even if all of them are right at 50ms that still leaves at least a 50ms non-blocking window in the 500ms.

=== 3. You say:

Speed Index is the floor

Seems like SI is the BEST score possible, and the score can only get worse when you factor in the main thread. So I guess it's true that SI is the "floor" in the sense that the score will be greater than or equal to SI, but "floor" usually means "worse" whereas here it's "best". I might say "Speed Index is the best score possible. The new metric's value will only get worse as lack of interactivity is factored in."

=== 4. The first example, of WPT itself, doesn't seem logical. Below I show the main thread with SI and ISI indicated. It "feels" like if SI happened at 2450ms, and then there was a lot of blocking, ISI should be around 3200ms (after SI, when there's a significant freeing up of the main thread). What I'm suffering from is my desire to have a "point in time" like "Time to Interactive". But what you're calculating is something similar to SI, which captures the visual & interactive behavior across the ENTIRE page load. So the metric is larger than 3200ms because of the big non-interactive periods at 3800ms and 5700ms. I just don't know if I feel the ISI should be penalized by these later non-interactive times. Would the user have really noticed them? When do you stop?

image

=== 5. Is it really best to merge these two metrics - visual progress and interactivity? If you combine too many metrics, the number stops having any relevance to the real world, hides important information, and is hard for humans to interpret. What if SI and "main thread blocking" were separate metrics? For CNN, SI is 12 seconds and Interactive is 55 seconds. That tells me that my problem isn't rendering - it's JS. So I actually know more quickly what I should focus on.

=== 6. I think a BIGGER problem is that pages do NOT render quickly because they're blocked by 1) critical blocking resources that download slowly or 2) JS execution that blocks the main thread. To further complicate the situation, the JS execution may come from a synchronous script or an async/defer script. Developers need help figuring out:

There's a yin-yang here of wanting metrics that indicate there's a problem, and wanting other metrics that help developers pinpoint the problem. Maybe SI and some new "interactive" metric help indicate if there's a problem that the USER notices - lack of visual progress or inability to interact with the page. Then we might want some other new metrics like:

zeman commented 7 years ago

Slight tangent...

I love the intent of this, giving people more useful metrics, but in that spirit I'd love to see a user timing mark with a standard name (content_interactive?), picked up and elevated in the WPT UI to the level of the standard metrics.

Then we can really champion sites adding a meaningful user timing mark when the app thinks it's ready for user interaction.

pmeenan commented 7 years ago

Maybe it's a bit early to be mentally merging Speed Index in with Interactivity which could make it harder to explain and "get there". First step should probably be just around Interactivity and point-in-time vs some form of aggregate.

What I'm basically proposing is a metric that measures "the amount of time where the user could not interact with the page". If they tried to click on something (or possibly even scroll) it wouldn't respond for a while, if at all. It also doesn't include loading activity, purely the browser's main thread being blocked and unable to respond to input.

Functionally that is the amount of time the page was not interactive added to the start render time (can't interact if nothing is on the screen).

Looking at this Airbnb page as an example (and the new interactivity indicators help visualize it):

image

The content is rendered at ~8.5 seconds but the main thread is locked up for another 1.5 seconds. Then there is a small window of ~1.5 seconds where if you did something the page would respond before it goes out to lunch for another 4 seconds.

If you look, there is also a block from 16-18 seconds where there is a lot of script activity but the page remains interactive.

An "Interactive Index" would basically add up the time in all of the red blocks and add it to the start render time (producing something like 20,000 in this case). Pulling a "point in time" is difficult when there are gaps of being interactive and not. Would you mark the first block (11 seconds), the final block (32 seconds) or somewhere in between?
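As a rough sketch of that sum, assuming the red (blocked) blocks are available as `(start_ms, end_ms)` pairs and that only blocked time after start render is counted. The function name and the clipping behaviour are assumptions for illustration:

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_ms, end_ms)


def interactive_index(
    start_render_ms: float,
    blocked: List[Interval],
    end_ms: float,
) -> float:
    """Start render time plus all blocked main-thread time after it."""
    total_blocked = 0.0
    for start, end in blocked:
        # Clip each block to the range [start render, end of test].
        start = max(start, start_render_ms)
        end = min(end, end_ms)
        if end > start:
            total_blocked += end - start
    return start_render_ms + total_blocked


# Only the first two blocks described above: 1.5 s blocked right after
# render at ~8.5 s, then 4 s blocked after a ~1.5 s interactive gap.
print(interactive_index(8500, [(8500, 10000), (11500, 15500)], end_ms=35000))
```

With only those two blocks the result is 14000. The later blocks in the waterfall would push it toward the ~20,000 mentioned above.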

As far as when to end, I'm assuming that pages eventually become well behaved and turn mostly green/interactive after the activity settles down at which point even small slivers of blocked thread won't measurably affect the outcome. In WPT this may get capped at just "whatever the test end time was" but it would be nice to be able to extend the tests as long as there was either network activity OR the main thread was heavily active (not possible as far as I know with Chrome tracing but worth looking into).

pmeenan commented 7 years ago

@zeman WPT currently exposes a "User Time" as a top level metric which is the last user timing mark from the page if there was one. If we can agree on a convention I'd be happy to have it overridden with a specific marker if the marker was present (i.e. default to the last mark but use "content_interactive" or whatever we agree to if it is present).
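A small sketch of that fallback rule, assuming user timing marks arrive as `(name, time_ms)` pairs. The function name is hypothetical, "content_interactive" is only the placeholder name floated above, and picking the latest occurrence when the mark is set more than once is an assumption:

```python
from typing import List, Optional, Tuple

Mark = Tuple[str, float]  # (mark name, time_ms)


def user_time(marks: List[Mark], preferred: str = "content_interactive") -> Optional[float]:
    """Use the agreed-upon mark if present, otherwise the last mark."""
    if not marks:
        return None
    preferred_times = [t for name, t in marks if name == preferred]
    if preferred_times:
        return max(preferred_times)
    return max(t for _, t in marks)


# Hypothetical marks for illustration.
print(user_time([("hero_visible", 1200), ("content_interactive", 3400), ("ads_done", 9000)]))
print(user_time([("hero_visible", 1200), ("ads_done", 9000)]))
```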

stevesouders commented 7 years ago

I really like a "point in time" value. In your previous comment you mentioned getting some measure of interactivity and adding it to the start render time. A metric that provided a point in time where (significant?) content is visible AND the page is interactive would be good. How about something like "the first point in time after start render where the page is interactive for at least N seconds".

We'd have to run experiments to determine the value of N but it's probably 1 or 2 seconds.

Instead of the first point after start render, it could be the first point after Speed Index. That would be a better approximation of "significant" content being visible.
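One way such a point-in-time value could be computed, as a sketch: it assumes blocked main-thread intervals are given as `(start_ms, end_ms)` pairs, and `after_ms` can be either start render or Speed Index. The function and parameter names are hypothetical:

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start_ms, end_ms)


def first_interactive_point(
    blocked: List[Interval],
    after_ms: float,
    end_ms: float,
    quiet_ms: float,
) -> Optional[float]:
    """First time at or after after_ms followed by quiet_ms without blocking.

    Returns None if no such window exists before the test ends.
    """
    cursor = after_ms
    for start, end in sorted(blocked):
        if end <= cursor:
            continue  # block finished before the candidate point
        if start - cursor >= quiet_ms:
            return cursor  # enough quiet time before the next block
        cursor = max(cursor, end)
    return cursor if end_ms - cursor >= quiet_ms else None


# Hypothetical blocked intervals for illustration.
blocks = [(1000, 3000), (3500, 4000), (7000, 7500)]
print(first_interactive_point(blocks, after_ms=2000, end_ms=12000, quiet_ms=2000))
print(first_interactive_point(blocks, after_ms=2000, end_ms=12000, quiet_ms=5000))
```

With a 2 second window the point lands at 4000ms. With a 5 second window no qualifying stretch fits before the test ends, so nothing is reported.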

pmeenan commented 7 years ago

That is pretty much what Time to Interactive is doing in lighthouse and I'll be adding that as a metric to WPT as well once the definition stabilizes. I believe they are currently looking for a window of 5 seconds of interactivity which in the Airbnb case above would be somewhere north of 34 seconds and in the WPT example would be ~6.5 seconds.

Maybe we start there and evolve but I'm fairly certain the interactive point-in-time measurements are going to have the same issue as render point-in-time measurements.

nilskuhn commented 7 years ago

Great discussion, thanks for investigating this!

We would very much appreciate having a Time to Interactive metric in WebPageTest results.

I would prefer the first point after SI with interactivity for N seconds instead of the first point after start render. First, because an interactive time window could occur before enough content is visible on the page for a user to interact with. Second, start render isn't very reliable with multistep tests at the moment. We are thinking about clearing the viewport between steps to make start render more reliable for follow-up steps, but I'm not quite happy with doing too much in our tests that wouldn't happen in real customer journeys.

addyosmani commented 7 years ago

That is pretty much what Time to Interactive is doing in lighthouse and I'll be adding that as a metric to WPT as well once the definition stabilizes. I believe they are currently looking for a window of 5 seconds of interactivity which in the Airbnb case above would be somewhere north of 34 seconds and in the WPT example would be ~6.5 seconds.

The Lighthouse definition for TTI is currently the moment after DOMContentLoaded where the main thread is available enough to handle user input. It looks for the first 500ms window where estimated input latency is <50ms at roughly the 90th percentile.

I've been doing some larger tests comparing some of the other TTI models we've been exploring (shared with @pmeenan) to the Lighthouse definition and in practice, they're currently not a million miles off. I'd welcome (if feasible) a version of TTI being baked into WebPageTest - even if just exposed in report CSVs - to help us better understand how well our current models hold up on different classes of mobile devices.

Imo, the ideal form of TTI would be an iteration on the Hero Element Timing API, where you wait for your annotated elements to become interactive and base the TTI score on that. Perhaps something we can explore further down the road.

pmeenan commented 7 years ago

I added a doc with how the lighthouse TTI measurement is going to be calculated in WPT based on various discussions, at least for the initial work (I'm assuming there will be some refinement).

sburnicki commented 7 years ago

I saw from your commits that TTI is already implemented, so I wanted to test it today. I didn't see TTI in the test result, though (Example).

What are the prerequisites for a test so TTI is calculated? I used chrome with video capturing, timeline and trace (with default options).

pmeenan commented 7 years ago

Should just need timeline and video (not sure video should really be needed) but it will only report for cases where there is at least 5 seconds of "interactive" time. Not sure what happened with the timeline case you captured but the main-thread timing wasn't in the waterfall (which is what it uses).

In some cases if the page is too busy right up until the end it won't report. You can force it by using a "minimum test duration" but at some point there may need to be a knob to record timeline data for 5+ seconds after a test would normally terminate.

Here is an example for cnn: http://www.webpagetest.org/result/170210_HE_ae3c3c1569c1f78f9deed2bd5f85eea0/

jaroslawrosiek commented 6 years ago

FYI: https://docs.google.com/document/d/1GGiI9-7KeY3TPqS3YT271upUVimo-XiL5mwWorDUD4c/edit#

github-actions[bot] commented 2 years ago

We're in the process of cleaning up issues on this project in order to ensure we're able to stay on top of high priority bugs and feature requests. As a part of this process, we're automatically closing any issues that have had no activity within the last two years, including this one, since the codebase has changed dramatically in that time. If you feel this is still relevant, please file a new issue using the relevant issue template so we can get it prioritized. Thanks!