HTTPArchive / custom-metrics

Custom metrics to use with WebPageTest agents
Apache License 2.0

Improve coverage of Speculation Rules custom metrics #111

Open rviscomi opened 6 months ago

rviscomi commented 6 months ago

In performance.js we're capturing the speculation rules themselves. This tells us how the links should be prefetched/prerendered, but not which ones actually are.

Add new metadata to the custom metric that captures which requests are actually speculatively loaded. We can look for the Sec-Purpose header and log which requests are prefetched or prerendered.
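A minimal sketch of what that detection could look like, assuming each entry in `$WPT_REQUESTS` exposes its request headers as a plain object (the `url` and `request_headers` field names here are assumptions for illustration, not the verified WebPageTest shape):

```javascript
// Classify requests that were speculatively loaded, based on the
// Sec-Purpose request header (e.g. "prefetch" or "prefetch;prerender").
// `requests` stands in for $WPT_REQUESTS; the {url, request_headers}
// shape is an assumption for this sketch.
function getSpeculativeLoads(requests) {
  return requests
    .map(r => ({
      url: r.url,
      secPurpose: (r.request_headers || {})['Sec-Purpose'] || ''
    }))
    .filter(r => r.secPurpose.includes('prefetch'))
    .map(r => ({
      url: r.url,
      how: r.secPurpose.includes('prerender') ? 'prerender' : 'prefetch'
    }));
}
```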

The custom metric also currently only looks for script tags with the Speculation Rules JSON. But it's also possible to configure the rules via the Speculation-Rules header. The custom metric should check there as well as the script tags.
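For the header case, note that the `Speculation-Rules` response header carries a list of quoted URLs pointing to external rule-set resources, rather than inline JSON, so the custom metric would need to read response headers and then record (or fetch) those URLs. A rough sketch of extracting the URLs, assuming response headers are available as a plain object:

```javascript
// Extract Speculation-Rules rule-set URLs from a response-header map.
// Per the spec, the header value is a structured list of quoted URLs, e.g.:
//   Speculation-Rules: "/rules/prefetch.json", "https://cdn.example/rules.json"
// The headers-as-object shape is an assumption for this sketch, and the
// parsing is naive (split on commas, strip quotes) rather than a full
// structured-field parser.
function getSpeculationRulesUrls(responseHeaders) {
  const value = responseHeaders['Speculation-Rules'] ||
                responseHeaders['speculation-rules'];
  if (!value) return [];
  return value.split(',')
    .map(v => v.trim().replace(/^"|"$/g, ''))
    .filter(Boolean);
}
```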

tunetheweb commented 6 months ago

> In performance.js we're capturing the speculation rules themselves. This tells us how the links should be prefetched/prerendered, but not which ones actually are.
>
> Add new metadata to the custom metric that captures which requests are actually speculatively loaded. We can look for the Sec-Purpose header and log which requests are prefetched or prerendered.

This doesn't make sense in a lab environment for the following reasons:

  1. Speculations only apply to navigations, not sub-resources, so they only affect future navigations, not the page being tested.
  2. There is no API for the calling page to measure whether a speculation was triggered (though I thought I recalled someone asking for this, but I can't find it now).
  3. Speculations are disabled in automated testing environments, and attempts to get them to run on WebPageTest have failed, probably for that reason.

> The custom metric also currently only looks for script tags with the Speculation Rules JSON. But it's also possible to configure the rules via the Speculation-Rules header. The custom metric should check there as well as the script tags.

This is one we should do!

rviscomi commented 6 months ago

Wait so if I'm on page A and the rules say to prerender page B, there will be nothing in the network log for page A to say that page B was prerendered? I would have expected to see something in $WPT_REQUESTS with the request to prerender page B.

tunetheweb commented 6 months ago

Well, 1) it's not supported on WebPageTest.

Even if it were, I'm not sure it would be observable, since the prerender happens in a separate rendering process (you can't see it in the Network tab of DevTools, for example, though you can switch DevTools to that rendering process's own Network tab). WebPageTest may still see it, since I think it looks at a lower level, but, as I say, prerender is not supported.

Prefetches, however, do show up as requests for the current page, so we could monitor those if they work (I'm not sure whether they're subject to the same restrictions as prerender).

Also, `<link rel=prefetch>` and `<link rel=prerender>` also prefetch. Just checked, and it looks like they use the older `Purpose: prefetch` header, not `Sec-Purpose: prefetch`.
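Given that split, a detection heuristic would need to accept both headers. A sketch under the same assumed request-header shape as above (a plain object of header names to values):

```javascript
// Decide whether a request was speculative, accepting both the modern
// Sec-Purpose header and the older Purpose: prefetch header used by
// <link rel=prefetch> / <link rel=prerender>.
// Returns 'prerender', 'prefetch', 'legacy-prefetch', or null.
function isSpeculative(requestHeaders) {
  const sec = requestHeaders['Sec-Purpose'] || '';
  const legacy = requestHeaders['Purpose'] || '';
  if (sec.includes('prerender')) return 'prerender';
  if (sec.includes('prefetch')) return 'prefetch';
  if (legacy === 'prefetch') return 'legacy-prefetch';
  return null;
}
```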

rviscomi commented 6 months ago

Ok so if prefetches show up, and older link[rel] speculations also show up, then it seems like we should have enough data to collect?

Testing prerendering in a lab environment seems like a valid use case, so maybe it's safe to assume that it'll be supported in WPT at some point?

Maybe @pmeenan can clarify how observable those prerenders would be in $WPT_REQUESTS

tunetheweb commented 6 months ago

> Testing prerendering in a lab environment seems like a valid use case, so maybe it's safe to assume that it'll be supported in WPT at some point?

I think it was in the past (I have a video here that I generated with it), but then it caused problems in Puppeteer, so we blocked it (bug), and I suspect that's what turned it off for WebPageTest.

I'm also not sure we should be spending our resources on this during the crawl if it's ever re-enabled. If a page prerenders 10 pages, then we've 10x-ed our crawl time for that page! Well, somewhat less, since the prerenders should reuse cached resources that are common across the pages, but still...

rviscomi commented 6 months ago

So what I was originally hoping for with this data is to see how many pages would be prerendered if a real user visited the page. The new ruleset options have patterns that obscure the number of affected pages. WDYT if we faked it by trying to apply the rules to the known URLs on the page? For example, for a rule like `/*` except `/logout`, would it work if we count the distinct number of same-origin URLs on the page and omit the logout page?
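A rough sketch of that counting idea. Note the pattern matching below is a simplified glob, not the full `URLPattern` semantics Chrome actually applies to `href_matches`, so at best it approximates the real rule:

```javascript
// Upper-bound count of distinct same-origin URLs that an include pattern
// would cover, minus any excluded pathname patterns ("/* except /logout").
// Patterns here use a simplified "*" glob over pathnames only; real
// document rules use URLPattern, so treat this as an approximation.
function countEligibleUrls(pageOrigin, urls, includePattern, excludePatterns = []) {
  // Convert a "*" glob into a RegExp, escaping other special characters.
  const toRegExp = p => new RegExp('^' + p.split('*').map(
    s => s.replace(/[.+?^${}()|[\]\\]/g, '\\$&')).join('.*') + '$');
  const include = toRegExp(includePattern);
  const excludes = excludePatterns.map(toRegExp);
  const eligible = new Set();
  for (const raw of urls) {
    const u = new URL(raw, pageOrigin);
    if (u.origin !== pageOrigin) continue;            // same-origin only
    if (!include.test(u.pathname)) continue;          // must match include rule
    if (excludes.some(re => re.test(u.pathname))) continue; // drop exceptions
    eligible.add(u.href);                             // dedupe distinct URLs
  }
  return eligible.size;
}
```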

tunetheweb commented 6 months ago

You'd also need to take into account the eagerness setting. And the chrome limits.

rviscomi commented 6 months ago

Exactly so the heuristics can only get us an upper limit, which is ok right?

tunetheweb commented 6 months ago

Yeah but maybe a waaaaaay upper limit so not sure what it tells us?

For example, if Wikipedia implemented this with document rules, set eagerness to moderate or conservative, and made ALL links on the page eligible for prerender, then it could say we'll prerender up to 4,816 links on Obama's Wikipedia page:

```javascript
Array.from(document.querySelectorAll('a'))
  .filter(a => a.href.startsWith("https://en.wikipedia.org"))
  .length;
```

But it won't do that unless a user hovers over every single link on that page, which is pretty damn unlikely.

So not sure what this is telling us?

rviscomi commented 6 months ago

Right, we'll still have the context of the eagerness setting, so this will tell us how many links are at best eligible for preloading. If you're saying this data would NEVER be useful, sure, let's skip it. I'm also open to other suggestions for ways to measure API usage.

rviscomi commented 3 weeks ago

Picking this back up after discussing offline. There would be value in knowing if a page eagerly prerenders the maximum number of URLs (10 in Chrome). Assigning to myself to prototype a parser to count the number of eligible URLs according to the rules.
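A sketch of the capping step such a parser could end with. The limits are the ones Chrome documents at the time of writing (10 prerenders for immediate/eager rules, 2 for moderate/conservative; 50 and 2 respectively for prefetch), and should be treated as subject to change:

```javascript
// Cap a raw eligible-URL count at Chrome's per-page speculation limits.
// Numbers are from Chrome's documentation at the time of writing and
// may change; unknown action/eagerness values pass through uncapped.
const CHROME_LIMITS = {
  prerender: { immediate: 10, eager: 10, moderate: 2, conservative: 2 },
  prefetch:  { immediate: 50, eager: 50, moderate: 2, conservative: 2 }
};

function cappedSpeculationCount(action, eagerness, eligibleCount) {
  const limit = (CHROME_LIMITS[action] || {})[eagerness];
  if (limit === undefined) return eligibleCount;
  return Math.min(eligibleCount, limit);
}
```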