ScottTravisHartley commented 3 years ago

A generalized question and discussion around a common and rising way of easily improving scores. There are both third-party services (and plugins for various cms') that offer the ability to delay JavaScript execution these services typically what they do is rewrite the javascript resource to have a different resource type or a different source attribute. Then with additional JavaScript after a user interaction (tap, scroll, etc) the JavaScript source or type tag is rewritten and run.

My question here is since lighthouse has no way of measuring the impact of these resources and these services are effectively just a way to game the system what is the logical impact when Core Web Vitals become a ranking factor. I am going to assume Google will rely on Crux or similar data as opposed to lighthouse lab reports for determining a website's performance but at the same time, services such as these are becoming more and more common (while not being cheap). Should lighthouse in its testing fake a "tap" user interaction to root out services such as these to give a more accurate measurement or is it simply a cat and mouse game?

On another note though, delaying certain JavaScript files until the interaction is a great way to improve performance for instance say you're loading a Facebook page widget in the footer of your site and you set it up to only load the JS when it starts to come into the viewport (like you would lazy load an image). The issue seems to be more for sites who simply delay all JavaScript until interaction which many of these third-party services do and those sites are painfully obvious when you look at the markup and run a lighthouse report and notice there is virtually no JavaScript being run when you can simply check your chrome network tab and find x amount of files downloaded.

pedddro commented 3 years ago

I believe this should be a priority. I haven't found whats the impact on it for Core Web Vitals, as you can easily manipulate the elements with JS to appears only after user interaction. I'm not an expert but a plugin that flags this behavior can be a good start, whereas this can be a large problem to be discussed.

ScottTravisHartley commented 3 years ago

Well, the generalized impact is this. If the JavaScript is simply delayed until after interaction then it doesn't get measured from lighthouse and by extension, I am willing to assume it doesn't get measured in core web vitals. Because core webvitals from my understanding are hard measurements measured by the browser much in the same way the onload event time is calculated.

The issue though is you're seeing services guarantee xyz score by simply delaying all the JS until his point the problem here becomes a couple of things.

It gives the site owners a false sense of progress loading half a MB of JS compressed is a lot for a low-end device. You're just moving the problem until after the point of measurement.
Delaying a significant amount of JavaScript until after interaction can have the opposite effect where you have shifted the payload so someone taps the screen and then the browser frees and the TTI shoots back up.
JavaScript manipulation of elements that would otherwise generate a CLS don't get measured because again the JS isn't loaded in.

So my issue boils down to this. If we are making these hard metrics a ranking factor and they can be easily bypassed by simply shifting the load until after the measurements are taken by delaying all JavaScript among other items by which the user interacting with the page (or its set on a timer) is what causes the assets to be loaded in bringing about a worse user experience IMO are we really fixing anything. Because from a user experience POV I would much rather suffer from a long TTI and layout shifting at first rather than later due to some silly workaround.

Because, if this can be seen as a valid method what most people are going to do, and are starting to do is simply just delay everything instead of fixing the real issue and many of them are already through these various third-party services.

connorjclark commented 3 years ago

Rewrite the javascript resource to have a different resource type or a different source attribute.

Can you provide some examples? Is this similar to Cloudfare's Rocket Loader? Are these tools delaying all scripts until a trivial user interaction such as scrolling or document.body mousemove?

Delaying resources will "trick" lab tools like Lighthouse (although this is a valid practice, to defer unnecessary resources until user interaction requires it, as you mentioned w/ your Facebook example). We're working on supporting auditing user flows, although this only applies to the developer that is actively trying to improve performance end-to-end, not just the initial cold load (which is the state of things today with lab tools).

For now, such tactics may not receive appropriate consideration wrt field reporting tools such as CruX and their measurement of real users Core Web Vitals.

For example, FID will be impacted by large, just-in-time resources, but only if that is the first user interaction. There's indeed a missing coverage of subsequent, later interactions.

For CLS, late page shifts are still accounted in field tooling. In fact this is problematic, as it over estimates long living sessions. We are working on tweaking this metric: https://web.dev/better-layout-shift-metric/

LCP generally does not apply (it is generally not gated on user interactions, although just-in-time resources might be a source of contention that slows critical resources required for LCP).

Offhand, I'd expect CWV to receive new metrics to account for the late-page UX, but I may be speaking out of turn and don't have a reference readily available. Between LCP/FID, there is definitely a slant in early-page metrics, and without late-page metrics there is no counterbalance to the gaming you've brought up.

ScottTravisHartley commented 3 years ago

It's slightly different than Cloudflare's tool. Cloudflare's Rocket Loader rewrites the JS so it's skipped and then after the onload event is fired then it immediately processes the JavaScript so in a way they are just adding a "defer attribute" to all JS including inline scripts. These tools are delaying the JS either until interaction or on a timed delay since the test doesn't interact with the page or the test stops running since nothing is being done it thinks the page is finished or it reached the fully loaded time.

One example of it is Ezoic's Site Speed Accelerator tool where all the JS is rewritten though you can exclude certain JS files in the tool manually though it's a bit of work but it is possible that way you only exclude ezoic's JS.

Another example would be nitropack where it too can rewrite all the JS.

And yes as you mention delaying certain unessential JS is totally a good idea some resources are not required immediately advertisements, tracking pixels, etc might not need to be loaded immediately. Also, I imagine it could have an impact on the LCP if the LCP is dependent on JavaScript such as a slider without a proper fallback for an image or something silly that only renders once all of its JavaScript is loaded in. You see this commonly on WordPress sites using Slider Revolution for instance.

ScottTravisHartley commented 3 years ago

And the easiest way users can tell that they aren't necessarily getting an accurate measurement is under the resources tab in lighthouse or on page speed insights typically the JavaScript section will be a minuscule amount compared to the actual package size that the site sends down the wire though they only come in after the interaction or the timer is run. Whereas cloudflare all those resources do appear and are properly measured

pedddro commented 3 years ago

I have analyzed +50 Wordpress websites that use those techniques and none of them passed in all 3 requirements on Field Data of Core Web Vitals(mobile and desktop), even after passing on Pagespeed Lab Data and live testing with web vitals extension. I have analyzed websites that were optimized older than 2 months, so the results take into consideration the recent website structure, but the data could be still inaccurate. Most websites lack Field Data, but this is a pre-research into this.

It preliminary shows that using this technique could only net you a short-term benefit, due to how real data is totally different from Pagespeed Lab Data and how CRux data is also different for Core Web Vitals.

Another fact is that a lot of those websites scored +90 on both mobile and desktop. Yet, it is still unclear if Core Web Vitals is more important than Pagespeed scoring.

paulirish commented 3 years ago

This defer-JS-until-interaction pattern is great. Addy's writeup: https://addyosmani.com/blog/import-on-interaction/

We think folks should adopt it.

Admittedly, if that happens, it does roughly mean that they're gaming Lighthouse, but we're okay with this.

11313 provides an ability for people to get metrics while simulating user interaction, so you can do it yourself. But we do not have plans to automate clicks/scrolls for all Lighthouse runs by default.

Yet, it is still unclear if Core Web Vitals is more important than Pagespeed scoring.

Google Search uses field CWV data: https://support.google.com/webmasters/thread/104436075/core-web-vitals-page-experience-faqs-updated-march-2021

ScottTravisHartley commented 3 years ago

Sounds interesting that we can simulate interaction, the trend I am seeing now is a simple delaying of all JavaScript and files particularly in the WordPress sphere, being adopted by quite a few popular plugins though if this is the way to go it makes it much easier lol.

one of my hobby sites (linked below) where i installed a popular WordPress plugin and delayed all JS until interaction this is more what I was referring to as opposed to loading video players on interaction. Loading a player on click makes tons of sense as does doing it for chat widgets this I have done before many of times.

This specific issue was more tied to taking that idea and taking it to the extreme of simply as i have in the example link below simply taking all JS and not loading any of it until after any sort of interaction. So in short, no JavaScript is getting downloaded/executed (except the inlined library for this functionality, and the lazy load script), and then the rest of the JS is loaded immediately after any sort of interaction.

example: https://dailydrivertips.com/

pedddro commented 3 years ago

People don't only game the Lighthouse but already gaming Core Web Vitals Lab Data. The issue is that if not a more real user interaction approach is used for the testing, and maybe the metric, people will always game the test, and all the lab data will be rendered useless. And maybe in future even game field data, as let's say if I lazyload the javascript from a dynamic LCP element(blurred placeholder tech. for example), and also game the other metrics FID and CLS using the same logic, until it returns 'good' field data metric. I'm not sure that is possible with field data.

I still think a Lighthouse plugin that flags on avoiding this behavior(delays all js in favor of scoring) could be a beneficial start on this issue.

pedddro commented 1 year ago

Whit INTP now available on LH Timespan -maybe soon to become official- and Chrome efforts to reduce metric manipulation, this issue can be closed(two years early).

ScottTravisHartley commented 11 months ago

Good to see a more accurate metric was introduced.

GoogleChrome / lighthouse

Loading JavaScript After User Interaction #11904

11313 provides an ability for people to get metrics while simulating user interaction, so you can do it yourself. But we do not have plans to automate clicks/scrolls for all Lighthouse runs by default.