GoogleChrome / lighthouse

Automated auditing, performance metrics, and best practices for the web.
https://developer.chrome.com/docs/lighthouse/overview/
Apache License 2.0

Adjust for faster performance of Silicon Macs by having a higher CPU slowdown multiplier (either calculated on-the-fly based on benchmarks, or hardcoded) #15068

Open vipulnaik opened 1 year ago

vipulnaik commented 1 year ago

Feature request summary

(Supporting data follows the summary.)

Running Lighthouse (whether through Chrome DevTools or Node) on Apple Silicon Macs (M1 or M2) yields much higher scores than what we see on PageSpeed Insights or on older machines, such as Intel Macs. Most of the difference seems to stem from the Total Blocking Time (TBT) score component, which depends heavily on the CPU speed of the machine where Lighthouse is being run.

Currently, the CPU slowdown multiplier is set to a hardcoded value of 4 (though the effective multiplier for layout tasks is just 2 instead of 4). In order for Silicon Macs to yield comparable TBT scores to pagespeed.web.dev or to Intel Macs, the slowdown multiplier probably needs to be somewhere between 10 and 40.

Those running Lighthouse through Node have the option of setting the CPU slowdown multiplier using --throttling.cpuSlowdownMultiplier=<multiplier>. But many people likely just run Lighthouse using Chrome DevTools, where the CPU slowdown multiplier cannot be edited (as far as I can make out). Scores that are very different between their local Lighthouse runs and the Lighthouse results on pagespeed.web.dev can confuse them.
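For Node users, a minimal sketch of assembling that command line is shown here. The helper name `buildLighthouseArgs` and the rule that the multiplier must be at least 1 are assumptions made for illustration; only the `--throttling.cpuSlowdownMultiplier` flag comes from the description above.

```typescript
// Hypothetical helper: builds the argument list for a Lighthouse CLI run
// with a custom CPU slowdown multiplier.
function buildLighthouseArgs(url: string, cpuSlowdownMultiplier: number): string[] {
  // Assumption: a multiplier below 1 (a speed-up) is treated as invalid here.
  if (!Number.isFinite(cpuSlowdownMultiplier) || cpuSlowdownMultiplier < 1) {
    throw new RangeError("cpuSlowdownMultiplier must be a finite number >= 1");
  }
  return [url, `--throttling.cpuSlowdownMultiplier=${cpuSlowdownMultiplier}`];
}

// Print the command for an M1-style multiplier in the 10 to 40 range.
console.log(["lighthouse", ...buildLighthouseArgs("https://web.dev/top-cwv-2023/", 20)].join(" "));
```

The returned array could be passed to something like Node's `child_process.spawn`, which keeps the multiplier explicit and easy to vary per machine.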

Data analysis

I ran an analysis on the URL https://web.dev/top-cwv-2023/ (chosen as a relatively simple web page that's still big enough to fall short of Lighthouse perfection; also one that developers on this project may be familiar with). I did runs using Node v18.15.0 on an M1 MacBook Air 2020, an Intel MacBook Air 2018, and pagespeed.web.dev. For the local runs, I tried a variety of values of the CPU slowdown multiplier. I saved my analysis to https://docs.google.com/spreadsheets/d/1ytVVzRGC6PfTGGAA82EcOlyF76hivzHEAzaJZ9sSdiM/edit (viewable through the link by anybody).

The pagespeed.web.dev runs are linked so you should be able to see the reports. I also have the files of my local runs saved for the time being, so can retrieve more information about each run if needed, though I think the trends are clear.

For the pagespeed.web.dev runs (with the default CPU slowdown multiplier of 4), the TBT measurement varied between 120 ms and 550 ms. This was a little better than my local run on the Intel MacBook Air with a CPU slowdown multiplier of 4, and reasonably close to my local run on the Intel MacBook Air with a CPU slowdown multiplier of 2.

On the other hand, for the M1 MacBook Air, the range of 120-550 ms in TBT is seen with a CPU slowdown multiplier in the range of 12 to 32. There may be some special things going on with this page, but I have observed generally similar patterns for some other pages I tested. Overall, I suspect that something in the range of 10 to 40 would be best to match the PageSpeed Insights UI.
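The matching described above can be sketched as a small interpolation: given local (multiplier, TBT) measurements, estimate the multiplier at which local TBT would reach a target TBT seen on pagespeed.web.dev. The names `TbtSample` and `multiplierForTargetTbt` are hypothetical, the sketch assumes TBT does not decrease as the multiplier grows, and any numbers fed to it should come from real runs rather than from this example.

```typescript
// Hypothetical data shape: one local Lighthouse run at a given multiplier.
interface TbtSample {
  multiplier: number;
  tbtMs: number;
}

// Linearly interpolate between neighbouring samples to find the multiplier
// whose TBT would equal targetTbtMs. Returns null when the target lies
// outside the measured range.
function multiplierForTargetTbt(samples: TbtSample[], targetTbtMs: number): number | null {
  const sorted = [...samples].sort((a, b) => a.multiplier - b.multiplier);
  for (let i = 0; i < sorted.length - 1; i++) {
    const lo = sorted[i];
    const hi = sorted[i + 1];
    if (lo.tbtMs <= targetTbtMs && targetTbtMs <= hi.tbtMs) {
      if (hi.tbtMs === lo.tbtMs) return lo.multiplier; // flat segment
      const fraction = (targetTbtMs - lo.tbtMs) / (hi.tbtMs - lo.tbtMs);
      return lo.multiplier + fraction * (hi.multiplier - lo.multiplier);
    }
  }
  return null;
}
```

Running this once for the low end and once for the high end of the PageSpeed Insights TBT range would give a multiplier range for a particular machine, similar in spirit to the 12 to 32 range reported above.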

I'm not sure whether hardcoding a value is best, or if it's better to calculate the value in DevTools by running a speed benchmark right before or during the Lighthouse evaluation. The latter has the advantage that it doesn't require any specific hardcoding for Silicon Macs, it can adjust to speed issues affecting specific computers, and it requires less adjustment as new devices and operating systems are released. But it's not clear if there is a reliable-enough set of speed benchmarks that can be calculated quickly enough.
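To make the benchmark-based option more concrete, here is a rough sketch of how such a calculation might look. It is not how Lighthouse works today: `runMicroBenchmark`, `adaptiveMultiplier`, the reference score, and the clamp bounds of 1 and 40 are all assumptions made for illustration. The idea is to scale the default multiplier of 4 by how much faster the local machine is than some reference machine.

```typescript
// Hypothetical micro-benchmark: time a fixed amount of CPU-bound work and
// return a score in iterations per millisecond (higher means faster CPU).
function runMicroBenchmark(iterations: number = 5_000_000): number {
  const start = Date.now();
  let acc = 0;
  for (let i = 0; i < iterations; i++) {
    acc = (acc + Math.sqrt(i)) % 1000;
  }
  // Guard against a zero elapsed time on very fast machines or coarse clocks.
  const elapsedMs = Math.max(Date.now() - start, 1);
  // Reference acc so the loop is not trivially dead code.
  if (Number.isNaN(acc)) throw new Error("unexpected NaN in benchmark");
  return iterations / elapsedMs;
}

// Scale the base multiplier by the ratio of local to reference speed,
// clamped to [minMultiplier, maxMultiplier]. The defaults are assumptions.
function adaptiveMultiplier(
  localScore: number,
  referenceScore: number,
  baseMultiplier: number = 4,
  minMultiplier: number = 1,
  maxMultiplier: number = 40,
): number {
  if (!(localScore > 0) || !(referenceScore > 0)) {
    throw new RangeError("benchmark scores must be positive");
  }
  const scaled = baseMultiplier * (localScore / referenceScore);
  return Math.min(maxMultiplier, Math.max(minMultiplier, scaled));
}
```

Under this sketch, a machine that benchmarks three times faster than the reference would get a multiplier of 12. A single short benchmark is noisy, so a real implementation would likely need several runs and a more representative workload, which is the reliability concern raised above.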

What is the motivation or use case for changing this?

This is mostly for the moderately sophisticated Lighthouse user who hasn't set it up to run with Node and is testing with Lighthouse in Chrome DevTools.

Here is one example where this could be confusing. A developer on a Silicon Mac makes a change to a website's codebase locally, runs Lighthouse locally in DevTools, and sees a much better Lighthouse score than had previously been reported on PageSpeed Insights for the corresponding live page. The local change cannot be tested directly on PageSpeed Insights because it's not at a public URL, and the developer doesn't think to test the production URL in Chrome DevTools Lighthouse as a baseline. So the developer sees an apparent huge improvement in Lighthouse score and ships the change. But then there is no improvement in the PageSpeed Insights API when the URL is tested after shipping.

It's also worth noting that prior to Silicon Macs, local Lighthouse testing generally gave slightly worse results than the PageSpeed Insights API (similar to what I saw with the Intel MacBook Air 2018). So the developer may have a prior belief that local results are worse than PageSpeed Insights API results, based on past experience with older devices. So even though the user might understand the local/PageSpeed Insights distinction, the direction of the discrepancy may be opposite to what the user expects.

How is this beneficial to Lighthouse?

This makes the Chrome DevTools version of Lighthouse more usable for developers, product managers, QA testers, etc. who are using Silicon Macs. It is especially relevant for testing local code changes or changes on QA environments that are not yet public.

A more general way of benchmarking CPU speed can also be helpful for automated testing using Node Lighthouse; basically, the test results should ideally be more hardware-independent, so that developers can interpret Lighthouse scores between different devices, whether local or in the cloud.

connorjclark commented 1 year ago

Thanks for the issue! I'm actively researching how to apply an adaptive CPU slowdown multiplier (as opposed to our hardcoded default of 4x, no matter the developer device). Stay tuned.

u3u commented 1 year ago

On my machine (Apple M1 Max 64 GB), --throttling.cpuSlowdownMultiplier needs to be set to 100x slowdown (Simulated) in order to see performance scores similar to those on PageSpeed Insights 😓