GoogleChrome / lighthouse


High TTFB causes implausible LCP and negative LCP subpart #16213

Open brendankenny opened 1 week ago

brendankenny commented 1 week ago

Example reported on Twitter, though not a public URL. The repro page has a high TTFB, but the LCP element (an h1) is then found statically within the HTML response.

Two obvious issues:

Finally, if you can access the full report, inconsistencies pop up in other ways too, like LCP subpart breakdowns with TTFB over 100% and render delay with a negative percentage:

[Screenshot: snippet from a Lighthouse report showing TTFB taking 204% of LCP and Render Delay taking -104%]

Issues

The fundamental problem appears to be that the high TTFB of the initial request gets diluted or ignored in other calculations. I haven't verified the reason, but it could be something like the initial HTML response being slow because it requires SSR, while the remaining requests are for static resources.

The issue can also be subtler when the TTFB is not as extreme but is still significantly higher than the server response time of any other resource on the page. In that case TTFB won't necessarily be higher than the simulated LCP, but it will still make at least the LCP breakdown misleading.

At least two separate fixes:

Guard against impossible LCP breakdowns

First, the largest-contentful-paint-element audit's TTFB estimate is the max of an estimated TTFB (similar to the lantern simulated value) and the observed TTFB. For this case, it ends up using the (large) observed TTFB.
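As a rough sketch (the names here are illustrative, not the audit's actual internals), that selection behaves something like the following, which is how a single slow HTML response can dominate the estimate:

```ts
// Minimal sketch of picking the audit's TTFB estimate; names are
// hypothetical, not Lighthouse's actual code.
interface TtfbInputs {
  estimatedTtfbMs: number; // lantern-style simulated estimate
  observedTtfbMs: number; // measured value from the trace
}

function pickTtfbEstimate({estimatedTtfbMs, observedTtfbMs}: TtfbInputs): number {
  // Taking the max means a large observed TTFB (e.g. a slow SSR'd document)
  // wins over a much smaller simulated value.
  return Math.max(estimatedTtfbMs, observedTtfbMs);
}
```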

When there's an LCP resource (image), the LCPBreakdown computed artifact guards against impossible values by clamping against TTFB, so while the percentages might not reflect reality, they're at least all between 0 and 100 and add up to 100.

When there's no LCP resource (the page has a text LCP), LCPBreakdown returns only the TTFB time and calculates render delay as the simulated LCP minus TTFB. Since the simulated LCP is much smaller than the observed TTFB, this ends up negative, and the percentages end up as above.
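With illustrative numbers only (chosen to roughly match the screenshot above, not taken from the actual report), the arithmetic looks like this:

```ts
// Hypothetical numbers for illustration; the real report's values may differ.
const observedTtfbMs = 4080; // slow initial HTML response
const simulatedLcpMs = 2000; // Lantern's much smaller simulated LCP

// Text-LCP path: render delay is just simulated LCP minus TTFB.
const renderDelayMs = simulatedLcpMs - observedTtfbMs; // -2080ms

const ttfbPct = (observedTtfbMs / simulatedLcpMs) * 100; // 204%
const renderDelayPct = (renderDelayMs / simulatedLcpMs) * 100; // -104%
```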

The text LCP case needs a similar clamping against TTFB, so at least the values aren't obviously in error.
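A minimal sketch of that clamp, assuming a simplified breakdown shape rather than the real LCPBreakdown artifact:

```ts
// Sketch of clamping the text-LCP breakdown so TTFB can never exceed the
// LCP it's supposed to be a part of. Simplified, not the actual artifact code.
function textLcpBreakdown(simulatedLcpMs: number, ttfbMs: number) {
  const clampedTtfbMs = Math.min(ttfbMs, simulatedLcpMs);
  const renderDelayMs = simulatedLcpMs - clampedTtfbMs; // now always >= 0
  return {ttfbMs: clampedTtfbMs, renderDelayMs};
}
```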

However, until that's fixed in Lantern, the millisecond values can still add up to much more than the listed LCP (and the percentages will likewise be unrealistic).

Improve Lantern simulation

Lantern chooses the server latencies to use during simulation by looking at the server response times of all requests and taking the median per origin. This is intended to even out outliers and give a reasonable estimate of the server's responsiveness even in the face of noisy reality. However, this isn't a good model if the server has fundamentally different responsiveness for different resource types.
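For reference, a simplified sketch of a per-origin median estimate (not Lantern's actual implementation) shows why a single slow response gets averaged away:

```ts
// Simplified per-origin median of server response times. One 3500ms document
// among ten ~50ms static responses from the same origin yields a ~50ms median.
function medianServerLatencyByOrigin(
  requests: Array<{url: string; serverResponseTimeMs: number}>
): Map<string, number> {
  const byOrigin = new Map<string, number[]>();
  for (const request of requests) {
    const origin = new URL(request.url).origin;
    const times = byOrigin.get(origin) ?? [];
    times.push(request.serverResponseTimeMs);
    byOrigin.set(origin, times);
  }

  const medians = new Map<string, number>();
  for (const [origin, times] of byOrigin) {
    const sorted = [...times].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    medians.set(
      origin,
      sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
    );
  }
  return medians;
}
```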

In runs I did of the repro site, the first request would have a 3.5-5s TTFB (the vast majority of the observed LCP), but the remaining ten requests were very fast, so the simulator ended up using the ~50ms median as the estimated server latency for all requests, leading to the incredibly fast simulated LCP.

Lantern could use the max of (median server response time, the request's actual response time), but that could undo a significant part of the effort to smooth out real-life variability. Other options might be to further break down the SRT estimate by resource type, special-case the main document response, or create a smooth estimator parameterized by the request's actual response time. All of the options have tradeoffs, and any change here will likely degrade other use cases.
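The first option, sketched with hypothetical names, would look roughly like this; it keeps the median's smoothing for fast requests but stops a pathologically slow response from being averaged away:

```ts
// Sketch of one option: never let a request's simulated server latency drop
// below what was actually observed for that request. Illustrative only.
function serverLatencyForRequest(
  originMedianMs: number,
  observedResponseTimeMs: number
): number {
  return Math.max(originMedianMs, observedResponseTimeMs);
}
```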

adamraine commented 4 days ago

Just splitting out the TTFB calculation for the main document could help with this specific case.

connorjclark commented 4 days ago

> However, this isn't a good model if the server has fundamentally different responsiveness for different resource types.

For most resource types, the content is static and the server can deliver it very quickly with no processing time (no database queries, no user auth, etc.). JS, CSS, images, etc. mostly fall under this.

It seems that for some types of requests we never want to take an aggregate estimation, for example the main HTML document or XHR requests.

@brendankenny wdyt of us identifying these sorts of requests, and always using an n=1 sample size for the server response estimation? Pretty much as you suggested w/ resource types, but for each HTML/XHR url we give just its timing as the result, and for everything else it gets averaged out (to estimate the time it takes the server to serve a static resource, basically).
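A minimal sketch of that idea, with hypothetical types and names rather than Lighthouse's actual request model:

```ts
// Dynamic-looking requests (the HTML document, XHR/fetch) keep their own
// observed timing (n=1); static resources share the per-origin median.
type ResourceType =
  | 'Document' | 'XHR' | 'Fetch' | 'Script' | 'Stylesheet' | 'Image' | 'Other';

interface SimRequest {
  url: string;
  resourceType: ResourceType;
  serverResponseTimeMs: number;
}

function estimatedServerLatency(request: SimRequest, originMedianMs: number): number {
  const dynamicTypes: ResourceType[] = ['Document', 'XHR', 'Fetch'];
  return dynamicTypes.includes(request.resourceType)
    ? request.serverResponseTimeMs // n=1: trust this request's own timing
    : originMedianMs; // smoothed estimate for static assets
}
```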