GoogleChrome / web-vitals

Essential metrics for a healthy site.
https://web.dev/vitals
Apache License 2.0
7.49k stars 410 forks source link

LCP and CLS report callback are significantly delayed #405

Closed jashsayani closed 9 months ago

jashsayani commented 9 months ago

According to the README:

Also, in some cases a metric callback may never be called:

  • FID and INP are not reported if the user never interacts with the page.
  • CLS, FCP, FID, and LCP are not reported if the page was loaded in the background.

Currently, I see FCP and TTFB being reported immediately when page is loaded, but LCP and CLS are significantly delayed (page is not loaded in background and callback is fired, but its too late). FYI: I am not using reportAllChanges.

My network timeline:

Screenshot 2023-12-01 at 5 31 09 PM

FCP and TTFB happen at 6.8 secs which is reasonable:

Screenshot 2023-12-01 at 5 31 32 PM

LCP and CLS are at 43 seconds!! Users don't stay on a page that long!

Screenshot 2023-12-01 at 5 31 40 PM

Is this a bug? If I setup a PerformanceObserver with type: "layout-shift", the callback happens instantly. Looking at the code, I see it tracks "session" in the 5 sec window or creates a new session, etc. But that should not take 40 secs to report. LCP is also delayed and it just checks visibilityWatcher.firstHiddenTime which should be Infinity as the window is always visible in this case.

tunetheweb commented 9 months ago

This is working as intended.

Both LCP and CLS (and also INP btw) can change throughout the life of the page (1). For example a larger element could still be inserted into the page at any time and become the LCP element and time. Therefore it is not possible to know when the value we have is the "final" value of this metric and since you are not using reportAllChanges we should only report the final value (2). When it does report it will report the largest value up until then - even if it occured much earlier. For LCP the timestamp will have been much earlier. It just delays reporting that time until it's sure that is the final LCP (and similarly for CLS and INP).

This differs to TTFB (which can be finalised and reported as soon as the first bytes are received as it will not change after that), FCP (similarly as soon as first contentful paint happens) and FID (when first interaction happens).

(1) Note for LCP it can be finalised on interaction:

The browser will stop reporting new entries as soon as the user interacts with the page (via a tap, scroll, or keypress), as user interaction often changes what's visible to the user (which is especially true with scrolling).

In these case it is reported earlier:

https://github.com/GoogleChrome/web-vitals/blob/16e6e93764bee949cc7a04250695893fd76502bc/src/onLCP.ts#L99-L107

(2) Other than the above case for LCP when an interaction happens, LCP, CLS, and INP should really only be reported when the user navigates away from the page. The web-vitals.js library does do this, but also reports on other other case, which is when the page visibility changes (i.e. the tab is backgrounded). We do this because there this is the last reliable time to report for a page in certain circumstances (e.g. if the page is never foregrounded again), though this can also lead to duplicate reporting (e.g. if the CLS is reported on backgrounding, the page is foregrounded again, has some more CLS for example, and then you leave, then you will get two CLS reports), but it's the best we can do (FYI, the Chrome team are working on a new API for circumstances like this to only report this once!).

You can see LCP stops listening with the onHidden function (which includes both backgrounding or navigating away, and for LCP also reports the current value): https://github.com/GoogleChrome/web-vitals/blob/16e6e93764bee949cc7a04250695893fd76502bc/src/onLCP.ts#L109 CLS reports (but then continues listening): https://github.com/GoogleChrome/web-vitals/blob/16e6e93764bee949cc7a04250695893fd76502bc/src/onCLS.ts#L113-L116 and INP does similar: https://github.com/GoogleChrome/web-vitals/blob/16e6e93764bee949cc7a04250695893fd76502bc/src/onINP.ts#L227-L238

LCP is also delayed and it just checks visibilityWatcher.firstHiddenTime which should be Infinity as the window is always visible in this case.

LCP has an additional check that ignores pages that were opened in the background (as measuring LCP in this case is tricky, as technically the paint doesn't happen until the user foregrounds the tab but that might be way after the LCP content has actually loaded). But that is unrelated to late reporting (it never reports in this case).

So all in all it is expected to report LCP, CLS, and INP much later.

jashsayani commented 9 months ago

it is not possible to know when the value we have is the "final" value of this metric

Yes, but at some point the value has to be "final". I can have window.setTimeout(renderNewLargeContent, 50000) and change the largest content right after it being reported. I think there should maybe be a "maxTimeout" that I can set of like maybe 15 secs, where the value is reported (whatever it happens to be until that point). If you want to be accurate throughout the lifecycle of the page, then this should not fire/report at 40 secs, but rather on page unload/navigation away.

It just delays reporting that time until it's sure that is the final LCP

You cannot know its the final LCP unless the lifecycle of the page is over (like on unload). Otherwise its just an arbitrary point in time when it says "i guess page won't change now". So it being reported at 40 seconds right now is a bug! Should not report till I navigate away or interact.

(1) Note for LCP it can be finalised on interaction

Yes, I saw this in the implementation but when recording field metrics, it is not great to rely on user interaction to report. If they open the page, then close it in 5 secs, the metric is lost. We are not blocking unload till the reporting call is successful (which would be bad anyway). Also for websites that are not Single Page Apps, clicking on any content can mean navigating away and lost metrics.

LCP, CLS, and INP should really only be reported when the user navigates away from the page

I think having a maxTimeout of 10 or 15 secs and reporting is better as relying on navigating away is not reliable and its basically lost metrics. In any case, the LCP and CLS being reported at 43 secs is a bug?

Tl;dr: (1) What do you think of maxTimeout? (2) What is the recommended approach if I want to have a "final" value at say 15 seconds and not rely on interact/end of lifecycle?

tunetheweb commented 9 months ago

it is not possible to know when the value we have is the "final" value of this metric

Yes, but at some point the value has to be "final". I can have window.setTimeout(renderNewLargeContent, 50000) and change the largest content right after it being reported. I think there should maybe be a "maxTimeout" that I can set of like maybe 15 secs, where the value is reported (whatever it happens to be until that point). If you want to be accurate throughout the lifecycle of the page, then this should not fire/report at 40 secs, but rather on page unload/navigation away.

There is no 40 second timeout. You must have backgrounded the tab at the 40 second mark. That is also why CLS was reported at the same time.

It also seems somewhat contradictory that you are complaining about a max timeout (that does not exist) but then asking for a user-defined max timeout.

It just delays reporting that time until it's sure that is the final LCP

You cannot know its the final LCP unless the lifecycle of the page is over (like on unload). Otherwise its just an arbitrary point in time when it says "i guess page won't change now". So it being reported at 40 seconds right now is a bug! Should not report till I navigate away or interact.

As discussed above the web-vitals library reports metrics on visibility change (including page unload), or sooner if they can be finalised sooner. It is not arbitrary.

(1) Note for LCP it can be finalised on interaction

Yes, I saw this in the implementation but when recording field metrics, it is not great to rely on user interaction to report. If they open the page, then close it in 5 secs, the metric is lost. We are not blocking unload till the reporting call is successful (which would be bad anyway). Also for websites that are not Single Page Apps, clicking on any content can mean navigating away and lost metrics.

No, if they open the page and then close it after 5 seconds, the page visibility will change, and there the beacons will fire. The page will hold navigating away until the event fires. Pages do block navigating on event handlers firing, otherwise there would be no point in having event handlers for these.

LCP, CLS, and INP should really only be reported when the user navigates away from the page

I think having a maxTimeout of 10 or 15 secs and reporting is better as relying on navigating away is not reliable and its basically lost metrics. In any case, the LCP and CLS being reported at 43 secs is a bug?

If it happened like that without backgrounding the page that would certainly be a bug, but I don't believe it does. Please provide a site where that happens.

Tl;dr: (1) What do you think of maxTimeout? (2) What is the recommended approach if I want to have a "final" value at say 15 seconds and not rely on interact/end of lifecycle?

This has been discussed before (for example in #394), but the point of the web-vitals.js library is as detailed in the home page of this project:

for measuring all the Web Vitals metrics on real users, in a way that accurately matches how they're measured by Chrome and reported to other Google tools (e.g. Chrome User Experience Report, Page Speed Insights, Search Console's Speed Report).

So we are against adding such an option because we believe the current methodology are the most accurate ways of measuring the Web Vitals.

jashsayani commented 9 months ago

You must have backgrounded the tab at the 40 second mark

This is possible, if I can repro it reporting, I will file an issue.

complaining about a max timeout (that does not exist) but then asking for a user-defined max timeout

Yeah not sure why I saw it reported at round 40 secs, but I am mainly asking for a timeout.

Pages do block navigating on event handlers firing, otherwise there would be no point in having event handlers for these

Yes, the unloading JS will run, but I am not sure if there is a guarantee of network calls for reporting being fired successfully (with 200 OK response; I think HTTP/2 server would close connection if the client fires request but connection is closed/lost before 200 OK?). There are other issues, for example on mobile, bad connection/network when closing it, etc. Like in the metro/tube, you might lose network and close the page.

we are against adding such an option because we believe the current methodology are the most accurate ways of measuring the Web Vitals

Thanks, I will maybe use reportAllChanges and implement my own timeout. I think that would work.

tunetheweb commented 9 months ago

Yes, there are definitely cases where beacons can be lost, and this only increases with late beacons. However, that must also be weighed against the cases that beaconing early reports incorrect data. The library currently chooses to choose correctness over more beaconing.

Thanks, I will maybe use reportAllChanges and implement my own timeout. I think that would work.

Do read my comments here: https://github.com/GoogleChrome/web-vitals/issues/394#issuecomment-1768184919.

Will close this for now, but let me know if you have any more questions.