GoogleChrome / web-vitals

Essential metrics for a healthy site.
https://web.dev/vitals
Apache License 2.0
7.55k stars 414 forks source link

v4 INP attribution ending processingEnd time in the wrong animation frame #462

Open mmocny opened 6 months ago

mmocny commented 6 months ago

Screenshot: Screenshot 2024-04-10 at 11 34 28

(notice that processing time ends after the paint/commit on main thread).

Note: I was using the experimental PR of Web Vitals Extension to test web-vitals.js v4 beta, but I didn't confirm that it has the latest and greatest version.


Example trace: https://trace.cafe/t/MUVKayAJwm

I think what miught be happening is that we are using the event renderTime as the end point and finding the last LoAF in that range, rather than using the event processing times? I'm not sure.

In order to reproduce here are the steps:

Snippet:

const EVENTS = [
  "keydown",
  "keyup",
  "keypress",
//   "pointerdown",
//   "pointerup",
  "pointercancel",
  "touchstart",
  "touchend",
  "touchcancel",
//   "mousedown",
//   "mouseup",
  "gotpointercapture",
  "lostpointercapture",
//   "click",
  "dblclick",
  "auxclick",
  "contextmenu",
  "pointerleave",
  "pointerout",
  "pointerover",
  "pointerenter",
  "mouseout",
  "mouseover",
  "mouseleave",
  "mouseenter",
  "lostpointercapture",
  "dragstart",
  "dragend",
  "dragenter",
  "dragleave",
  "dragover",
  "drop",
  "beforeinput",
  "input",
  "compositionstart",
  "compositionupdate",
  "compositionend",
];

function block(ms) {
    const target = performance.now() + ms;
    while(performance.now() < target);
}

EVENTS.forEach(type => {
    document.addEventListener(type, e => {
        console.log(`Event type: ${type}`);
        block(20);
    });
});
philipwalton commented 6 months ago

(notice that processing time ends after the paint/commit on main thread).

I see that, but I also see that DevTools shows the interaction as ending about the same time as the extension says it ends (both of them after paint/commit on the main thread). This makes me think it's not a bug in the web-vitals library but instead a bug somewhere else?

I also see that processingDuration ends at the same time as that mouseover event ends; is it possible that no frame was actually presented after the paint/commit and so this actually was the end of processing?

Note: I was using the experimental PR of Web Vitals Extension to test web-vitals.js v4 beta, but I didn't confirm that it has the latest and greatest version ... I think what miught be happening is that we are using the event renderTime as the end point and finding the last LoAF in that range, rather than using the event processing times? I'm not sure.

The initial PR to add breakdown timings did use renderTime from LoAF, but that was removed. All of the released versions (beta 0, 1, and 2) only use processing start/end from event timing, so unless you were using a version that included code from #422 before it was merged, I think this is unlikely.

Maybe @rviscomi or @tunetheweb could help clarify whether or not any experimental extension versions were release that contained code from #442 before it was merged?

tunetheweb commented 6 months ago

I’m keeping my PR updated with the latest v4 branch as we merge stuff. Well actually I’ve gone ahead of that now since I have the redirect stuff in there too at the mo.

mmocny commented 6 months ago

(I was using @tunetheweb PR and can try to confirm the version, but I trust that its the latest.)

I also see that DevTools shows the interaction as ending about the same time

That rightmost whisker is the "pure presentation delay", off-main thread cost of rasterizing this frame. I do admit that 80ms+ is very atypical for a Mac desktop w/o throttling, but I think this is an example where a hugely complex DOM is causing chrome to go sluggish, which you can see in Raster thread and GPU breakdowns, starting immediately after Commit:

Screenshot 2024-04-11 at 09 39 30


I also took a look at this trace in Perfetto and can confirm its that first frame.

Not that it matters for web-vitals.js , but to make things more interesting for this example, only about 18ms of the presentation delay is between Commit and Activate, most of the presentation delay comes after Activation. This means that not only is the next paint of content delayed, but all compositor driven animations would be stalled here.

However, a reasonable portion of this is caused by DevTools performance profiling (screenshot capture) itself-- which I mention as a warning/reminder that you should disable screenshots if you find presentation delays are very large.

I tested the site again, without tracing enabled at all, and still found that scrolling becomes super janky when that details field is being expanded (and still somewhat janky after its expanded).

mmocny commented 6 months ago

Oh, and one more difference with DevTools: the processing end time in DevTools ends with the last event that has an interactionID. That is not what web-vitals.js is doing and I think we should discuss updating DevTools to match, rather than trying to match DevTools.

mmocny commented 6 months ago

Hmm-- in that specific trace I uploaded originally, the second animation frame ends at almost exactly the same time as the first animation frame (only 1ms apart). That is because there is actually no paint update needed and it ends of processingEnd time of the last event handler.

This is pure coincidence-- and I re-recorded another trace where the delta is different-- but I wonder if this could be the cause of the error?

The event timings would have a renderTime +/- 8ms of each other even through they are in different animation frames and we would group them together, possibly?


If we can validate that is it, I think there is a possible remedy:

This also seems like a good reason to change the Event Timing spec to make it explicit that there is no "real renderTime" for some events.

tunetheweb commented 6 months ago

We definitely do use an 8ms threshold:

https://github.com/GoogleChrome/web-vitals/blob/5420b5f4b28ad9296ca64e53858a8f93a74f7443/src/attribution/onINP.ts#L98-L102

We do use the LoAF end time to clap the processing time:

https://github.com/GoogleChrome/web-vitals/blob/5420b5f4b28ad9296ca64e53858a8f93a74f7443/src/attribution/onINP.ts#L217-L225

But given Event timing is only accurate to 8ms, can we assume they will perfectly match up to LoAF entries to allow that removal you suggest?

Also if both animation frames end within 1ms of each other then does it really matter in that case that we are reporting the second? Even a max of 8ms seems acceptable for those rare times?

mmocny commented 6 months ago

processingStart and processingEnd are not clamped, and those will perfectly overlap with LoAF (Caveat: LoAF duration may be rounded to the nearest whole number, but not the 8ms thing)

It's just duration that does that. The problem is that from the Event Timing data itself you cannot really tell if two event timings are in the same animation frame or not, except by convention of their renderTimes being "close together" and this is an example where that falls apart.

In cases where you have LoAF, it makes everything easier as you can just use LoAF time range for grouping events by animation frame. But since you don't always have LoAF I think web-vitals.js tried to cluster by event-timing data only and then use loaf only for attribution?

tunetheweb commented 6 months ago

But since you don't always have LoAF I think web-vitals.js tried to cluster by event-timing data only and then use loaf only for attribution?

Correct. Currently we just use LoAF for:

  1. Set nextPaintTime (the clamping above)
  2. Provide the LoAF entries themselves to save the developer having to use whyNP-style script to match them up.