WICG / layout-instability

A proposal for a Layout Instability specification
https://wicg.github.io/layout-instability/

Is `visibilitychange` really the right signal for when to aggregate layout shift? #50

Closed · dholbert closed this 4 years ago

dholbert commented 4 years ago

Right now, the spec section 1.1 says:

A "final" score for the user’s session can be reported by listening to the visibilitychange event, and taking the value of the cumulative layout shift score at that time.

(This is in a non-normative section, but it's still worth considering this recommendation about how to use the reported metric.)

Is the visibilitychange event really what we want to suggest here? And if so, could we add a sentence or two to explain what it's expected to capture, & perhaps to consider (and hopefully address) anticipated concerns about this event?

I admit I haven't worked with visibilitychange much, so I might be missing something -- but based on its documentation and a demo, I see a few issues that seem to make it problematic for in-the-field analytics here.

Depending on user behavior, this event...

npm1 commented 4 years ago

> Is the visibilitychange event really what we want to suggest here? And if so, could we add a sentence or two to explain what it's expected to capture, & perhaps to consider (and hopefully address) anticipated concerns about this event?

I think the recommendation is to use that event but to only report when it changes to hidden. I don't think that results in firing way too often. It's the recommended way because it's the latest callback that reliably fires when the user leaves the page.
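For concreteness, that pattern might look roughly like this (a minimal sketch; the `/analytics` endpoint and payload shape are made up):

```js
// Minimal sketch: accumulate layout-shift values, then report the
// running total once the page becomes hidden.
let cumulativeLayoutShiftScore = 0;

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Shifts that closely follow user input are excluded from CLS.
    if (!entry.hadRecentInput) {
      cumulativeLayoutShiftScore += entry.value;
    }
  }
});
observer.observe({ type: 'layout-shift', buffered: true });

document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') {
    // sendBeacon can outlive a backgrounded or unloading page.
    navigator.sendBeacon('/analytics',
        JSON.stringify({ cls: cumulativeLayoutShiftScore }));
  }
});
```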

> - Might not fire ever [to a first approximation], or at least not soon enough -- it looks like an arbitrarily large amount of time can pass before this event fires. e.g. Suppose a user opens your page, and they leave it open in its own browser-window -- then, this event will never fire, until they quit their browser, which they might never do. If the web developer is waiting on visibilitychange to send aggregated analytics, they may ~never get their report -- or maybe-worse: when they do get their report, its value may be the summation of several days' worth of layout shift metrics, which could produce a very large and ~meaningless metric. (For a largely static page, the metric may stabilize & stop changing & perhaps could be usefully reported after an arbitrary delay; but for any site with some dynamically-updating portion, e.g. Gmail/twitter, the metric would monotonically increase over time and could be arbitrarily large when visibilitychange fires.)

This seems possible but extremely unlikely. Even if the user has a tab dedicated to the page (say, email), the page's process is not going to live forever. I think that the callback would be fired when the process dies. But it's certainly possible to have a timer in addition to the visibility change, so that the metric is guaranteed to be reported every X time.
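That timer fallback could be layered on top of the sketch above, e.g. (the interval and endpoint here are made up):

```js
// Hypothetical fallback: also flush periodically, so a long-lived tab
// still reports even if visibilitychange never fires for it.
const REPORT_INTERVAL_MS = 60 * 1000; // assumed interval, tune as needed

function reportCLS() {
  navigator.sendBeacon('/analytics', JSON.stringify({
    cls: cumulativeLayoutShiftScore,
    reportedAt: performance.now(),
  }));
}

setInterval(reportCLS, REPORT_INTERVAL_MS);
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') reportCLS();
});
```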

@philipwalton any thoughts?

dholbert commented 4 years ago

> I think the recommendation is to use that event but to only report when it changes to hidden. I don't think that results in firing way too often. It's the recommended way because it's the latest callback that reliably fires when the user leaves the page.

It fires (and reports as "hidden") when the user backgrounds the page, which could happen ~immediately and repeatedly -- that's my hypothetical "too-often" concern.

(Having said that: it seems reasonable to disregard perf measurements for backgrounded pages, because browsers may use different event-scheduling heuristics there, painting may be skipped entirely, etc. So in that sense, it does make sense to listen for this event and use it to reason about how you should feel about your performance metrics.)

> the page's process is not going to live forever. I think that the callback would be fired when the process dies. But it's certainly possible to have a timer in addition to the visibility change, so that the metric is guaranteed to be reported every X time.

Stepping back a bit: my point here wasn't so much "the metrics might never come in", but rather "the thing that the sample code is reporting, `cumulativeLayoutShiftScore`, is just a sum computed over an arbitrarily small or large amount of time." And it could vary by orders of magnitude depending on whether the user switches tabs (or closes the page) immediately, vs. if the user leaves the page open for 5 minutes, vs. if they leave the page open until its process dies. In a static page it probably wouldn't vary, but in a dynamic webapp like Gmail, where things are appearing, it would monotonically increase over time. Given this, it feels somewhat dubious that the reported `cumulativeLayoutShiftScore` value would be useful in a real-life version of this usage example. (Maybe there are other implicitly-reported metrics that make it useful, though?)

I feel like the usage example would be more believable if the sample code's `updateCLS` function maintained a list of the reported LS values and their timestamps, perhaps -- I would think that would be much more valuable for analytics to reason about ("Between time X and time Y, we had this layout shift, and then it stabilized") rather than "while the page was open for all time, there was $arbitrary_sum amount of layout shift."
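Concretely, something like this (a hypothetical sketch, not the spec's example):

```js
// Keep per-shift records instead of only a running sum, so analytics
// can see *when* shifts happened, not just their lifetime total.
const layoutShifts = [];

new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (!entry.hadRecentInput) {
      layoutShifts.push({ value: entry.value, startTime: entry.startTime });
    }
  }
}).observe({ type: 'layout-shift', buffered: true });

document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') {
    navigator.sendBeacon('/analytics', JSON.stringify(layoutShifts));
  }
});
```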

npm1 commented 4 years ago

> I think the recommendation is to use that event but to only report when it changes to hidden. I don't think that results in firing way too often. It's the recommended way because it's the latest callback that reliably fires when the user leaves the page.

> It fires (and reports as "hidden") when the user backgrounds the page, which could happen ~immediately and repeatedly -- that's my hypothetical "too-often" concern.

Hmm, yeah -- but my intuition is that this "too-often" scenario is not a problem in practice?

> Stepping back a bit: my point here wasn't so much "the metrics might never come in", but rather "the thing that the sample code is reporting, `cumulativeLayoutShiftScore`, is just a sum computed over an arbitrarily small or large amount of time." And it could vary by orders of magnitude depending on whether the user switches tabs (or closes the page) immediately, vs. if the user leaves the page open for 5 minutes, vs. if they leave the page open until its process dies. In a static page it probably wouldn't vary, but in a dynamic webapp like Gmail, where things are appearing, it would monotonically increase over time. Given this, it feels somewhat dubious that the reported `cumulativeLayoutShiftScore` value would be useful in a real-life version of this usage example. (Maybe there are other implicitly-reported metrics that make it useful, though?)

Oh so I think you're making a point regarding normalization of the metric not being great at the moment. And I totally agree with that! The Chrome Speed Metrics team is thinking about this problem of how to improve CLS normalization so that it doesn't just penalize long-lived pages, and we're definitely open to ideas here!
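(One purely illustrative idea along those lines: group timestamped shifts into "sessions" separated by quiet gaps, and report the worst session rather than the unbounded lifetime sum. The gap value below is made up:)

```js
// Illustrative normalization sketch: split timestamped shifts into
// sessions separated by gaps of at least GAP_MS, and return the worst
// per-session sum instead of the lifetime total.
const GAP_MS = 1000; // made-up session gap

function worstSessionScore(shifts) {
  let worst = 0;
  let current = 0;
  let prevTime = -Infinity;
  for (const { value, startTime } of shifts) {
    if (startTime - prevTime > GAP_MS) current = 0; // new session
    current += value;
    prevTime = startTime;
    worst = Math.max(worst, current);
  }
  return worst;
}
```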

> I feel like the usage example would be more believable if the sample code's `updateCLS` function maintained a list of the reported LS values and their timestamps, perhaps -- I would think that would be much more valuable for analytics to reason about ("Between time X and time Y, we had this layout shift, and then it stabilized") rather than "while the page was open for all time, there was $arbitrary_sum amount of layout shift."

That's a fair point; I think we can tweak this example to report timestamps as well, so it's clear that the 'when' is also important. Does that sound good to you @skobes?

dholbert commented 4 years ago

> Oh so I think you're making a point regarding normalization of the metric not being great at the moment

Yes, roughly (or that's part of it). My point is that the layout shift information seems like something that a web developer would naturally want to reason about in a more fine-grained way -- e.g.:

Basically: if the sample-code just sums up a score over an arbitrary & entirely-user-determined amount of time and reports that as the score, that feels like a waste (and a harder-to-imagine-as-useful usage) of a metric that's much more useful when reasoned about in a time/event-specific way.