WICG / layout-instability

A proposal for a Layout Instability specification
https://wicg.github.io/layout-instability/

Is `visibilitychange` really the right signal for when to aggregate layout shift? #50

Closed · dholbert closed this 4 years ago

dholbert commented 4 years ago

Right now, the spec section 1.1 says:

A "final" score for the user’s session can be reported by listening to the visibilitychange event, and taking the value of the cumulative layout shift score at that time.

(This is in a non-normative section, but it's still worth considering this recommendation about how to use the reported metric.)

Is the visibilitychange event really what we want to suggest here? And if so, could we add a sentence or two to explain what it's expected to capture, & perhaps to consider (and hopefully address) anticipated concerns about this event?

I admit I haven't worked with visibilitychange much, so I might be missing something -- but based on its documentation and a demo, I see a few issues that seem to make it problematic for in-the-field analytics here.

Depending on user behavior, this event...

npm1 commented 4 years ago

> Is the visibilitychange event really what we want to suggest here? And if so, could we add a sentence or two to explain what it's expected to capture, & perhaps to consider (and hopefully address) anticipated concerns about this event?

I think the recommendation is to use that event but to only report when it changes to hidden. I don't think that results in firing way too often. It's the recommended way because it's the latest callback that reliably fires when the user leaves the page.
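For concreteness, that pattern might look roughly like this (a minimal sketch; the `/analytics` endpoint and payload shape are made up):

```js
// Minimal sketch: accumulate layout-shift values, then report the
// running total once the page becomes hidden.
let cumulativeLayoutShiftScore = 0;

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Shifts that closely follow user input are excluded from CLS.
    if (!entry.hadRecentInput) {
      cumulativeLayoutShiftScore += entry.value;
    }
  }
});
observer.observe({ type: 'layout-shift', buffered: true });

document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') {
    // sendBeacon can outlive a backgrounded or unloading page.
    navigator.sendBeacon('/analytics',
        JSON.stringify({ cls: cumulativeLayoutShiftScore }));
  }
});
```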

> - Might not fire ever [to a first approximation], or at least not soon enough -- it looks like an arbitrarily large amount of time can pass before this event fires. e.g. Suppose a user opens your page, and they leave it open in its own browser-window -- then, this event will never fire, until they quit their browser, which they might never do. If the web developer is waiting on visibilitychange to send aggregated analytics, they may ~never get their report -- or maybe-worse: when they do get their report, its value may be the summation of several days' worth of layout shift metrics, which could produce a very large and ~meaningless metric. (For a largely static page, the metric may stabilize & stop changing & perhaps could be usefully reported after an arbitrary delay; but for any site with some dynamically-updating portion, e.g. Gmail/twitter, the metric would monotonically increase over time and could be arbitrarily large when visibilitychange fires.)

This seems possible but extremely unlikely. Even if the user has a tab dedicated to the page (say, email), the page's process is not going to live forever. I think that the callback would be fired when the process dies. But it's certainly possible to have a timer in addition to the visibility change, so that the metric is guaranteed to be reported every X time.
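That timer fallback could be layered on top of the sketch above, e.g. (the interval and endpoint here are made up):

```js
// Hypothetical fallback: also flush periodically, so a long-lived tab
// still reports even if visibilitychange never fires for it.
const REPORT_INTERVAL_MS = 60 * 1000; // assumed interval, tune as needed

function reportCLS() {
  navigator.sendBeacon('/analytics', JSON.stringify({
    cls: cumulativeLayoutShiftScore,
    reportedAt: performance.now(),
  }));
}

setInterval(reportCLS, REPORT_INTERVAL_MS);
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') reportCLS();
});
```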

@philipwalton any thoughts?

dholbert commented 4 years ago

> I think the recommendation is to use that event but to only report when it changes to hidden. I don't think that results in firing way too often. It's the recommended way because it's the latest callback that reliably fires when the user leaves the page.

It fires (and reports as "hidden") when the user backgrounds the page, which could happen ~immediately and repeatedly -- that's my hypothetical "too-often" concern.

(Having said that: it seems reasonable to disregard perf measurements for backgrounded pages, because browsers may use different event-scheduling heuristics there, painting may be skipped entirely, etc. So in that sense, it does make sense to listen for this event and use it to reason about how you should feel about your performance metrics.)

> the page's process is not going to live forever. I think that the callback would be fired when the process dies. But it's certainly possible to have a timer in addition to the visibility change, so that the metric is guaranteed to be reported every X time.

Stepping back a bit: my point here wasn't so much "the metrics might never come in", but rather "the thing that the sample code is reporting, `cumulativeLayoutShiftScore`, is just a sum computed over an arbitrarily small or large amount of time." And it could vary by orders of magnitude depending on whether the user switches tabs (or closes the page) immediately, vs. if the user leaves the page open for 5 minutes, vs. if they leave the page open until its process dies. In a static page it probably wouldn't vary, but in a dynamic webapp like Gmail, where things are appearing, it would monotonically increase over time. Given this, it feels somewhat dubious that the reported `cumulativeLayoutShiftScore` value would be useful in a real-life version of this usage example. (Maybe there are other implicitly-reported metrics that make it useful, though?)

I feel like the usage example would be more believable if the sample code's `updateCLS` function maintained a list of the reported LS values and their timestamps, perhaps -- I would think that would be much more valuable for analytics to reason about ("Between time X and time Y, we had this layout shift, and then it stabilized") rather than "while the page was open for all time, there was $arbitrary_sum amount of layout shift."
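Concretely, something like this (a hypothetical sketch, not the spec's example):

```js
// Keep per-shift records instead of only a running sum, so analytics
// can see *when* shifts happened, not just their lifetime total.
const layoutShifts = [];

new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (!entry.hadRecentInput) {
      layoutShifts.push({ value: entry.value, startTime: entry.startTime });
    }
  }
}).observe({ type: 'layout-shift', buffered: true });

document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') {
    navigator.sendBeacon('/analytics', JSON.stringify(layoutShifts));
  }
});
```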

npm1 commented 4 years ago

> I think the recommendation is to use that event but to only report when it changes to hidden. I don't think that results in firing way too often. It's the recommended way because it's the latest callback that reliably fires when the user leaves the page.

> It fires (and reports as "hidden") when the user backgrounds the page, which could happen ~immediately and repeatedly -- that's my hypothetical "too-often" concern.

Hmm, yeah -- but my intuition is that this "too-often" scenario is not a problem in practice?

> Stepping back a bit: my point here wasn't so much "the metrics might never come in", but rather "the thing that the sample code is reporting, `cumulativeLayoutShiftScore`, is just a sum computed over an arbitrarily small or large amount of time." And it could vary by orders of magnitude depending on whether the user switches tabs (or closes the page) immediately, vs. if the user leaves the page open for 5 minutes, vs. if they leave the page open until its process dies. In a static page it probably wouldn't vary, but in a dynamic webapp like Gmail, where things are appearing, it would monotonically increase over time. Given this, it feels somewhat dubious that the reported `cumulativeLayoutShiftScore` value would be useful in a real-life version of this usage example. (Maybe there are other implicitly-reported metrics that make it useful, though?)

Oh so I think you're making a point regarding normalization of the metric not being great at the moment. And I totally agree with that! The Chrome Speed Metrics team is thinking about this problem of how to improve CLS normalization so that it doesn't just penalize long-lived pages, and we're definitely open to ideas here!
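(One purely illustrative idea along those lines: group timestamped shifts into "sessions" separated by quiet gaps, and report the worst session rather than the unbounded lifetime sum. The gap value below is made up:)

```js
// Illustrative normalization sketch: split timestamped shifts into
// sessions separated by gaps of at least GAP_MS, and return the worst
// per-session sum instead of the lifetime total.
const GAP_MS = 1000; // made-up session gap

function worstSessionScore(shifts) {
  let worst = 0;
  let current = 0;
  let prevTime = -Infinity;
  for (const { value, startTime } of shifts) {
    if (startTime - prevTime > GAP_MS) current = 0; // new session
    current += value;
    prevTime = startTime;
    worst = Math.max(worst, current);
  }
  return worst;
}
```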

> I feel like the usage example would be more believable if the sample code's `updateCLS` function maintained a list of the reported LS values and their timestamps, perhaps -- I would think that would be much more valuable for analytics to reason about ("Between time X and time Y, we had this layout shift, and then it stabilized") rather than "while the page was open for all time, there was $arbitrary_sum amount of layout shift."

That's a fair point; I think we can tweak this example to report timestamps as well, so it's clear that the 'when' is also important. Does that sound good to you @skobes?

dholbert commented 4 years ago

> Oh so I think you're making a point regarding normalization of the metric not being great at the moment

Yes, roughly (or that's part of it). My point is that the layout shift information seems like something that a web developer would naturally want to reason about in a more fine-grained way -- e.g.:

Basically: if the sample-code just sums up a score over an arbitrary & entirely-user-determined amount of time and reports that as the score, that feels like a waste (and a harder-to-imagine-as-useful usage) of a metric that's much more useful when reasoned about in a time/event-specific way.