CSS Rich Browsing Proposal

HongZheng commented 1 year ago

Hi everyone, we are from Intel and want to introduce a CSS effects/animations rich workload into Speedometer3. The design doc is at https://docs.google.com/document/d/19vK5G11Kc4xbvhpkkXDf5WXdFQWyiK_a0WRwTUcc9j4/ I copied the contents of the document here in case you don't have access to it.

Objective
The objective of this proposal is to introduce a CSS effects/animations rich test case that can help browsers measure and improve CSS performance.
Motivation
CSS is an essential component of modern web development that helps developers build appealing and engaging websites. As CSS enables visually rich web pages, its performance becomes an important factor affecting user experience on the web. Some web benchmarks measure CSS/DOM operations and JS tasks together, making it hard to check CSS performance and impact in isolation. Therefore, we propose adding a CSS-heavy test case to Speedometer3 to help browsers measure and improve CSS performance.
Description
Drawing on real-life scenarios, such as image switching and table updating in https://top10.netflix.com/tv and https://www.imdb.com/, this proposal simulates a food menu with 5 kinds of food. Each food category contains 100 choices, and the first one is recommended. The workload automatically switches between the 5 kinds of food one by one, using JS to click the food pictures at the top of the page. In the real world, web developers generally use CSS effects/animations to make web pages more appealing and engaging to end users. The proposal also exercises many CSS property operations (chosen with reference to the statistics from https://chromestatus.com/), such as transform animation, opacity/color setting, and position/size adjustment. The web page of this proposal looks like the image below.

Measurement methodology
For performance measurement, the proposal utilizes performance.mark API to mark CSS animation frame start in requestAnimationFrame callback, and mark frame completion in the callback of afterframe API (https://www.npmjs.com/package/afterframe). The image below shows how to measure frame duration in Chrome trace. The final time reported is the average duration to render a frame in milliseconds.
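As a rough illustration of the measurement described above, the sketch below marks the frame start inside a requestAnimationFrame callback and the frame end in an after-frame callback, then averages the durations. It is not the workload's actual code: `measureFrame`, `averageFrameTime`, and the local `afterFrame` stand-in (used here instead of importing the afterframe package) are assumptions made for illustration, and a setTimeout fallback is used where requestAnimationFrame is unavailable.

```javascript
// Fallback so the sketch also runs outside a browser (for example in Node),
// where requestAnimationFrame does not exist.
const raf = globalThis.requestAnimationFrame
  ? (cb) => globalThis.requestAnimationFrame(cb)
  : (cb) => setTimeout(() => cb(performance.now()), 16);

// Stand-in for the afterframe package (an assumption, not its real source):
// wait for the next rAF callback, then queue a task that runs after it.
function afterFrame(cb) {
  raf(() => setTimeout(cb, 0));
}

// Measure one frame: mark the start inside rAF, run the work that triggers
// CSS changes, and mark the end in the after-frame callback.
function measureFrame(name, work) {
  return new Promise((resolve) => {
    raf(() => {
      performance.mark(`${name}-start`);
      work();
      afterFrame(() => {
        performance.mark(`${name}-end`);
        const measure = performance.measure(name, `${name}-start`, `${name}-end`);
        resolve(measure.duration);
      });
    });
  });
}

// Average duration in milliseconds over `frameCount` measured frames.
async function averageFrameTime(frameCount, work) {
  let total = 0;
  for (let i = 0; i < frameCount; i++) {
    total += await measureFrame(`frame-${i}`, work);
  }
  return total / frameCount;
}
```

In a browser, `work` would be whatever step changes styles for that frame, for example toggling a class on the menu table.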

Code can be reviewed at https://github.com/intel-staging/Speedometer/tree/rich_css/resources/tentative/rich-css
Note: The workload can be run by launching index.html from the rich-css folder. It is not integrated into the benchmark runner yet.
A live demo is available at https://hongzheng.github.io/

Results
We ran the proposal for 10 rounds in Chrome, Firefox, and Safari on an M2 machine, took the median of the average frame times as the final result, and calculated the variation using the coefficient of variation (standard deviation divided by the mean).

M2 (MacOS Ventura 13.3.1) | Time (ms) | CV (coefficient of variation)
-- | -- | --
Chrome (112.0.5615.137) | 11.7845 | 2.25%
Firefox (112.0.2) | 6.709 | 2.05%
Safari (16.4) | 21.072 | 7.63%
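For reference, the two statistics in the table could be computed roughly as in the sketch below. The function names and the sample numbers are hypothetical (they are not the measured data), and the use of the sample standard deviation (n - 1) is an assumption, since the text above only says "standard deviation divided by the mean".

```javascript
// Median of a list of numbers: the middle value, or the mean of the two
// middle values when the list has an even length.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Coefficient of variation: standard deviation divided by the mean.
// Assumption: sample standard deviation (n - 1 in the denominator).
function coefficientOfVariation(values) {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (values.length - 1);
  return Math.sqrt(variance) / mean;
}

// Hypothetical per-round average frame times in ms (not the measured data).
const rounds = [11.6, 11.9, 11.7, 12.1, 11.8];
const summary = {
  timeMs: median(rounds),
  cvPercent: coefficientOfVariation(rounds) * 100,
};
```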

camillobruni commented 1 year ago

Are you measuring Score or Time? (Your table header might be a bit confusing :))
I tend towards excluding CSS animation from speedometer, as we might introduce frame-rate-based measurements here
Other CSS and compositing properties seem fine to me

mram0509 commented 1 year ago

Yes, Time and not Score (will edit).
This methodology aims to avoid a frame-rate based measurement while reflecting the performance of heavy CSS operations seen in common real world use cases. It could be a good addition to Speedometer 3, which aims to reflect real-world user experiences. It stays away from Pure CSS animation, which would be hard to measure in a real world scenario.

bgrins commented 1 year ago

the proposal utilizes performance.mark API to mark CSS animation frame start in requestAnimationFrame callback, and mark frame completion in the callback of afterframe API (https://www.npmjs.com/package/afterframe).

I'm not familiar with that library, but it looks like it's essentially stopping the timer after the next rAF is fired following the rAF callback which starts the timer?

This isn't currently possible within the Speedometer framework, since we don't have the ability to perform async steps (and most likely won't within the version 3 timeframe, though we are definitely interested in developing this ability so we can test things like Workers). So if this is a requirement for the test we should look at this test as a potential addition for a future version. Though I wonder if it's possible to build a test with this content using a similar pattern to the NewsSite in #167 - for example having content get appended or toggling classes etc in a sync step.
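A minimal sketch of the sync-step pattern mentioned above, assuming a step is just a function that must finish synchronously. `runSyncStep`, `selectRow`, and `fakeRow` are hypothetical names for illustration and are not the Speedometer runner API; the fake rows only exist so the sketch runs without a DOM.

```javascript
// Hypothetical helper (not the Speedometer runner API): time a step that
// must complete synchronously.
function runSyncStep(name, step) {
  const start = performance.now();
  step();
  return { name, syncTime: performance.now() - start };
}

// Hypothetical step: mark one row as selected by toggling a class on every row.
function selectRow(rows, selectedIndex) {
  rows.forEach((row, i) => row.classList.toggle("selected", i === selectedIndex));
}

// In a browser, `rows` could come from document.querySelectorAll("tr").
// This stand-in only supports classList.toggle(name, force) with an explicit
// boolean `force`, which is all the step above needs.
function fakeRow() {
  const classes = new Set();
  return {
    classList: {
      toggle(name, force) {
        if (force) classes.add(name);
        else classes.delete(name);
        return force;
      },
      contains: (name) => classes.has(name),
    },
  };
}

const rows = Array.from({ length: 100 }, fakeRow);
const result = runSyncStep("select-row", () => selectRow(rows, 0));
```

Here only the class toggling is timed; any style, layout, or paint work the browser defers until later falls outside `syncTime`.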

rniwa commented 1 year ago

Yeah, we don't support this kind of async workload at the moment. Deferring this to v4 seems like the right course of action here.

HongZheng commented 1 year ago

Thanks all for your great comments!

Yes, measuring the whole set of CSS animation frames depends on async steps in the SP driver, which it seems won’t be ready until SP4. We are updating the workload to adopt a measurement similar to the NewsSite case mentioned by Brian. This works quite well because the initial frame is measured, particularly if combined with the rAF-based async measurement (#173).

The workload thus reflects the web runtime’s responsiveness to user actions in CSS-heavy and animation-rich real-world scenarios such as Netflix and Amazon. The animations we use are based on statistics from chromestatus, which indicate that animations are used by more than 40% of web pages.

Again, thanks for the comments. We’ll have a new update soon and welcome your further insights on it!

HongZheng commented 1 year ago

We have integrated the CSS rich workload into SP3. You can review the code at https://github.com/intel-staging/Speedometer/tree/rich_css/resources/tentative/rich-css and try a live demo at https://hongzheng.github.io/sp3-rich_css/?suite=Rich-CSS#home

mram0509 commented 1 year ago

From the discussion during the meeting on 06/07/2023: please review the workload above. Could we open a PR, add the test to tentative, and label it accordingly? Currently it's labeled as V4.

mram0509 commented 1 year ago

Discussion and comments from the slack Channel

smfr 5 days ago this seems more like a painting benchmark, with things like CSS filters

smfr 5 days ago is it intended to stress CSS parsing etc, or painting, or both?

bgrins 1 day ago I do like the idea in general of a rich data table / image carousel etc that rely on complex CSS as being part of Speedometer. But in practice I'd prefer the test to be more heavily inspired by real world content & patterns so we can be confident that optimizations to it will drive increased performance for content on the Web. A couple things for example in the test that I don't think I've seen on pages are having a table where the focused row is sharp and the others have a blur effect, and big multistop CSS gradient background images on the body element. One thing that I think would help me understand how to review this is to reposition this test from "Rich CSS" (the tech that's being tested) to something more descriptive of the experience that's being modeled. Is it meant to be "Interacting with a dashboard to find popular content" (a la Netflix / IMDB), "Searching for a restaurant" (a la Yelp or a Maps app), or something else? It may seem silly, but since there's so much open space on implementation details I've found doing this helps set some constraints around what content to study and model. See also my proposal around workload definition https://docs.google.com/document/d/1BCAlKWqILFtoqH6wLRQc1RtFukY60nuEotjvEQZANgg/edit#heading=h.i542mtoho4us - and I know we've done a bit of documentation on these types of tests in https://docs.google.com/document/d/155PztxZ-I-Epk_Fm_l7FmCerhKFwyoGnYpOhMsdVdY8/edit#heading=h.2uml6sq22frj but I think it could be a bit more clear for this case.

bgrins 1 day ago FWIW I can see the inspiration from https://top10.netflix.com/tv and think something of this shape could be a good test. I haven't studied this closely, but clicking around there the most interesting work I see is driven from the UI change from i.e. TV to Films and not a linkage between the table and the carousel. There are even more permutations with "type" and "country" dropdowns on https://www.netflix.com/tudum/top10/united-states - when changing these params it drives a change to the images in the carousel, the contents of the table, and a complex "card" for each show beneath the table. Not sure if it's a similar story for IMDB which is also referenced in the issue

mram0509 commented 1 year ago

[From Hong Sheng] Thanks for the great and valuable comments! Yes, it’s better to rename the workload to something like “Interacting with Featured Page to Navigate or Search Items” as suggested. We’re also considering removing the CSS gradient and blur from the implementation and replacing them with something more typical of the real world. The proposed case stresses CSS processing, DOM, and a little bit of painting.

Regarding the real-world reflection, we’re inspired by a list of websites that pair an image-based carousel/slideshow with a list or table of contents as the UI pattern to organize and present their information. Some of them are listed below.

Real World Scenario | URL | Typical UI Elements | Typical Interactions | Typical Web tech exercised
-- | -- | -- | -- | --
Netflix Top 10 TV | https://top10.netflix.com/tv | Carousel & Table of Popular contents | Change selection in Dropdown triggers the update of carousel images and contents in table | CSS transform, opacity/size settings
IMDB Index | https://www.imdb.com/ | Carousel & List of contents in DIV blocks | Click the arrow button in carousel to update the list of the contents | CSS transform, opacity/color settings
Facebook photo | https://www.facebook.com/photo/?fbid=742932955788401&set=a.636245978533785 | Slideshow & List of contents in DIV blocks | Click the arrow button in carousel to update the list of the contents | CSS opacity/color/padding settings
Walmart Complete the look | https://www.walmart.com/ip/The-Beatles-Men-s-Abbey-Road-Graphic-T-Shirt-with-Short-Sleeves/1786976753?athbdg=L1600 | Carousel & List of contents in DIV blocks | Click the arrow button in carousel to update the list of the contents in “View details” window | CSS fade-out animation, opacity/visibility settings

HongZheng commented 1 year ago

I have updated the code according to the comments. Please review the code at https://github.com/intel-staging/Speedometer/tree/rich_css/resources/tentative/search-in-featured-page and the live demo at https://hongzheng.github.io/sp3-rich_css/?suite=Search-In-Featured-Page#home

rniwa commented 1 year ago

Is there a PR to add this workload somewhere?

HongZheng commented 1 year ago

Is there a PR to add this workload somewhere?

I have submitted a PR #247, please review.

camillobruni commented 1 year ago

I like the CSS heavy part of this workload, but I still have some doubts:

A lot of time is spent in just building up a table (~ 25% of the time), maybe that's something to optimise
Having animations in this workload seems a bit counter intuitive. In order to boost the score, the most straight-forward (and wrong) thing to do, would be to not do any animation (in the extreme case) to free up resources for the main-thread JS / DOM part.
The CSS is rather small for a typical website (maybe you had plans to extend this?)

Maybe we can keep the CSS rules with animations in there, but don't trigger them.

HongZheng commented 1 year ago

As I said in PR #247, the generated table contains some heavy CSS effects and animations, which reflects the purpose of the workload. But the current solution may introduce a lot of HTML parsing overhead, which we will optimise later. As async steps won’t be ready until SP4, the workload currently only measures the initial frame after the table is generated, so animations don't happen in the scoring region. The result is something like "keep the CSS rules with animations in there, but don't trigger them".

rniwa commented 1 year ago

I'm at a bit of a loss as to what state this proposal is in. What is the proposed PR for Speedometer 3? Is it https://github.com/WebKit/Speedometer/pull/247 ?

If so, that workload seems to induce a bunch of stray async tasks that run outside of the measured time window both on Chrome & Safari so we should fix that.

More generally, while I appreciate & value your contributions, this workload seems to be more about measuring page load speed than web app responsiveness. Since the goal of Speedometer is measuring web app responsiveness, not page load speed, this test might be out of scope by concept. If we were to include this test in Speedometer 3, we would need to put more focus on app responsiveness after the page load has completed.

camillobruni commented 1 year ago

I share rniwa's sentiment here. Maybe let's not rush this and put investigation / development on pause for this workload (and maybe think about it again for the next version 4) until we have stabilised everything else.

mram0509 commented 1 year ago

Thanks for your comments. rniwa - regarding your comment "measuring the page load speed than web app responsiveness" - The measured part of the workload occurs after the page is loaded. So it does measure page responsiveness, by measuring the time it takes to create the initial frame and not the page load speed.

rniwa commented 1 year ago

Thanks for your comments. rniwa - regarding your comment "measuring the page load speed than web app responsiveness" - The measured part of the workload occurs after the page is loaded. So it does measure page responsiveness, by measuring the time it takes to create the initial frame and not the page load speed.

Measuring the initial frame still sounds like a page load test to me. I think we should not measure the initial frame / page load at all for this workload to stay focused.

WebKit / Speedometer

CSS Rich Browsing Proposal #175
