WebKit / Speedometer

An open source repository for the Speedometer benchmark

CSS Rich Browsing Proposal #175

Open · HongZheng opened 1 year ago

HongZheng commented 1 year ago

Hi everyone, we are from Intel and would like to introduce a workload rich in CSS effects/animations into Speedometer 3. The design doc is at https://docs.google.com/document/d/19vK5G11Kc4xbvhpkkXDf5WXdFQWyiK_a0WRwTUcc9j4/ and I have copied its contents here in case you don't have access to it.

Objective

The objective of this proposal is to introduce a CSS effects/animations rich test case that can help browsers measure and improve CSS performance.

Motivation

CSS is an essential component of modern web development that helps developers build appealing and engaging websites. As CSS powers increasingly rich web pages, its performance becomes an important factor in the user experience. Some web benchmarks measure CSS/DOM operations and JS tasks together, which makes it hard to assess CSS performance and impact in isolation. We therefore propose adding a CSS-heavy test case to Speedometer 3 to help browsers measure and improve CSS performance.

Description

Drawing on real-world scenarios, such as image switching and table updating on https://top10.netflix.com/tv and https://www.imdb.com/, this proposal simulates a food menu with 5 kinds of food. Each food category contains 100 choices, and the first one is recommended. The workload uses JS to click the food pictures at the top of the page, automatically switching through the 5 kinds of food one by one. In the real world, web developers commonly use CSS effects/animations to make web pages more appealing and engaging to end users. The proposal also exercises many CSS property operations (informed by usage statistics from https://chromestatus.com/), such as transform animations, opacity/color settings, and position/size adjustments. The web page of this proposal looks like the image below.

 

Measurement methodology

For performance measurement, the proposal uses the performance.mark API to mark the start of each CSS animation frame in a requestAnimationFrame callback, and to mark frame completion in the callback of the afterframe API (https://www.npmjs.com/package/afterframe). The image below shows how frame duration is measured in a Chrome trace. The final reported time is the average duration to render a frame, in milliseconds.
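A minimal TypeScript sketch of the rAF-start / post-paint-stop pattern described above, assuming injected dependencies. It is not the workload's code: the clock and the two schedulers are hypothetical parameters so the logic can run outside a browser. The workload itself records the two points with `performance.mark`; here a clock is read instead. In a page you would presumably pass `() => performance.now()`, `requestAnimationFrame`, and a post-paint hook such as the afterframe package's default export.

```typescript
// Sketch of frame timing (assumed wiring, not the workload's implementation).
type Clock = () => number;
type Schedule = (callback: () => void) => void;

// Times `count` consecutive frames and passes their durations (ms) to `onDone`.
function measureFrames(
  count: number,
  now: Clock,
  requestFrame: Schedule, // e.g. requestAnimationFrame
  afterFrame: Schedule, // e.g. a post-paint hook such as afterframe (assumed)
  onDone: (durations: number[]) => void,
): void {
  const durations: number[] = [];
  if (count < 1) {
    onDone(durations);
    return;
  }
  const measureOne = (): void => {
    requestFrame(() => {
      // Frame start: taken inside the requestAnimationFrame callback.
      const start = now();
      afterFrame(() => {
        // Frame completion: taken once the post-paint callback fires.
        durations.push(now() - start);
        if (durations.length < count) {
          measureOne();
        } else {
          onDone(durations);
        }
      });
    });
  };
  measureOne();
}

// The reported score: mean frame duration in milliseconds.
function averageFrameTime(durations: number[]): number {
  if (durations.length === 0) throw new Error("no frames measured");
  return durations.reduce((sum, d) => sum + d, 0) / durations.length;
}
```

Injecting the schedulers keeps the timing logic separate from browser APIs, which also makes it easy to swap in a different "frame done" signal if the afterframe heuristic proves unreliable.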


Code can be reviewed at https://github.com/intel-staging/Speedometer/tree/rich_css/resources/tentative/rich-css

Note: The workload can be run by launching index.html from the rich-css folder. It is not integrated into the benchmark runner yet. 

A live demo is available at https://hongzheng.github.io/


Results

We ran the workload 10 times in each of Chrome, Firefox, and Safari on an M2 Mac, took the median of the per-run average frame times as the final result, and measured variability with the coefficient of variation (standard deviation divided by the mean).


M2 (macOS Ventura 13.3.1) | Time (ms) | CV (coefficient of variation)
-- | -- | --
Chrome (112.0.5615.137) | 11.7845 | 2.25%
Firefox (112.0.2) | 6.709 | 2.05%
Safari (16.4) | 21.072 | 7.63%
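The summary statistics behind the results table can be sketched as follows: the median of the per-run average frame times and the coefficient of variation. This assumes the population standard deviation; the proposal does not say which variant was used.

```typescript
// Median of the per-run results (input is copied, not mutated).
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Even-length input: average the two middle values.
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// CV = standard deviation / mean, using the population standard deviation
// (an assumption; a sample standard deviation would divide by n - 1).
function coefficientOfVariation(values: number[]): number {
  const m = mean(values);
  const variance = values.reduce((sum, v) => sum + (v - m) ** 2, 0) / values.length;
  return Math.sqrt(variance) / m;
}
```

Feeding one browser's ten per-run averages into `median` and `coefficientOfVariation` (times 100 for a percentage) yields the two columns reported in the table.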
camillobruni commented 1 year ago
mram0509 commented 1 year ago
bgrins commented 1 year ago

the proposal utilizes performance.mark API to mark CSS animation frame start in requestAnimationFrame callback, and mark frame completion in the callback of afterframe API (https://www.npmjs.com/package/afterframe).

I'm not familiar with that library, but it looks like it's essentially stopping the timer after the next rAF is fired following the rAF callback which starts the timer?

This isn't currently possible within the Speedometer framework, since we don't have the ability to perform async steps (and most likely won't within the version 3 timeframe, though we are definitely interested in developing this ability so we can test things like Workers). So if this is a requirement for the test we should look at this test as a potential addition for a future version. Though I wonder if it's possible to build a test with this content using a similar pattern to the NewsSite in #167 - for example having content get appended or toggling classes etc in a sync step.
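To illustrate the synchronous alternative suggested above, here is a hypothetical TypeScript sketch of a step that switches the food menu's category purely by toggling classes. All DOM mutation happens synchronously inside the step; the resulting style/layout/paint work follows it, where a runner with post-step (async) measurement could pick it up. The `MenuCard` interface, the `data-category` attribute, and the `hidden` class are illustrative assumptions, not names from the workload; a real `HTMLElement` provides both members used.

```typescript
// Minimal shape of a menu card; HTMLElement offers both members.
interface MenuCard {
  getAttribute(name: string): string | null;
  classList: { toggle(token: string, force?: boolean): boolean };
}

// Hypothetical sync step: shows cards whose data-category matches `category`,
// hides the rest, and returns how many cards are now visible.
function selectCategory(cards: ArrayLike<MenuCard>, category: string): number {
  let visibleCount = 0;
  for (let i = 0; i < cards.length; i++) {
    const card = cards[i];
    const visible = card.getAttribute("data-category") === category;
    // force=true adds the class, force=false removes it.
    card.classList.toggle("hidden", !visible);
    if (visible) visibleCount++;
  }
  return visibleCount;
}
```

In a page this might be called as `selectCategory(document.querySelectorAll(".menu-card"), "pizza")`, where the selector and category value are again placeholders.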

rniwa commented 1 year ago

Yeah, we don't support this kind of async workload at the moment. Deferring this to v4 seems like the right course of action here.

HongZheng commented 1 year ago

Thanks all for your great comments!

Yes, measuring the whole sequence of CSS animation frames depends on async steps in the SP driver, which it seems won't be ready until SP4. We are updating the workload to use a measurement approach similar to the NewsSite case Brian mentioned. This works well because the initial frame is still measured, particularly with the rAF-based async measurement (#173).

The workload thus reflects the web runtime's responsiveness to user actions in CSS-heavy, animation-rich real-world scenarios such as Netflix and Amazon. The animations we use are based on chromestatus statistics, which indicate that animations are used by more than 40% of web pages.

Thanks again for the comments. We'll post an update soon and welcome your further insights on it!

HongZheng commented 1 year ago

We have integrated the CSS rich workload into SP3. You can review the code at https://github.com/intel-staging/Speedometer/tree/rich_css/resources/tentative/rich-css and a live demo is available at https://hongzheng.github.io/sp3-rich_css/?suite=Rich-CSS#home

mram0509 commented 1 year ago

From the discussion during the meeting on 06/07/2023: please review the workload above. Could we open a PR that adds the test to tentative and label it accordingly? It is currently labeled as V4.

mram0509 commented 1 year ago

Discussion and comments from the Slack channel:

smfr 5 days ago this seems more like a painting benchmark, with things like CSS filters

smfr 5 days ago is it intended to stress CSS parsing etc, or painting, or both?

bgrins 1 day ago I do like the general idea of a rich data table / image carousel etc. that relies on complex CSS being part of Speedometer. But in practice I'd prefer the test to be more heavily inspired by real-world content & patterns so we can be confident that optimizations to it will drive increased performance for content on the Web. A couple of things in the test that I don't think I've seen on real pages are, for example, a table where the focused row is sharp and the others have a blur effect, and big multistop CSS gradient background images on the body element. One thing that would help me understand how to review this is to reposition this test from "Rich CSS" (the tech that's being tested) to something more descriptive of the experience that's being modeled. Is it meant to be an "Interacting with a dashboard to find popular content" (a la Netflix / IMDB), "Searching for a restaurant" (a la Yelp or a Maps app), or something else? It may seem silly, but since there's so much open space on implementation details, I've found that doing this helps set some constraints around what content to study and model. See also my proposal around workload definition https://docs.google.com/document/d/1BCAlKWqILFtoqH6wLRQc1RtFukY60nuEotjvEQZANgg/edit#heading=h.i542mtoho4us - and I know we've done a bit of documentation on these types of tests in https://docs.google.com/document/d/155PztxZ-I-Epk_Fm_l7FmCerhKFwyoGnYpOhMsdVdY8/edit#heading=h.2uml6sq22frj but I think it could be a bit clearer for this case.

bgrins 1 day ago FWIW I can see the inspiration from https://top10.netflix.com/tv and think something of this shape could be a good test. I haven't studied this closely, but clicking around there, the most interesting work I see is driven by the UI change (e.g. from TV to Films), not by a linkage between the table and the carousel. There are even more permutations with the "type" and "country" dropdowns on https://www.netflix.com/tudum/top10/united-states - changing these params drives a change to the images in the carousel, the contents of the table, and a complex "card" for each show beneath the table. Not sure if it's a similar story for IMDB, which is also referenced in the issue.

mram0509 commented 1 year ago

[From Hong Sheng] Thanks for the great and valuable comments! Yes, it would be better to rename it to something like “Interacting with Featured Page to Navigate or Search Items”, as suggested. We’re also considering removing the CSS gradient and blur from the implementation and replacing them with something more typical of real-world pages. The proposed case stresses CSS processing and the DOM, plus a little painting.

Regarding how the workload reflects the real world, we drew inspiration from websites that pair an image-based carousel/slideshow with a list or table of content as the UI pattern for organizing and presenting their information. Some of them are listed below.

Real-world scenario | URL | Typical UI elements | Typical interactions | Typical web tech exercised
-- | -- | -- | -- | --
Netflix Top 10 TV | https://top10.netflix.com/tv | Carousel & table of popular contents | Change selection in dropdown triggers the update of carousel images and contents in table | CSS transform, opacity/size settings
IMDB Index | https://www.imdb.com/ | Carousel & list of contents in DIV blocks | Click the arrow button in carousel to update the list of the contents | CSS transform, opacity/color settings
Facebook photo | https://www.facebook.com/photo/?fbid=742932955788401&set=a.636245978533785 | Slideshow & list of contents in DIV blocks | Click the arrow button in carousel to update the list of the contents | CSS opacity/color/padding settings
Walmart Complete the look | https://www.walmart.com/ip/The-Beatles-Men-s-Abbey-Road-Graphic-T-Shirt-with-Short-Sleeves/1786976753?athbdg=L1600 | Carousel & list of contents in DIV blocks | Click the arrow button in carousel to update the list of the contents in “View details” window | CSS fade-out animation, opacity/visibility settings

HongZheng commented 1 year ago

I have updated the code based on the comments. Please review it at https://github.com/intel-staging/Speedometer/tree/rich_css/resources/tentative/search-in-featured-page and a live demo is available at https://hongzheng.github.io/sp3-rich_css/?suite=Search-In-Featured-Page#home

rniwa commented 1 year ago

Is there a PR to add this workload somewhere?

HongZheng commented 1 year ago

Is there a PR to add this workload somewhere?

I have submitted PR #247; please review.

camillobruni commented 1 year ago

I like the CSS-heavy part of this workload, but I still have some doubts:

Maybe we can keep the CSS rules with animations in there, but don't trigger them.

HongZheng commented 1 year ago

As I mentioned in PR #247, the generated table contains some heavy CSS effects and animations, which reflects the purpose of the workload. However, the current solution may introduce significant HTML parsing overhead, which we will optimise later. Because async steps won’t be ready until SP4, the workload currently measures only the initial frame after the table is generated, so the animations do not run inside the scored region. In effect, it already does what you suggest: "keep the CSS rules with animations in there, but don't trigger them".

rniwa commented 1 year ago

I'm at a bit of a loss as to what state this proposal is in. What is the proposed PR for Speedometer 3? Is it https://github.com/WebKit/Speedometer/pull/247 ?

If so, that workload seems to induce a number of stray async tasks that run outside the measured time window on both Chrome and Safari, so we should fix that.

More generally, while I appreciate & value your contributions, this workload seems to be more about measuring page load speed than web app responsiveness. Since the goal of Speedometer is to measure web app responsiveness, not page load speed, this test may be conceptually out of scope. If we were to include this test in Speedometer 3, we would need to focus more on app responsiveness after the page load has completed.

camillobruni commented 1 year ago

I share rniwa's sentiment here. Maybe let's not rush this, and instead pause investigation / development on this workload (and perhaps revisit it for version 4) until we have stabilised everything else.

mram0509 commented 1 year ago

Thanks for your comments. rniwa - regarding your comment "measuring the page load speed than web app responsiveness" - The measured part of the workload occurs after the page is loaded. So it does measure page responsiveness, by measuring the time it takes to create the initial frame and not the page load speed.

rniwa commented 1 year ago

Thanks for your comments. rniwa - regarding your comment "measuring the page load speed than web app responsiveness" - The measured part of the workload occurs after the page is loaded. So it does measure page responsiveness, by measuring the time it takes to create the initial frame and not the page load speed.

Measuring the initial frame still sounds like a page load test to me. I think we should not measure the initial frame / page load at all, so that this workload stays focused.