New perf metric: Web bloat score

paulirish commented 5 years ago

Description of audit and audit category

Basically, it's the (total bytes of the webpage) / (bytes of a screenshot of that webpage). It's defined at https://www.webbloatscore.com/.

Explanation of how it’s different from other audits

No other metrics attempt to judge the efficiency of the bytes used.

What % of developers/pages will this impact (estimates OK, data points preferred)

It's a perf metric.. so nearly 100%.

How would the audit appear in the report?

Another perf metric. Maybe like this?

How is the new audit making a better web for end users? (data points preferred)

The weight of a screenshot of the page isn't typically considered, but serves as a nice reference point that everyone can appreciate. This metric is a lot easier to explain than our other perf metrics. And it addresses a unique measurement that isn't covered by any existing metrics.

What is the resourcing situation (who will create the audits, maintain the audits, and write/maintain the documentation)

We're interested in someone contributing this audit. We can help guide them. Once implemented, Lighthouse core team will maintain.

Do you envision this audit in the Lighthouse report or the full config in the CLI? If in the report, which section?

🤔 This is a weird question, Lighthouse.

How much support is needed from the Lighthouse team?

A bushelful.

Any other links or documentation that we should check out?

Nah just https://www.webbloatscore.com/

connorjclark commented 5 years ago

I think this is an interesting metric. Mostly, I like how simple it is. I thought of some cases where the score could be misleading, but after some experimentation it's probably not an issue. I'll share anyhow.

One, since PNG (the format suggested in the bloat score website) is a lossless format, a page containing a photo album of JPEGs may produce a higher bloat score than a photo album of PNGs, even though JPEGs would be the better choice to minimize network bandwidth.

Two, is it possible to inflate the size of a PNG through clever visual artifacts? If so, this metric could be gamed.

For one, I grabbed a highres JPEG, created a simple document, and took a snapshot with puppeteer + tracked network usage to calculate the bloat score. I also converted (without optimization) the JPEG to PNG and calculated the bloat. The size of the snapshot of the PNG was barely larger than the snapshot of the JPEG, which I didn't expect, so this could be a non-issue. Code/output is here. Or perhaps my methodology is wrong.

I don't seriously think two would happen, but it would be neat to see.

benschwarz commented 5 years ago

This would likely need to use the screenshot api (eg, not use the trace screenshots) so that the image is full size. The advantage over using that API would be that the image is already PNG, and a compression setting can be added to help simplify the audit.

Could we alter the final-screenshot audit to also use this full size image?

patrickhulce commented 5 years ago

Awesome! I have a few thoughts here :)

❤️ the idea of hammering home the need to drive bytes down
❤️ some sort of visual information to downloaded size ratio
❤️ all these answers to the audit checklist 😄

Now not to 🌧on the web bloat score parade buuuuuttt...

I'm not sure I 100% agree with presenting it as a performance metric. If including additional bytes does not show up in our metrics in any way, I'm fairly confident saying it's not really an issue with the site's performance. It's more of a user-hostile issue for consuming the user's bandwidth unnecessarily. My test for this is that with all the other metrics (with some exceptions to TTI I've already lost 😅 ), improvement there directly and obviously improves the user's perception of the site's performance. If after TTI I just download a bunch of useless stuff to inflate the bytes, my perception of the site's performance doesn't really change (unless it starts downloading a lot more later causing contention, but we ignore what the page might do later throughout LH). The pain is really the cost to my data plan. Would folks be willing to consider this as a diagnostic audit to replace/supplement total-byte-weight or a numeric best practices audit or some new-fangled category? I recognize it'll be given less consideration if it's not scored so I'm trying to find a way to make it work promise :D
I don't think we should implement it as directly stated on webbloatscore. Specifically, PNG :) From their docs...

It was our arbitrary decision to use PNG, as they are lossless compression. Maybe JPEGs would be better, most web pages have lossy JPEGs on them anyway.
I do argue that JPEGs would be better and we already have the JPEG size from the trace so it'd be free. If we really want to follow in the spirit of PNG, we could re-use the histogram color counts from speedIndex instead which is very roughly how PNG compression works. I'm not convinced it'd be critical enough to grab full size PNG screenshots separately for this.
I have some issues with web bloat score totally ignoring interactivity/animation/etc as a priority of the modern web, but the bloat is so bad today I don't think many good actors are going to get unfairly punished, so I'll stop here :)

benschwarz commented 5 years ago

I'm not convinced it'd be critical enough to grab full size PNG screenshots separately for this.

Isn't part of the metric that a picture of a website is x amount smaller than the actual website?

I'm not sure I 100% agree with presenting it as a performance metric.

I'm not either tbh. It's a bit cute, but not meaningful.

patrickhulce commented 5 years ago

Isn't part of the metric that a picture of a website is x amount smaller than the actual website?

Yes, but there's all kinds of non-sense that goes along with this sensitivity to viewport size. If I change my resolution to something 4x larger but I've already hit the highest image breakpoints (if there even are any) my web bloat score will plummet even though it's the same exact page. This is precisely a key weakness of the metric I don't want to follow verbatim.

alystair commented 4 years ago

Has any additional thought been put into this since last year? The concept is pretty neat!

Isn't part of the metric that a picture of a website is x amount smaller than the actual website?

Yes, but there's all kinds of non-sense that goes along with this sensitivity to viewport size. If I change my resolution to something 4x larger but I've already hit the highest image breakpoints (if there even are any) my web bloat score will plummet even though it's the same exact page. This is precisely a key weakness of the metric I don't want to follow verbatim.

Aren't Lighthouse's testing resolutions (mobile/desktop) set in stone to ensure consistency?

connorjclark commented 4 years ago

Has any additional thought been put into this since last year? The concept is pretty neat!

AFAIK, nope

Aren't Lighthouse's testing resolutions (mobile/desktop) set in stone to ensure consistency?

Desktop viewport is dependent on the available screen space, we don't try to limit that. I don't think we've ever changed our mobile viewport, but are about to with 6.0.

alystair commented 4 years ago

Desktop viewport is dependent on the available screen space, we don't try to limit that. I don't think we've ever changed our mobile viewport, but are about to with 6.0.

Didn't think about that before! Maybe there should be a set of standardized desktop resolutions? ... gently stepping beyond the scope of this issue :^)

patrickhulce commented 4 years ago

Aren't Lighthouse's testing resolutions (mobile/desktop) set in stone to ensure consistency?

I wasn't trying to say that the web bloat score will suffer from variability due to changing screen sizes in Lighthouse. I was using that as another prong in the argument that web bloat score is not a real performance metric, it's a neat heuristic that summarizes people's impressions about "waste" on the web in an easy to consume number.

Lighthouse has spent a very long time advocating and refining around metrics that actually measure user perceived performance, and I question the value of this as a performance metric. As a best practices "respect users' data plan" metric, sounds great! As a perf diagnostic about "using bytes efficiently", let's do it! But a performance metric it is not, IMO.

Maybe there should be a set of standardized desktop resolutions?

There is an established desktop viewport when you select "desktop" in most environments, so I don't think this is as big of a concern. You can disable the emulation completely which I believe was the situation @connorjclark was referring to, but it's not as common and requires specific CLI flags.

https://github.com/GoogleChrome/lighthouse/blob/22e9783642822395884fb3ba5d5ba57f8901ce63/lighthouse-core/lib/emulation.js#L36-L39

alystair commented 4 years ago

Thanks for the rapid responses! 100% agree it's not a performance metric, I'd assume it'd land in the 'Best Practices' realm - but would it influence score? It'd have to come down to audit consistency/reliability.

GoogleChrome / lighthouse