GoogleChrome / lighthouse

Automated auditing, performance metrics, and best practices for the web.
https://developer.chrome.com/docs/lighthouse/overview/
Apache License 2.0

Use third-party-web's definition for 3P entities #7474

Closed paulirish closed 4 years ago

paulirish commented 5 years ago

https://github.com/patrickhulce/third-party-web/blob/master/data/entities.json

We can use this for the 3rd party report filter and lightwallet/budgets.

Patrick said it accounts for 90% of JS execution on non-first-party origins, based on HTTPArchive data. 👍
(I imagine there may be a few blind spots for requests that don't have JS execution, e.g. ones that would show up in our Caching audit. We can take a look later.)

patrickhulce commented 5 years ago

Just ran the numbers again, it's 88.22% of 3rd party script execution time. It's very top-heavy: I've identified 99 entities that account for that 88.22%, and the remaining ~12% is spread across ~661 domains. If we need more coverage, I could keep going there 👍
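
A coverage figure like the one above can be sketched as follows. This is only an illustration of the arithmetic, not the actual BigQuery logic: the entity names and millisecond totals are hypothetical placeholders, and `coverageOfTopN` is a name made up for this example.

```javascript
// Hypothetical per-entity script execution totals in ms (not real HTTPArchive data).
const executionByEntity = {
  'Entity A': 600,
  'Entity B': 250,
  'Entity C': 100,
  'Entity D': 30,
  'Entity E': 20,
};

// Percentage of the overall total covered by the topN largest entries.
function coverageOfTopN(totals, topN) {
  const values = Object.values(totals).sort((a, b) => b - a);
  const total = values.reduce((sum, v) => sum + v, 0);
  if (total === 0) return 0;
  const covered = values.slice(0, topN).reduce((sum, v) => sum + v, 0);
  return (covered * 100) / total;
}

console.log(coverageOfTopN(executionByEntity, 2).toFixed(2)); // prints "85.00"
```

With a top-heavy distribution like this one, each additional entity past the first few adds very little coverage, which is the pattern described above.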

paulirish commented 5 years ago

> Just ran the numbers again

can you share your bigquery scripts?

> If we need more coverage, I could keep going there 👍

90% coverage sg.. Though we're using script execution time as the coverage metric. What if instead we use frequency reported in uses-long-cache-ttl details as the coverage metric? nahmean?

patrickhulce commented 5 years ago

The bigquery scripts are all in https://github.com/patrickhulce/third-party-web/tree/master/sql :)

> 90% coverage sg.. Though we're using script execution time as the coverage metric. What if instead we use frequency reported in uses-long-cache-ttl details as the coverage metric? nahmean?

or network-requests ;) I believe HTTPArchive even keeps network payloads in a separate table that'd be even easier to aggregate.

paulirish commented 5 years ago

good call.

patrickhulce commented 5 years ago

I published https://www.npmjs.com/package/third-party-web#npm-module

It exposes getEntity, which takes a URL and returns an entity object :)
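
To illustrate the idea of a URL-to-entity lookup, here is a minimal self-contained sketch. It is a stand-in, not the package's implementation: `lookupEntity`, the entity names, and the domains are all hypothetical, and it only does exact hostname matching, which may differ from how the real module resolves domains.

```javascript
// Hypothetical entity list; names and domains are placeholders,
// not entries from third-party-web's data.
const entities = [
  {name: 'Example Analytics', domains: ['analytics.example.com']},
  {name: 'Example CDN', domains: ['cdn.example.net', 'static.example.net']},
];

// Index every domain once so each lookup is a single Map access.
const entityByDomain = new Map();
for (const entity of entities) {
  for (const domain of entity.domains) entityByDomain.set(domain, entity);
}

// Stand-in lookup: returns the entity whose domain list contains the
// URL's hostname, or undefined if there is no match or the URL is invalid.
function lookupEntity(url) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch (err) {
    return undefined; // not a parseable absolute URL
  }
  return entityByDomain.get(hostname);
}

const entity = lookupEntity('https://cdn.example.net/lib.js');
console.log(entity ? entity.name : 'unknown'); // prints "Example CDN"
```

A consumer such as a 3rd party report filter could group requests by the returned entity name and treat unmatched URLs as unattributed.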

I added a lot more entities based on the network requests query, but network requests are more spread out than script execution, so we have ~72% coverage at the moment. The top 50 entities get us 68% coverage, and the next 70 only get us 4% more...


3rd parties representing 48.83 % of total requests
120 Entities representing 71.71 % of 3rd party requests
Top 50 Entities representing 68.00 % of 3rd party requests

paulirish commented 5 years ago

(this is so awesome)

patrickhulce commented 5 years ago

Btw forgot to update here but I've also exposed a slimmer version of the package.

You can do require('third-party-web/httparchive-nostats-subset'), for example, to get the version of the module with only the entities seen in HTTPArchive and without the usage stats (just entity information/domains/etc). This brings the total weight down to just about 55KB ungzipped, which is a lot more tolerable.