alexnj opened this issue 2 years ago
For the "closing the loop" section, I ran an HTTP Archive query based on the LHRs rather than the raw requests, to get a sense of what kind of reports we would likely see once those links are enabled. The query looked for cross-origin network requests identified by the protocol as resourceType Script (with the mainThreadTime for each script joined in from the bootup-time audit where available), then eliminated any that match a known third-party-web entity as of yesterday's 0.20.2 release. The listing then groups by domain and requires at least 50 occurrences, but we could apply other thresholds (e.g. blocking time, transfer size) or group by NET.REG_DOMAIN, TLD+x, or whatever.
2022_09_mobile: third-party origins not covered by third-party-web
(note that mainThreadTime is floored to 0 if < 50 ms by the bootup-time audit, and "median" is actually a median of medians, just to give a ballpark)
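For reference, here's a rough TypeScript sketch of the equivalent filtering against a single LHR (the actual query was run over the HTTP Archive tables; the getEntity helper is third-party-web's, while the LHR detail keys such as total on bootup-time items are recalled from memory and may differ):

```ts
// Sketch only: mirrors the query logic against a single Lighthouse LHR.
import {getEntity} from 'third-party-web';

interface UnrecognizedScript {
  url: string;
  mainThreadTime: number; // floored to 0 below 50 ms by the bootup-time audit
}

function unrecognizedThirdPartyScripts(lhr: any): UnrecognizedScript[] {
  const pageOrigin = new URL(lhr.finalUrl).origin;
  const networkItems: any[] = lhr.audits['network-requests']?.details?.items ?? [];
  const bootupItems: any[] = lhr.audits['bootup-time']?.details?.items ?? [];

  // Join mainThreadTime (the bootup-time "total" column) by script URL, if present.
  const mainThreadTimeByUrl = new Map<string, number>();
  for (const item of bootupItems) mainThreadTimeByUrl.set(item.url, item.total ?? 0);

  return networkItems
    // Cross-origin requests identified by the protocol as resourceType Script.
    .filter(item => item.resourceType === 'Script' && new URL(item.url).origin !== pageOrigin)
    // Drop anything that already matches a known third-party-web entity.
    .filter(item => !getEntity(item.url))
    .map(item => ({url: item.url, mainThreadTime: mainThreadTimeByUrl.get(item.url) ?? 0}));
}
```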
There are definitely a lot of CDNs associated with particular sites in the top results that could be added (e.g. static.wikia.nocookie.net with Fandom wikis, quoracdn.net for Quora subdomains).
Looping in @patrickhulce
Love the concept of grouping cross-origin domain work in this audit and providing a prompt to file with third-party-web ❤️ 😃
Manual whack-a-mole worked quite well when I had the bandwidth to run and investigate it every month, but the web changes fast :)
Mostly complete. The remaining bit is under "Closing the loop with third-party-web"
Lighthouse's Third-party summary audit currently drops third-party origins that don't have an entity match in the third-party-web library. As a side effect, several third-party origins with high transfer size, main-thread blocking, or both are dropped from the audit. For example, if we audit the theverge.com homepage, blismedia.com, narrativ.com, and vercel-insights.com are not part of the third-party summary. Numerous (newer) TikTok CDNs also drop out despite their large transfer size, because the dataset has not been updated with the origin's recent changes.

third-party-web defines a clear process to update this data, but it is a manual operation. This is a disconnected process, and it could perhaps be improved if we model it as a feedback loop.

Granted, the issue is not severe when the same asset is caught elsewhere for transfer size or execution time by the audits that focus on those. So from an issue-detection perspective, the user can still find the third-party impact in other audits. From a third-party classification perspective, however, this could be improved.
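The drop happens because an unrecognized origin simply resolves to no entity, so the audit has nothing to group it under. A quick illustration using third-party-web's getEntity helper (the blismedia.com URL is illustrative, not an actual asset path):

```ts
// Sketch: unrecognized origins yield no entity, so the audit drops them today.
import {getEntity} from 'third-party-web';

// A recognized third party resolves to an entity with a name.
console.log(getEntity('https://www.google-analytics.com/analytics.js')?.name);
// -> 'Google Analytics'

// An origin missing from the dataset resolves to undefined and is excluded
// from the third-party summary. (Illustrative URL.)
console.log(getEntity('https://cdn.blismedia.com/some-script.js'));
// -> undefined
```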
To show an example of a third party that blocks the main thread for quite a long time but still gets excluded from the third-party summary, here's a test case: mtb-thirdparty.surge.sh, which blocks the main thread for 2000+ ms. Currently, the third-party summary audit for this test-case origin passes, where it should instead fail due to the high blocking time.
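For context, a cross-origin script along the following lines is enough to produce this kind of blocking (a hypothetical sketch, not the actual asset served by the test page):

```ts
// Hypothetical blocker script served from a separate origin (e.g. a *.surge.sh domain).
// Busy-waits so the main thread stays blocked for roughly 2000 ms once the script executes.
const BLOCK_MS = 2000;
const start = performance.now();
while (performance.now() - start < BLOCK_MS) {
  // Spin without yielding to the event loop, producing one very long task.
}
```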
Proposal
I think one option we could pursue is to make up a third-party entity based on the root-level domain rather than dropping unrecognized third-party domains. This should be fairly straightforward: maintain an in-memory lookup table of the entities recognized (or synthesized) during the audit, while staying compatible with the IEntity interface exposed by third-party-web (see the sketch at the end of this section).

The drawback of this approach is that already-recognized entities get duplicated under their new, unrecognized origins. TikTok is an example of a recognized entity for which a duplicate entry would be created for each CDN host that isn't recognized (examples today: ttwstatic.com, tiktokcdn-us.com, etc.). This could be improved further as described in the next section.
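A minimal sketch of that fallback, assuming third-party-web's getEntity helper; the SyntheticEntity shape and the naive root-domain extraction below are illustrative stand-ins for the real IEntity fields and Lighthouse's registrable-domain helpers:

```ts
// Sketch: fall back to a synthesized entity keyed on the root-level domain
// when third-party-web has no match.
import {getEntity} from 'third-party-web';

interface SyntheticEntity {
  name: string;          // e.g. 'blismedia.com'
  categories: string[];
  domains: string[];
  isUnrecognized: true;  // marker so the report can render a "contribute" link
}

// Naive root-domain extraction for the sketch; real code should use a
// public-suffix-list-aware helper.
function rootDomain(url: string): string {
  return new URL(url).hostname.split('.').slice(-2).join('.');
}

const syntheticEntities = new Map<string, SyntheticEntity>();

function entityForUrl(url: string) {
  const known = getEntity(url);
  if (known) return known;

  const domain = rootDomain(url);
  let entity = syntheticEntities.get(domain);
  if (!entity) {
    entity = {name: domain, categories: ['unknown'], domains: [domain], isUnrecognized: true};
    syntheticEntities.set(domain, entity);
  }
  return entity;
}
```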
Closing the loop with third-party-web
One option for reintegrating these unrecognized entities is to help the user contribute back to third-party-web. We could mark up each unrecognized entry in the report with a link, and use GitHub's issue-creation URL to automatically fill in the required title and metadata.
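A sketch of how such a link could be built (patrickhulce/third-party-web is the real upstream repo, but the issue title/body format and the numbers in the example are made up for illustration; the project might prefer a dedicated issue template or a direct pull request against its entity data):

```ts
// Sketch: build a GitHub "new issue" URL that pre-fills a report for an
// unrecognized third-party origin. Title and body format are assumptions.
function thirdPartyWebIssueUrl(domain: string, stats: {transferSize: number; blockingTime: number}) {
  const title = `New entity: ${domain}`;
  const body = [
    'Observed an unrecognized third-party origin in a Lighthouse report.',
    '',
    `- Domain: ${domain}`,
    `- Transfer size: ${stats.transferSize} bytes`,
    `- Main-thread blocking time: ${stats.blockingTime} ms`,
  ].join('\n');

  const params = new URLSearchParams({title, body});
  return `https://github.com/patrickhulce/third-party-web/issues/new?${params}`;
}

// Example: the report's "unrecognized" rows would link to something like this.
console.log(thirdPartyWebIssueUrl('blismedia.com', {transferSize: 120_000, blockingTime: 300}));
```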