Closed yohhaan closed 3 months ago
I meant https://ya.ru/ for automated tests
Good point @tunetheweb
The current plan is to use the HTTP Archive crawl to get a list of domains that potentially serve one of these 2 files. Then, I would use a custom crawler (that I already have) to check whether the detected JSON files comply with the expected JSON schemas, and run further analyses on the valid files.
I would be happy to do this parsing and JSON validation in the custom metric directly, but I would need to be able to call a JSON schema validator like the Ajv library and I am not sure how I would go about it. It is unclear to me if and how I can install further dependencies that these custom metrics would have access to (I am also working on another metric where this would be useful as I would need the Public Suffix list to extract the eTLD+1 of the hostnames).
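If installing Ajv turns out not to be possible, a dependency-free structural check is one fallback. The sketch below is not a JSON Schema validator: `checkJsonShape` and the `requiredFields` map are hypothetical names introduced here, and the field list passed in would have to be derived by hand from the real schemas.

```javascript
// Hypothetical helper (not part of any existing custom metric).
// Parses a response body and checks that a few top-level fields have the
// expected type. `requiredFields` maps a key to 'string', 'number',
// 'boolean', 'object', or 'array'.
function checkJsonShape(bodyText, requiredFields) {
  let data;
  try {
    data = JSON.parse(bodyText);
  } catch (e) {
    return { valid: false, reason: 'not JSON' };
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    return { valid: false, reason: 'not a JSON object' };
  }
  for (const [key, expected] of Object.entries(requiredFields)) {
    const value = data[key];
    // typeof reports arrays and null as 'object', so name them explicitly.
    const actual = Array.isArray(value)
      ? 'array'
      : value === null
        ? 'null'
        : typeof value;
    if (actual !== expected) {
      return {
        valid: false,
        reason: `field "${key}" is ${actual}, expected ${expected}`,
      };
    }
  }
  return { valid: true, data };
}
```

This only catches the common failure mode of an HTML error page or unrelated JSON being served at the well-known path; full schema compliance would still need a real validator in post-processing.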
@yohhaan I understood from the documentation that Chrome consumes this list of submitted and validated domains. So I thought it doesn't require additional scanning, no?
FYI: Privacy Sandbox attestations are implemented as a simple check within the Privacy chapter PR, as part of the `privacy-sandbox` custom metric.
@max-ostapenko:
Related Website Set: this is indeed supposed to be the "canonical" list consumed by Chrome, but in practice some domains that serve this file are not listed there. As an example: https://google.com/.well-known/related-website-set.json
From the HTTP Archive crawl, I would like to find which websites serve this file, and then do further post-analysis to check whether they are in that "canonical" list, whether the file they host is exactly the same as the one published, etc.
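That post-analysis could look roughly like the sketch below. It assumes the canonical list is an object with a `sets` array, and that each set has a `primary` plus optional `associatedSites`, `serviceSites`, and `ccTLDs` fields; those field names are my assumption about the format, not something stated in this thread, and the function names are hypothetical.

```javascript
// Normalize a site string so case and trailing slashes do not matter.
function normalizeSite(site) {
  return String(site).trim().toLowerCase().replace(/\/+$/, '');
}

// Collect every site named in one set object (assumed field names).
function sitesInSet(set) {
  const members = [
    set.primary,
    ...(set.associatedSites || []),
    ...(set.serviceSites || []),
  ];
  // ccTLDs is assumed to map a member site to an array of its variants.
  for (const variants of Object.values(set.ccTLDs || {})) {
    members.push(...variants);
  }
  return members.filter(Boolean).map(normalizeSite);
}

// Hypothetical comparison: is `origin` in the canonical list, and does the
// file it hosts name exactly the same sites as the canonical entry?
function compareWithCanonical(origin, hostedFile, canonicalList) {
  const site = normalizeSite(origin);
  const canonicalSet = (canonicalList.sets || []).find((s) =>
    sitesInSet(s).includes(site)
  );
  if (!canonicalSet) {
    return { listed: false, sameMembers: false };
  }
  const hosted = [...new Set(sitesInSet(hostedFile))].sort();
  const canonical = [...new Set(sitesInSet(canonicalSet))].sort();
  const sameMembers =
    hosted.length === canonical.length &&
    hosted.every((s, i) => s === canonical[i]);
  return { listed: true, sameMembers };
}
```

The `sameMembers` flag is only meaningful when the hosted file lists the full membership; a file that merely points at a primary would be reported as listed but not identical.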
Attestation: yes, I saw the privacy chapter PR (I am currently working on adding detection of other Privacy Sandbox APIs based on the `privacy-sandbox.js` metric proposed by @Yash-Vekaria).
Checking in `well-known.js` and `privacy-sandbox.js` would actually be complementary as I see it:
`well-known.js`: we check whether the file exists on the crawled origin (with the expectation that this file will not exist most of the time, except if we are crawling the homepage of an advertiser, etc., but we may discover it on unexpected websites).
`privacy-sandbox.js`: we check whether the third parties that we detect calling the Privacy Sandbox APIs have the attestation file set (as they should).
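To make the split concrete, here is a small sketch of which URLs each metric would end up probing. `wellKnownTargets` is a hypothetical helper, the attestation path is my assumption (only the Related Website Set path appears in this thread), and using the full origin rather than the eTLD+1 is a simplification.

```javascript
// Related Website Set path, as in the google.com example in this thread.
const RWS_PATH = '/.well-known/related-website-set.json';
// Assumed attestation path; verify against the Privacy Sandbox documentation.
const ATTESTATION_PATH = '/.well-known/privacy-sandbox-attestations.json';

// Hypothetical helper: list the URLs each custom metric would check.
// `apiCallerUrls` stands for the script URLs detected calling the APIs.
function wellKnownTargets(crawledPageUrl, apiCallerUrls) {
  const crawledOrigin = new URL(crawledPageUrl).origin;
  // Deduplicate third parties by origin (simplification: not eTLD+1).
  const callerOrigins = new Set(apiCallerUrls.map((u) => new URL(u).origin));
  return {
    // well-known.js: files looked up on the crawled origin itself.
    wellKnown: [crawledOrigin + RWS_PATH, crawledOrigin + ATTESTATION_PATH],
    // privacy-sandbox.js: attestation files of the detected API callers.
    privacySandbox: [...callerOrigins].map((o) => o + ATTESTATION_PATH),
  };
}
```

The two lists only overlap when the crawled page is itself an API caller, which is why the checks complement rather than duplicate each other.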
This pull request modifies the `well-known.js` custom metric to parse 2 well-known files related to Google's Privacy Sandbox:
Test websites: