Open krgovind opened 3 years ago
It should be possible for anyone to run a crawler to figure out if a set is valid. (I would be interested in collaborating on a crawler project to produce a validation tool and directory of sets. If anyone else is working on a crawler/validator/directory for first party sets, I would appreciate a link.) Three categories of items that a crawler could check:
A set could define a list of resources under `/.well-known` that are required to be identical across site members. For example, a policy could require that if `/.well-known/gpc.json` is present on a site, then only sites with an identical resource at that path would be eligible to be a member of a first-party set.
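A crawler's consistency check for such a policy could be sketched as follows. This is a minimal illustration, not part of the proposal: the function name `wellknown_consistent` is hypothetical, and it assumes the crawler has already fetched the resource (or recorded its absence) for each member site.

```python
import hashlib
from typing import Optional

def wellknown_consistent(resources: dict[str, Optional[bytes]]) -> bool:
    """Return True if every site that serves the resource serves
    byte-identical content. Sites mapped to None did not serve the
    resource; under the policy sketched above, a separate check would
    decide whether absence itself disqualifies a member."""
    digests = {
        hashlib.sha256(body).hexdigest()
        for body in resources.values()
        if body is not None
    }
    # Zero or one distinct digest means all present copies are identical.
    return len(digests) <= 1
```

A validator would run this per required `/.well-known` path and flag the set if any check fails.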
Other resources outside of `/.well-known` can also be compared to determine the validity of a set, if present and identical. For example, if both site A and site B have an `/ads.txt` and the content does not match, then that is evidence that A and B are administered separately for purposes of some business relationships, and therefore not members of a valid set with each other.
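A byte-for-byte comparison of two `ads.txt` files would be too strict, since comments, blank lines, and whitespace carry no meaning. A crawler would more plausibly compare a canonicalized form; here is one possible sketch (the helper names are hypothetical, and the normalization rules are an assumption about what a validator would consider equivalent):

```python
def normalized_ads_txt(text: str) -> frozenset:
    """Canonicalize an ads.txt file: drop comments and blank lines,
    collapse whitespace around fields, and ignore record ordering."""
    records = set()
    for line in text.splitlines():
        # Everything after '#' is a comment per the ads.txt format.
        line = line.split("#", 1)[0].strip()
        if line:
            records.add(",".join(field.strip() for field in line.split(",")))
    return frozenset(records)

def ads_txt_match(a: str, b: str) -> bool:
    """True if two ads.txt files declare the same set of records."""
    return normalized_ads_txt(a) == normalized_ads_txt(b)
```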
Other items clearly need to be common across sites from the user point of view, but are more complicated to check. For example, the privacy policies for two members of a set could be identical in text, but different in content because of different styling. Privacy policies would have to use markup to facilitate comparison.
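As a rough illustration of why markup-assisted comparison helps: even without dedicated policy markup, a crawler could strip styling and compare the remaining text, so that two policies differing only in presentation compare equal. This sketch uses only the standard-library HTML parser; a real validator would need something more robust.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def policy_text(html: str) -> str:
    """Strip markup and collapse whitespace, so two policies that
    differ only in styling produce the same canonical text."""
    extractor = _TextExtractor()
    extractor.feed(html)
    return " ".join("".join(extractor.parts).split())
```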
Common branding resources and guidelines are also clearly necessary, so that a user is aware when they are using sites that share a set. This might include a common set of graphic elements and size at which the elements must be visible -- but there are a11y concerns. We would need to be confident that a user of assistive technologies will be able to recognize when they are using sites that are members of the same set.
> For example, if both site A and site B have an `/ads.txt` and the content does not match, then that is evidence that A and B are administered separately for purposes of some business relationships, and therefore not members of a valid set with each other.
My opinion is that using `ads.txt` content as a proxy for "owning entity" is probably not a good fit, as there may be reasons for different sites owned by the same publisher to have different `ads.txt` files, if the sites serve different purposes. Using `ads.txt` this way also seems to imply some connection between First-Party Sets and ads use cases, which would be unfortunate since First-Party Sets is not connected to ads use cases.
There are also good reasons for sites owned by the same publishing group not to be eligible for the same first-party set. Whether or not two sites can reasonably be parts of a first-party set is more about user-visible branding and expectations of data handling than about ownership structure. (For example, two independently owned radio station sites that are part of the same network and run the same news and talk shows might be part of the same first-party set, but a scientific journal and a local news site that are two divisions of the same corporation might not be.)
A crawler could reasonably produce one of two results from comparing two `ads.txt` files: either the files match, or these two sites have data sharing relationships that are different enough that they could not be a first-party set.
Common response headers could also help automatic verification of FPS members. A common `Permissions-Policy` and `Content-Security-Policy` should not be too hard to arrange, not only to show common ownership, but also to encourage good cross-site security practice. Perhaps it could be tightened further by restricting wildcard strings (`*`) in allow lists.
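A header-based check along these lines could be sketched as follows. The function name, the default header list, and the "no wildcards" rule are illustrative assumptions, not part of the proposal; the input is a mapping from site to its already-fetched response headers.

```python
def headers_consistent(site_headers: dict,
                       required=("Permissions-Policy", "Content-Security-Policy"),
                       forbid_wildcards=True) -> list:
    """Return a list of problems found when comparing the given
    response headers across set members; an empty list means the
    headers are consistent under these (assumed) rules."""
    problems = []
    for name in required:
        values = {site: headers.get(name) for site, headers in site_headers.items()}
        missing = [site for site, value in values.items() if value is None]
        if missing:
            problems.append(f"{name} missing on: {', '.join(missing)}")
            continue
        if len(set(values.values())) > 1:
            problems.append(f"{name} differs across sites")
        if forbid_wildcards and any("*" in value for value in values.values()):
            problems.append(f"{name} uses a wildcard allow list")
    return problems
```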
Other data points (to help automatic verification) could be:
```json
{
  "ownerName": "Example-Company Inc.",
  "indicatesWith": [
    {"DNS": ""},
    {"X.509-Subject": "CN"},
    {"WHOIS": "registrant"}
  ],
  "owner": "example.com",
  "members": ["member-one.com", "example.eu"]
}
```
"indicatesWith" is an array of objects to make it possible to identify the particular record.
Browsers/regulators could specify how many and what data points would be necessary to verify a valid set. Technical documents like this are machine readable, but could also eventually be seen as a legal declaration of identity/ownership of domain origins.
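A verifier applying a "minimum number of data points" rule to a declaration like the one above could look something like this. The function name, the required fields, and the threshold of two indicators are all hypothetical choices for illustration; the actual criteria would be set by browsers or regulators.

```python
import json

def validate_declaration(manifest_json: str, min_indicators: int = 2) -> list:
    """Check a set declaration for required fields and a minimum
    count of ownership indicators; returns a list of problems
    (empty if the declaration passes these assumed checks)."""
    manifest = json.loads(manifest_json)
    problems = []
    for key in ("ownerName", "owner", "members"):
        if key not in manifest:
            problems.append(f"missing required field: {key}")
    indicators = manifest.get("indicatesWith", [])
    if len(indicators) < min_indicators:
        problems.append(
            f"only {len(indicators)} ownership indicator(s) declared; "
            f"at least {min_indicators} required"
        )
    return problems
```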
@krgovind and others, way late here: in assessing the various options, was anything considered in which the browser would put something on the screen from a `/.well-known` resource that would "enforce visual co-branding"? Something that would "extend" the browser bar, like:
Even something that was "obtrusive" that allowed for greater flexibility and decentralization might be preferable for some businesses. With the new RWS Subsets concept, something like this could define a type of subset, and browsers might make different choices about what to allow in terms of storage/network access for those subsets (SAA auto-grants maybe, but other options: maybe Topics API considers all the sites in this type of set if the API is called on one of them, or Interest Group TTL can be reset based on a visit to one of the sites).
@thegreatfatzby There is a suggestion to check that some common branding element is present in the DOM: https://github.com/WICG/first-party-sets/issues/95 (There are probably some good browser-based software testing tools that could be repurposed to check that a specific element is present and viewable, or this might be a good use case for machine vision: render the page in a headless browser and check for common branding elements)
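As a simplified static version of that check (before reaching for a headless browser or machine vision), a crawler could at least verify that a shared branding element is present in the served markup and not trivially hidden. The element id `fps-brand` is an assumption for illustration, and a DOM-presence check says nothing about whether the element is actually perceivable, which is exactly the limitation raised in #95.

```python
from html.parser import HTMLParser

class BrandingFinder(HTMLParser):
    """Scan page markup for a shared branding element, identified
    here (as an assumption) by a well-known element id."""
    def __init__(self, branding_id: str):
        super().__init__()
        self.branding_id = branding_id
        self.found = False
        self.hidden = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id") == self.branding_id:
            self.found = True
            # Only catches the most trivial way of hiding the element.
            style = attributes.get("style", "").replace(" ", "")
            self.hidden = "display:none" in style

def has_visible_branding(html: str, branding_id: str = "fps-brand") -> bool:
    finder = BrandingFinder(branding_id)
    finder.feed(html)
    return finder.found and not finder.hidden
```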
The challenge here is a11y though: is the common party or context clear to users who are visiting the site using a variety of assistive technologies?
@dmarti thanks for the info:
Think I get the above proposals, but those would not involve the browser actually placing the branding/links/something on the page to "enforce co-branding", right? They would be checking rather than injecting something. I'm thinking something like:
This would be more obtrusive, but allowing businesses to make their own choices about site structure with enforced branding might be preferable to making their own choices about branding under an enforced site structure.
This is an area I'll go dig on, but in the meantime can you help me understand the issue? I'm trying to think through what A11Y cases would be marginally worse (marginal in the economic sense, not size sense) in the case of an additional visual element used to indicate privacy scope.
@erik-anderson suggested over on the TAG review thread that we consider technical mechanisms in lieu of the "UA Policy" to verify formation of acceptable sets.
The proposal currently calls for a "UA Policy" (relevant issue) to ensure that site-declared sets meet acceptance criteria. This was added to the proposal primarily to address feedback received from Safari (#6) and Mozilla (#7):
Is there a combination of technical mechanisms, along with a revocation mechanism, transparency logs to aid auditability, etc., that could address these concerns?