Anti-Fraud Safelist - Githubissues

samuel-t-jackson commented 2 years ago

Balancing Privacy and Security with an Anti-Fraud Safelist

Background

Today’s fraud controls are a societal good and a necessary part of our global economy. At the same time, internet users have a right to privacy, and thus deserve consent-driven tools that prevent abusive or unwanted tracking on the Internet.

The goal of this proposal is to provide a consent-driven framework through which fraud-detection stakeholders may continue to capture signals that have great value during identity verification and authentication. The proposal stems from a concern that other privacy-centric proposals under consideration by the W3C will have a substantial negative impact on controls that keep consumers, organizations & governments safe from fraud.

Specific Concerns Regarding Proposed Privacy Measures

Google’s Privacy Sandbox represents some of the most prominent recent proposals concerning browser privacy. Collectively, these proposals would have a great impact on existing fraud controls. For example, were proposals concerning device entropy and IP masking to be implemented in some fashion, we expect to see the following degradation in fraud controls:

Limited ability to profile and stop fraud rings during organized attacks
Inability to identify IP concealment schemes, such as TOR, proxy servers, or malicious VPN activity
Limited ability to identify foreign actors or criminal networks
Elimination of signals used to identify bots or traffic from hosting facilities
Reduced capabilities around anomaly detection during verification and authentication
Limited capabilities around re-identification of customer devices for risk based authentication
Greatly reduced utility of IP geolocation as a signal in fraud prevention

We believe that it is essential that these new proposals be implemented with additional measures to ensure that fraud controls remain effective.

Proposal - Anti-Fraud Safelist

We are in favor of carve-outs or exemptions to certain browser privacy controls for companies and organizations with a focus on fraud-detection or account security. Criteria for certification are discussed later in this document.

We propose that browsers maintain safelists consisting of certified parties who meet rigorous ethical standards. Browser and IP data would be visible to certified parties, but only in high-risk contexts. Users would then have the ability to add or remove organizations from the safelist via settings in their browsers.

These high-risk contexts would be identified in HTML markup, and the specific fraud-detection organization would be tagged accordingly in order to enable a handshake between the browser and their services.

This approach would enable high-trust interactions, but only in risky contexts where a user is identifying themselves to a service provider. The approach would respect consent and protect against ubiquitous surveillance mechanisms.

HTML Markup for High Risk Contexts

In order to honor the intent of browser privacy proposals, we advocate that device and IP details only be revealed to 3rd parties in limited high risk contexts. High risk contexts represent a small minority of day-to-day interactions on the internet and include situations where a user is:

Identifying themselves for the first time, in order to become a user of a platform or service
Initiating a monetary transaction, for example, a funds transfer or eCommerce checkout
Authenticating themselves at the start of a session or during step-up authentication
Changing or updating account information, for example, initiating a password-reset or update to a mailing address or phone number
Adding another user to an account, for example a secondary user of a credit card
Claiming benefits from a government

For such a scheme to work, site owners would need a mechanism to declare that a particular page or interaction is high risk. Such a declaration would ideally include:

A declaration of purpose (e.g. ‘Auth’, ‘Verify’, ‘Transact’)
A certified authority (e.g. anti-fraud solution provider) that is present on the safelist
A scope in which data collection could occur. For example: -- A URL corresponding to an endpoint managed by the authority -- An iFrame on the authority’s domain

The markup itself could take many forms, from a new type of tag to a declaration within a meta tag. This initial proposal does not cover the semantics of such markup.

Organizations employing device-centric fraud controls typically have three options for integration with a web site:

Client side javascript libraries
Server side inspection via a pixel or similar interaction prompted by a GET request
Implementation of a reverse proxy, potentially in conjunction with 1 and 2

This proposal as articulated would serve options 1 and 2, but not 3. This is intentional as a reverse proxy managed by a 3rd party service is omnipresent during web sessions and not observable by an end user. As such, scope for data collection would be ambiguous, or worse, universal within a session. That said, there is no reason why an organization employing a reverse proxy couldn’t use it in conjunction with options 1 and 2, meaning that they would still be in a position to collect relevant device and IP details, but only in high risk contexts.

Certification Process for Fraud Detection Organizations

As it stands, most browsers ship with built in SSL certificates corresponding to verified Certificate Authorities. We propose that a similar practice be implemented in relation to the fraud detection safelist. Browser manufacturers would define their own process for diligence, which presumably would require participant organizations to complete annual audits and sign corresponding affidavits.

While a public standard for such audits is out of scope for this document, we propose the following criteria for certification:

The candidate organization should exclusively be involved in Fraud Detection or Account Security (this is to avoid conflicts of interest with Martech/Adtech)
Collected data is exclusively used for Fraud Detection or Account Security.
Organization meets rigorous ethical standards [TBD]
Organization meets rigorous standards for data security [TBD]
Organizations are prohibited from selling data collected in these contexts to outside parties

As further context, it is worth considering that a number of large organizations are simultaneously active in Advertising and Fraud Detection, and utilize the same underlying device observations across their various solutions. This type of behavior is antithetical to the intent of this proposal, which assumes that user consent and preferences are directed toward security and fraud controls. The above criteria are intended to prohibit those sorts of activities, as well as large scale surveillance operations.

In future phases, companies that do not have an exclusive focus on fraud detection & account security may participate in the program. This will require enforcement of further standards governing the complete separation of data and systems from processes that do not exclusively support fraud prevention & identity assurance.

Should this proposal move forward, we hope that a neutral organization with a history of drafting standards for certification would set the baseline for the review process in the form of a public standard. Such a standard could be authored by W3C members, or another credible organization like the FIDO Alliance or The Kantara Initiative.

Additional Anti-Abuse Measures

While the certification process outlined in this proposal is intended to ensure that bad-actors do not participate in the anti-fraud safelist, there is still the potential for a certified party to overstep the intent of this scheme by labeling large portions of a website as ‘high risk’, and thus enabling unwanted data collection on a broad basis. This could be further exacerbated if such a scheme was implemented across multiple web properties in order to enable large scale tracking.

While we don’t prescribe a particular approach to tackle this problem, we do envision that browsers could implement further anti-abuse measures to stem these concerns, including:

Contextual analysis of web pages and traffic patterns to understand if user interactions and DOM elements are typical of sign up or authentication flows, or if the majority of the web property is oriented towards a high risk service (e.g. payments platforms or federated authentication)
Analysis of anonymized web traffic in order to identify the proportion of a given web property that is tagged as ‘high risk’
Analysis of cross domain web traffic to identify collusion between anti-fraud organizations and other parties, for example adtech platforms

Implications for Users if No Exceptions Exist for Fraud Controls

If institutions are not able to verify the identities of applicants in high-risk circumstances, this will lead to a number of negative consequences for consumers. Generally, more data will be gathered. By trying to avoid less invasive profiling, users end up having to provide even more information.

The following are some examples of challenges that users face when passive fraud controls fail to verify an identity:

Automated collection of documentation and biometrics (e.g. driver’s license + selfie + liveness)
Collection of corroborating evidence of an identity (e.g. utility bills, proof of residence, SSN card, SSA 89 form)
Phone calls to the applicant or the family of the applicant asking if the applicant actually initiated a given interaction [a somewhat common practice amongst card issuers]
In-person authentication
A requirement to see an in-person notary
A requirement to go to a branch office and show (often) 2 forms of ID
Online notary, which would require both document verification, forms, interview & attestation

User Consent Models

We propose that user consent options follow the following design pattern:

Browsers display a visual notification when a user encounters a high risk context. The UI prompt should enable the user to see verbose details about the context (perhaps on hover). Details would include:

A link to information describing the particular types of data that may be collected
A declaration of purpose for the data collection (e.g. ‘Authentication’, ‘ID Verification’, ‘Transaction Security’)
The name of the organization that is collecting the information and a link to more details about their operations and mission
A link to consent preferences, where users can add or remove organizations from the safelist, or opt into an ad hoc consent process

Users who have opted into ad-hoc consent will be prompted to approve or deny requests for added information on a case by case basis

By default, users are opted into a safelist according to the default preferences that ship with their browser. We prefer this model to universal capture of ad hoc consent, which we fear would disincentivize participation, which would in turn push users into more invasive workflows.

As an added note, we have spoken to security professionals who are of the opinion that consent options will undermine fraud use cases as criminals will opt out, as will privacy-minded consumers. It is easy to imagine an alternate form of this proposal where the safelist is not configurable by users, and certainly that approach has some advantages in terms of security for organizations on the web.

dmarti commented 2 years ago

This is promising. One suggestion would be to make it clear that within a high risk context, the actual transaction info and the safelisted (anti-fraud) data collection are the only permissible data collection. For example, a site would not be able to declare a page as a high risk context if that page also ran conversion tracking or attribution tracking for marketing purposes.

SpaceGnome commented 2 years ago

Regarding IP masking to be implemented in some fashion - isn't this addressed through Chrome's proposal on https://github.com/bslassey/ip-blindness and https://github.com/bslassey/ip-blindness/blob/master/willful_ip_blindness.md ?

npdoty commented 2 years ago

I'm not sure there's broad appetite to expand or duplicate the Certificate Authority system and this would be a lot broader and more intensive than just distributing key material. That may not be a reason to stop developing this idea, but I just think we should be aware of the potential challenges there.

This would also be a very different solution space from ad fraud related measures, presumably. Some signals can be provided to the user agent and indeed exposed to the user, that the user would want to consent to more data collection because they're doing something especially intensive, but that would need to be a different class from approaches for advertising which we expect to appear on many pages.

samuel-t-jackson commented 2 years ago

This is promising. One suggestion would be to make it clear that within a high risk context, the actual transaction info and the safelisted (anti-fraud) data collection are the only permissible data collection. For example, a site would not be able to declare a page as a high risk context if that page also ran conversion tracking or attribution tracking for marketing purposes.

I agree, and I attempted to address this to some extent with the audit requirements for the organization but further clarification is warranted. Conversion tracking or similar cross site attribution are absolutely counter to the intent of this mechanism.

samuel-t-jackson commented 2 years ago

Regarding IP masking to be implemented in some fashion - isn't this addressed through Chrome's proposal on https://github.com/bslassey/ip-blindness and https://github.com/bslassey/ip-blindness/blob/master/willful_ip_blindness.md ?

Unless I'm mistaken, there isn't yet absolute clarity on how these proposals will be adopted, or whether or not they will undergo further revisions. Furthermore, the fact that they allude to the privacy budget, which is a separate and distinct proposal, adds ambiguity to how all of this will work in practice. Given the severe disruption to fraud controls that may result (depending on final implementation), I think that it's best to take a conservative approach and assume that IP address may no longer be available at some point in the near future, and design anti-fraud mechanisms with this in mind.

samuel-t-jackson commented 2 years ago

I'm not sure there's broad appetite to expand or duplicate the Certificate Authority system and this would be a lot broader and more intensive than just distributing key material.

This would also be a very different solution space from ad fraud related measures, presumably.

Agree on both counts, but I also don't see great alternatives to some sort of audit mechanism. The gnatcatcher proposal also alludes to audit requirements. I didn't cite CA authorities because their onboarding process is popular; I was merely citing them as existing precedence for similar mechanisms.

bmayd commented 2 years ago

While the certification process outlined in this proposal is intended to ensure that bad-actors do not participate in the anti-fraud safelist, there is still the potential for a certified party to overstep the intent of this scheme by labeling large portions of a website as ‘high risk’, and thus enabling unwanted data collection on a broad basis. This could be further exacerbated if such a scheme was implemented across multiple web properties in order to enable large scale tracking.

A possibility for addressing this would be for to require that high-risk interactions include 2 certified parties, one acting as primary and the second acting as auditor.

Conversion tracking or similar cross site attribution are absolutely counter to the intent of this mechanism.

While it seems prudent to minimize the use of these mechanisms and prioritize high-risk contexts, it may also be worth considering a version for high-value contexts. For example, conversion tracking and attribution are a critical, high-value aspect of digital advertising and as such are of interest to those perpetrating ad fraud. A version of this mechanism which allowed carve-outs for high-value contexts, like an attribution event, that gave a certified party and an auditor extra information with which to screen for fraud seems like a good means for deterring fraud while safeguarding privacy.

samuel-t-jackson commented 2 years ago

Interesting points.

For example, conversion tracking and attribution are a critical, high-value aspect of digital advertising and as such are of interest to those perpetrating ad fraud.

In this case, would an attribution event consist of a purchase event, or click through, or a range of things?

bmayd commented 2 years ago

A conversion event indicates that a particular marketing goal has been met and it can be anything an advertiser is trying to drive; a purchase is a good example. The conversion event is signaled through a page element -- generally a "conversion pixel." When a conversion event is "fired" an attempt is made to match it to an ad that was clicked on, or just viewed, based on various rules and if a match is made, credit for the conversion is attributed to the ad's campaign. Generating fake conversion events can make ads never seen by users look like they are driving results.

bmayd commented 2 years ago

@samuel-t-jackson This section on Advertiser Needs from the Improving Web Advertising use-cases document might be of interest.

samuel-t-jackson commented 2 years ago

@bmayd thank you.

A conversion event indicates a particular marketing goal has been met and it can be anything an advertiser is trying to drive

My gut level reaction is that conversion events would need to be more narrowly construed to be a good fit for this proposal. With the above definition, a simple click-through or something similarly commonplace would constitute a conversion event. This proposal is intended to thwart the sort of pervasive, internet scale device tracking that is common in adtech, while giving users control over more specialized device tracking, like fingerprinting for authentication, via a configurable safelist or consent options.

Maybe we could propose a simple test to validate whether a use case (or conversion event) is aligned with the user's privacy interests- "is there an expectation on behalf of the user that they will consensually reveal their identity by taking a given action?". By this definition, checkout, login and signup events would all be understood as contexts that justify additional data collection, but UI driven engagement like click-throughs would not

I think there are other concerns here, for example that advertisers would still abuse this for cross site tracking (for example by tying conversion events to 1st party cookies across multiple domains), and that this needs some careful consideration. Conversion events were not on my mind when I wrote this proposal (my current occupation is oriented towards identity and financial fraud).

This is a worthwhile topic of conversation for the working group, and I think it aligns well with the review of use cases.

bmayd commented 2 years ago

@samuel-t-jackson I think I owe you a bit more clarity regarding my suggestion.

The sort of carveout I had in mind was a page element that could be included as part of a post-conversion event (e.g. on an order confirmation or "Thank You" page) and which would ask the browser to provide specific precise information to trusted end-points for the purpose of assuring the conversion event was valid.

As a simple example: a carveout with a declared "conversion" purpose could ask the browser processing a conversion-pixel to communicate directly with the conversion-endpoint rather than via a proxy, so that the conversion-endpoint had access to the IP address of the browser, but the endpoints for the rest of the page elements didn't.

The trusted endpoint could be a trusted server as described in various privacy-preserving conversion attribution proposals which would record the conversion event and data to validate its authenticity, but report only minimal attribution data and a "valid conversion" confidence score.

samuel-t-jackson commented 2 years ago

Makes sense. Do you imagine a gradient of information that becomes available depending on the use case, or is it all-or-nothing?

bmayd commented 2 years ago

A gradient; I think information provided should be proportional to the value of the interaction.

dvorak42 commented 1 year ago

We'll be briefly (3-5 minutes) going through open proposals at the Anti-Fraud CG meeting this week. If you have a short 2-4 sentence summary/slide you'd like the chairs to use when representing the proposal, please attach it to this issue otherwise the chairs will give a brief overview based on the initial post.

samuel-t-jackson commented 1 year ago

Sure, here's a brief write up:

Browser manufacturers maintain safelists of certified parties (much like Certificate Authorities), who would be enabled to view unobfuscated Browser and IP details in certain circumstances (‘high risk contexts’)
Certified parties would typically be focused on Fraud Detection or Account Security, and would undergo audits/sign affidavits guaranteeing that data collected is only exclusively used for Fraud Detection or Account Security.
High risk contexts would be identified in HTML markup, and would enable higher entropy/unobfuscated data collection from the browser by a named certified party
Browsers would be encourage to provide consent mechanisms. Presumably, some users might want the option to grant consent on a case by case basis via a UI prompt, while others would prefer to suppress prompts and just manage the safelist.

npdoty commented 1 year ago

Attestations from sites about exact uses of data, backed up with audits and promises that can be enforced by regulators, could be a useful signal, separate from a certificate authority system or a safelist system.

samuel-t-jackson commented 1 year ago

Completely agree, worth exploring further.

dvorak42 commented 1 year ago

From the brief CG discussion, the scalability of this might be hard for browsers/UAs. Are there examples of similar governance structures employed by browsers/UAs that have similar properties. (for certificate authorities, it might be good to get some feedback from browsers about the challenges they face there). Echoing Nick, there might also be a subpiece of this regarding the sites reporting that might be good to consider separately.

antifraudcg / proposals

Anti-Fraud Safelist #4