antifraudcg / proposals

Proposals for the Anti-Fraud Community Group.

Fraud prevention trusted server #3

Open p-j-l opened 2 years ago

p-j-l commented 2 years ago

We propose developing a high-level document to capture use-cases and requirements for trustworthy anti-fraud servers. This is a call for collaboration among interested members of the community group.

Fraud detection and enforcement is one common use case that relies on third-party cookies and sensitive user data. There are more details about the need for this functionality in advertising use cases here. As browsers proceed to remove support for third-party cookies, this is an important use case that needs to continue to be supported.

One avenue to support this use case is the use of trustworthy servers to process relevant data. There are existing technical proposals that allow sensitive user data to be safely sent from the browser to a server, provided that there are guarantees for what the server does with the data (this is what makes it trustworthy). An example of this is the Aggregate Reporting Service that uses cross-site user data. Note that non-technical guarantees, such as auditing, are out of scope for this proposal at the moment.

We would like to use this discussion to propose developing a similar scheme, in which the browser sends the minimum set of signals needed to run ad fraud detection algorithms to one or more servers, which then determine whether the traffic is fraudulent. One assumption we’re making is that the algorithms are compute-heavy and therefore too costly to run in the browser itself (e.g. Machine Learning model evaluation); we’d like to validate that first and determine whether it introduces a DoS risk. Another assumption is that attackers control the browser, so moving processing out of that environment may be beneficial.
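To make the shape of this scheme concrete, here is a minimal sketch of the browser-to-server flow. Everything in it is an assumption for illustration: the signal names (`clicks_last_minute`, `page_visible`), the `SignalReport` and `Verdict` types, and the toy scoring rule are hypothetical and are not taken from any existing proposal. Which signals are actually needed is one of the open questions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalReport:
    """Hypothetical minimal set of signals sent by the browser."""
    clicks_last_minute: int
    page_visible: bool


@dataclass(frozen=True)
class Verdict:
    """Hypothetical response from the anti-fraud server."""
    fraudulent: bool
    score: float


def evaluate(report: SignalReport, threshold: float = 0.5) -> Verdict:
    """Stand-in for the server-side detection algorithm.

    A real deployment would presumably run a compute-heavy model here;
    this toy rule only illustrates the input/output contract.
    """
    score = 0.0
    if report.clicks_last_minute > 30:  # assumed cut-off, purely illustrative
        score += 0.6
    if not report.page_visible:
        score += 0.3
    score = min(score, 1.0)
    return Verdict(fraudulent=score >= threshold, score=score)


if __name__ == "__main__":
    print(evaluate(SignalReport(clicks_last_minute=100, page_visible=False)))
    print(evaluate(SignalReport(clicks_last_minute=1, page_visible=True)))
```

The point of the sketch is the boundary: the browser only builds a `SignalReport`, and the decision logic lives entirely on the server, outside the environment the attacker is assumed to control.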

There are many open questions in this area that we’d like to explore:

  1. What signals are necessary to achieve different levels of result quality?
  2. What do ad fraud detection algorithms require to run beyond their input data? For example:
    • Are the algorithms proprietary and/or open source? Can these algorithms be publicly shared without compromising them or exposing their developers to reverse engineering risks?
    • How complex are the calculations that they perform?
  3. What sorts of technical protections best match the requirements above?
    • Candidate techniques in this area include Secure Multi-Party Computation, Fully Homomorphic Encryption, and Trusted Execution Environments.
  4. Approximate ad fraud decisions can be made with a minimal set of signals, but how do we fine-tune what signals go into that set? Put another way: how do we experiment with new signals?
    • In practice, novel attacks can be entirely undetected if they are not covered by current signals and methods. Can system operators examine raw signals in some constrained setting in order to understand attacks?
  5. Who would own and operate the servers?
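
As a rough illustration of the kind of technical protection raised in question 3, the following sketch shows additive secret sharing, one building block commonly used in Secure Multi-Party Computation. It is not a design for this proposal: the modulus, the function names, and the idea of splitting a numeric signal across two servers are all assumptions made for the example.

```python
import secrets

# Assumed modulus for the example; shared values must lie in [0, MODULUS).
MODULUS = 2**61 - 1


def split(value: int, n_servers: int = 2) -> list[int]:
    """Split a value into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_servers - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def add_shares(a: list[int], b: list[int]) -> list[int]:
    """Each server adds the shares it holds locally, without seeing the values."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


def reconstruct(shares: list[int]) -> int:
    """Combine all shares to recover the value (or an aggregate of values)."""
    return sum(shares) % MODULUS


if __name__ == "__main__":
    # Two hypothetical per-browser signal values, e.g. click counts.
    shares_a = split(3)
    shares_b = split(4)
    # Only the aggregate is revealed when the servers combine their sums.
    print(reconstruct(add_shares(shares_a, shares_b)))
```

In this setting no single server learns an individual value as long as the servers do not collude, which is the sort of guarantee question 5 (who owns and operates the servers) would need to account for.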

We’d like to start an effort to explore this approach, starting with requirements gathering, in the Anti-Fraud Community Group, and would welcome collaboration.

Related work:

darobin commented 2 years ago

I fully support documenting this, but I would like to ask that two things be treated separately: what the server needs to do, and how it can be trusted.

The reason I ask is because there are several proposals that rely in one way or another on a trusted server, and I think that we would benefit from trying to pool the "how it can be trusted" part, ideally alongside looking at what requirements might be managed in common. Otherwise we're going to end up with a whole menagerie of servers with different properties.

I have a proposal (https://darobin.github.io/garuda/; in need of an update, coming) that was designed for PARAKEET-like things and the PATCG has been chatting about it in support of other proposals like IPA (possibly as a hybrid with MPC). I think we could all benefit from alignment.

p-j-l commented 2 years ago

That sounds good to me. I'd prefer to keep this discussion focused on what the server needs to do to start with and then get into the question of how later - alignment certainly sounds like a good goal for all of these.

darobin commented 2 years ago

Right — in case I wasn't clear, that's the split that I think works the best: this group figures out what the requirements are for the server and temporarily ignores issues as to how to make the server trustworthy. Then, when the requirements are agreed upon, let's see if we can share some infrastructure across some or all cases.

-- Robin Berjon VP Data Governance The New York Times Company

chris-wood commented 2 years ago

@pjl-google before putting together any sort of requirements for a trusted server, I think we should get clarity on the problem here, and in particular what types of signals might be useful in addressing that problem. What do you think?

p-j-l commented 2 years ago

@chris-wood sounds good, I think the signals needed will be a key part of the requirements. I also haven't really had a chance to get into this work yet and I'm looking to come back to it soon.

dvorak42 commented 2 years ago

We'll be briefly (3-5 minutes) going through open proposals at the Anti-Fraud CG meeting this week. If you have a short 2-4 sentence summary/slide you'd like the chairs to use when representing the proposal, please attach it to this issue; otherwise the chairs will give a brief overview based on the initial post.

dvorak42 commented 2 years ago

From the brief discussion of this proposal, there was some interest in trying to nail down specific signals/capabilities these servers could have that would be useful in the anti-fraud space. There was also interest in seeing how this tech was used in other APIs in the W3C and what anti-fraud problems arise out of that.

It would be good to nail down specific instances of these sorts of signals and have a side meeting/discussion on them, then bring the results to another CG meeting.

There was also some interest in potential uses for this in the device score space (#16).