FLEDGE: Parity among players, Accountability, Transparency, and Fraud Detection

FLEDGE: Parity among players, Accountability, Transparency, and Fraud Detection #114

Open TheMaskMaker opened 3 years ago

TheMaskMaker commented 3 years ago

The present Fledge proposal treats DSPs and SSPs as first-class citizens while penalizing other players.

The present proposal also heavily limits the ability of any party to hold DSPs/SSPs accountable and check for fraud. Today, publishers using Prebid, for example (or third parties hired by those publishers), can sum the total bids on a per-event level and compare the total in a given timeframe to the amount the DSP or SSP claims is owed. The Fledge proposal only allows DSPs and SSPs to have this information.
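The per-event reconciliation described here amounts to summing one's own record of each won impression and comparing the total with the partner's claim. A minimal TypeScript sketch, assuming the publisher (or its agent) logs each winning impression's partner, timestamp, and CPM; `WinRecord`, `reconcile`, and the 5% tolerance are hypothetical names and values for illustration, not part of FLEDGE or Prebid.

```typescript
// Hypothetical record of one winning impression, as logged by the publisher
// or its trusted agent. Not a FLEDGE or Prebid data structure.
interface WinRecord {
  partner: string;   // DSP or SSP that will invoice for this impression
  timestamp: number; // milliseconds since the Unix epoch
  cpm: number;       // price per 1,000 impressions, in the invoice currency
}

interface ReconciliationResult {
  partner: string;
  expected: number;    // publisher-side total for the period
  invoiced: number;    // total the partner claims is owed
  discrepancy: number; // (invoiced - expected) / expected
  withinTolerance: boolean;
}

// Sums the publisher's own per-impression records for one partner over
// [periodStart, periodEnd) and compares the total with the partner's invoice.
function reconcile(
  records: WinRecord[],
  partner: string,
  periodStart: number,
  periodEnd: number,
  invoiced: number,
  tolerance = 0.05, // hypothetical acceptable relative discrepancy (5%)
): ReconciliationResult {
  const expected = records
    .filter((r) => r.partner === partner && r.timestamp >= periodStart && r.timestamp < periodEnd)
    .reduce((sum, r) => sum + r.cpm / 1000, 0); // each impression earns cpm / 1000
  let discrepancy: number;
  if (expected === 0) {
    discrepancy = invoiced === 0 ? 0 : Infinity;
  } else {
    discrepancy = (invoiced - expected) / expected;
  }
  return { partner, expected, invoiced, discrepancy, withinTolerance: Math.abs(discrepancy) <= tolerance };
}
```

A result outside tolerance does not by itself prove fraud; it flags a period whose logs the publisher and partner then need to compare and explain.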

The question of event-level versus aggregate information is irrelevant for the moment; the issue is that the DSPs and SSPs monopolize a stream of data that, in the interest of transparency, should be available to other trusted parties and players.

Trust is insufficient. Publishers often have little recourse to evaluate DSP/SSP systems, and even in the current environment, where this bid data is only sometimes available, there are inconsistencies and issues. Fledge would worsen this problem. Publishers can’t merely use DSPs/SSPs ‘they trust’, because they cannot establish trust without any way to ensure they are not being cheated. Trust is gained by repeatedly comparing auction information, where both sides have similar numbers and can explain any discrepancies. In any other monetary exchange, the party owed money is able to determine how much it is owed without relying on the party paying.

The publisher, or a publisher trusted agent, should have access to the SAME information the DSPs and SSPs have. There is no reason to assume that the DSP/SSP would be more trustworthy than a publisher or publisher aid company. Any restriction placed on the data should affect all parties equally and not favor DSPs/SSPs in particular.

The event based bid data is labeled temporary in the current model, but while it exists, it does not make sense to restrict it to some parties and not others.

Additionally, any aggregate API that the DSP/SSP system can access should also be accessible to the publisher or publisher trusted agent.

It is not enough to consider parity in the long run; otherwise publishers, publisher aid companies, and other players will be negatively impacted in the first version, as well as placed at a significant disadvantage in preparation. The first version of Fledge should have parity among players, and then fix any trusted API issues at the same time for everyone.

What is important here is parity. It ensures that no single party can use a loophole to expose a privacy weakness or monopolize part of the data or system. Fledge should expose EVERY event seen by another party to a publisher or publisher trusted agent. A publisher trusted agent can be established using the delegation system described elsewhere in the Fledge proposal. This is essential for many small publishers that rely on non-DSP/SSP businesses for their ad maintenance and analytics, as well as for those partners themselves.

michaelkleber commented 3 years ago

FLEDGE's on-device auction was designed around two types of parties: one seller who runs the auction, and any number of buyers who bid in it. As the explainer says, "Many parties might act as sellers: a site might run its own ad auction, or might include a third-party script to run the auction for it, or might use an SSP that combines running an on-device auction with other server-side ad auction activities." So I'm definitely not trying to say that the publisher must use an SSP; people who operate websites should have lots of freedom here.

But it sounds like you don't think that's enough, and that the auction needs some special third participant, an additional verifier agent to watch what the seller does? I don't fully understand how this verifier role would work today in a server-side auction — surely there is some party who receives all the bids and acts on them. What pieces of information are you trying to get another party to see?

(Your answer might be, as in your original issue, that "Fledge should expose EVERY event seen by another party to a publisher or publisher trusted agent." But the whole point of FLEDGE is that the publisher is emphatically not supposed to know what interest groups their visitors are in! So I certainly can't interpret what you're asking for literally.)

TheMaskMaker commented 3 years ago

There is another party to consider: the Publisher Trusted Agent. Publisher Trusted Agents should have equal access to business-necessary auction data even if an SSP is ultimately running the auction, as they largely do today.

Publishers may want to use 3p providers for use cases such as audit, attribution, revenue analytics, fraud detection, and ad content filtering.

These goals are (pre-fledge) achieved today through event level access of certain data. They need a way to exist in a fledge world. I’ll list key data points below. Under fledge this data is still being sent to "trusted servers" and could also be sent, without leaking the information to the client, by using the exact same "trusted server" paradigm for the 3p provider.

Data used by such companies for these purposes today include: Creative IDs, CPM of the bids, Advertiser, Advertiser Domain, Bidder Name, and SupplyChain.

Can you confirm that a publisher will be able to specify a service as a trusted server agent as part of fledge? Will this trusted agent server receive the same auction data (some business critical examples listed above) as the other players in the Fledge ecosystem, including and especially a selling ssp?

michaelkleber commented 3 years ago

Perhaps there is some misunderstanding here. The data that you list is all stuff that becomes known to the party running the auction (the SSP or whoever else the publisher chooses) — but that information only becomes known to the on-device worklet that runs the auction. It is not something that anyone gets to send to a server. I understand this is a change from today, but it's a change for all parties.

The temporary event-level logging (5.1 Seller Reporting on Render) is only an opportunity to log information about the winner. It doesn't seem like that's what you're asking about.

TheMaskMaker commented 3 years ago

I agree; I think we are getting closer to an understanding, but we are not quite there yet. The data listed needs to be known to more than the party running the auction, and it can be made known without violating Fledge’s privacy concerns. Let me get into more specific business use cases:

The creative id (or the render fingerprint if that can be used as a creative id for this purpose) is needed for the winning bid, not just by the seller, but also by publishers or their trusted agents, to track and debug undesirable creatives and advertisers. It would be ideal if the publisher could get this directly from the auction winning logic, and not from the seller’s worklet, to ensure this cannot be faked or hidden by errors (if I am understanding the auction mechanism correctly).

For the CPM, I understand you do not want to leak this to the client to avoid user association, but publishers or trusted agents need the exact CPM to be sent to a trusted server (just as the seller requires this in fledge) for several key business use cases relating to the real world monetary value it represents and reconciliation reporting. It can be used to ensure the total amount transacted is correct and to check for fraud.

The CPMs of bids that did not win are also needed for additional use cases. Publishers or publisher trusted agents use them to understand bid density and the activity of partners. They can be used in desirability score metrics, in troubleshooting issues with partners, and to help understand how different sellers (SSPs, etc.) are performing or applying their “desirability scoring”. They can also be used to determine which partners are active; the latency of the bids is also needed for this purpose (I believe an issue was already filed for the latency use case: https://github.com/WICG/turtledove/issues/90).
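The bid-density, no-bid, and latency metrics mentioned here reduce to simple per-bidder statistics. A sketch assuming a hypothetical `BidObservation` record holding each bidder's CPM (`null` for a no-bid) and response latency; under FLEDGE such values would only reach a publisher agent in aggregate, so this shows the arithmetic rather than any FLEDGE API.

```typescript
// Hypothetical per-bid observation; not a FLEDGE-defined structure.
interface BidObservation {
  bidder: string;
  cpm: number | null; // null when the bidder returned no bid
  latencyMs: number;  // time from bid request to response
}

interface BidderSummary {
  bidder: string;
  requests: number;
  bidRate: number;        // fraction of requests that produced a bid
  meanCpm: number | null; // averaged over actual bids only
  meanLatencyMs: number;  // averaged over all responses, bids and no-bids
}

function summarizeBidders(observations: BidObservation[]): BidderSummary[] {
  // Group observations by bidder.
  const groups = new Map<string, BidObservation[]>();
  for (const o of observations) {
    const list = groups.get(o.bidder) ?? [];
    list.push(o);
    groups.set(o.bidder, list);
  }
  const summaries: BidderSummary[] = [];
  groups.forEach((list, bidder) => {
    const cpms = list.map((o) => o.cpm).filter((c): c is number => c !== null);
    const totalLatency = list.reduce((sum, o) => sum + o.latencyMs, 0);
    summaries.push({
      bidder,
      requests: list.length,
      bidRate: cpms.length / list.length,
      meanCpm: cpms.length > 0 ? cpms.reduce((a, b) => a + b, 0) / cpms.length : null,
      meanLatencyMs: totalLatency / list.length,
    });
  });
  return summaries;
}
```

A bidder with a persistently low `bidRate` or a high `meanLatencyMs` is the kind of partner this comment describes wanting to identify.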

These are current publisher needs. Can these needs be met by allowing the publisher or its agent to have a worklet that receives this data from the browser and is able to send it to a trusted server?

The other data points I listed have concerns as well, but let's start with these two to avoid a wall of text.

michaelkleber commented 3 years ago

I'm not sure what you mean by "receives this data from the browser and is able to send it to a trusted server". What is this server being trusted to do, or not to do?

Receiving this sort of data at the event-by-event level, rather than in aggregate, seems like it would break the privacy guarantees that FLEDGE is trying to make. If you are talking about aggregated data, then this is indeed the style of reporting that FLEDGE supports.

If there is a trust issue such that, for example, the publisher is willing to let a third-party seller pick which ad should render, but not trust them to issue an (aggregated) report of all the ads they picked, then I can see how some additional reporting tied directly to things happening in the on-device auction might help.

TheMaskMaker commented 3 years ago

Based on the conversation at the March 17 meeting, I am speccing out the details for a publisher trusted agent worklet. The publisher can establish its own worklet or designate a party (like an RMP or analytics company) to create one on its behalf. This worklet will serve to gather the auction data the publisher needs to make decisions with respect to its bidding partners. It will also enable transparency into the auction, and analytics on the auction data, within the confines of Fledge.

To briefly reiterate the points from the meeting so a reader of this issue does not have to go look for the minutes:

A publisher needs a method to reconcile revenue earned at the end of each month, and needs its own book to confirm. It is common in the industry (Basile gave an example) for one party to have a server error so the publisher needs its own copy of the cpm revenue data for revenue reconciliation. This also promotes transparency and is used in fraud detection.
A publisher needs access to information about the auction itself. Many common auction setups today (Prebid.js) offer this insight. A publisher can use aggregate insights in Fledge to make the same decisions. Information on the activity of the bidders (detailed below) can be used to identify useful bidding partners, or partners that should be dropped. It can be used to confirm that partners are abiding by contractual agreements. It can be used to debug and detect unwanted, inappropriate, or undesirable ads and ensure they are removed. It can be used to ensure a quality ad supply chain. Bidders that lag out the page, break the auction, or stall the auction can be detected and removed.

I will try to list the main data points needed for these purposes, as well as the data they need to be correlated with. It is assumed that encrypted packets can be collected and sent to the aggregate API.

The CPM of the winning bid, correlated to the URL, seller, bidder, and advertiser.
The CPM of all bids (including ‘nobids’ for partners that choose not to participate), correlated to the URL, seller, bidder, and advertiser.
The latency of all bids (https://github.com/WICG/turtledove/issues/90), correlated to the URL, seller, bidder, and advertiser.
The creative ID (or ad fingerprint), correlated to the URL, seller, bidder, and advertiser.
The supply chain object (https://github.com/InteractiveAdvertisingBureau/openrtb/blob/master/supplychainobject.md)

Additionally, it would be useful to be able to obtain the winning interest group. It would also be useful to have a mechanism for detecting that a bidder or seller has ‘errored’, preventing a bid or preventing the auction.
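One way to picture the data points listed above is as one observation per bidder per auction, flattened into (key, value) pairs that a sum-based aggregation service could total, with no-bids and errors recorded rather than dropped. The sketch assumes hypothetical `AuctionObservation` and `Contribution` shapes and a readable string key format; a real aggregation service would define its own bucket encoding and would encrypt the reports. The supply chain object is left out for brevity.

```typescript
// Hypothetical shapes for illustration; FLEDGE defines no such record or key format.
type Outcome = "bid" | "nobid" | "error";

interface AuctionObservation {
  pageKey: string;    // aggregation-friendly page descriptor, not a raw URL
  seller: string;
  bidder: string;
  advertiser: string; // "" when there was no bid
  creativeId: string; // creative ID or ad fingerprint; "" when there was no bid
  outcome: Outcome;   // "nobid" and "error" are recorded, not dropped
  cpm: number | null; // null unless outcome === "bid"
  latencyMs: number;
  won: boolean;
}

interface Contribution {
  key: string;   // readable bucket name; a real service would use its own encoding
  value: number; // non-negative integer to be summed by the aggregation service
}

function toContributions(o: AuctionObservation): Contribution[] {
  const dims = [o.pageKey, o.seller, o.bidder, o.advertiser].join("|");
  const out: Contribution[] = [
    { key: `count:${o.outcome}|${dims}`, value: 1 },             // bid / nobid / error counts
    { key: `latencyMs|${dims}`, value: Math.round(o.latencyMs) }, // summed; divide by counts later
  ];
  if (o.outcome === "bid" && o.cpm !== null) {
    const micros = Math.round(o.cpm * 1e6); // CPM in integer micro-units
    out.push({ key: `cpmMicros:all|${dims}`, value: micros });
    if (o.won) {
      out.push({ key: `cpmMicros:won|${dims}`, value: micros });
      if (o.creativeId !== "") {
        out.push({ key: `creative:${o.creativeId}|${dims}`, value: 1 });
      }
    }
  }
  return out;
}
```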

michaelkleber commented 3 years ago

Most of what you say sounds like what we discussed on the call, but I'm surprised by "correlated to the url". Are you thinking only of common URLs?

We talked about the fact that the information you're discussing should be available in aggregate but not on an event-by-event basis. Indeed the more finely you try to slice information, the more you will run into the Differential Privacy noise added by the aggregation system.
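The effect of slicing finely can be estimated with a little arithmetic. Assuming, for illustration, a Laplace mechanism in which each bucket receives noise of scale b = sensitivity / epsilon (standard deviation √2·b), the noise per bucket stays constant while the true value per bucket shrinks as the same events are split across more slices. The parameters below are made up; the actual noise mechanism and privacy budget of the aggregation service are not specified in this thread.

```typescript
// Standard deviation of Laplace noise with scale b = sensitivity / epsilon
// (the variance of Laplace(0, b) is 2 * b^2).
function laplaceStdDev(sensitivity: number, epsilon: number): number {
  return Math.SQRT2 * (sensitivity / epsilon);
}

// Expected relative error (noise std / true bucket value) when totalEvents
// contributions of valuePerEvent are split evenly across `slices` buckets.
function relativeError(
  totalEvents: number,
  slices: number,
  valuePerEvent: number,
  sensitivity: number,
  epsilon: number,
): number {
  const bucketValue = (totalEvents / slices) * valuePerEvent;
  return laplaceStdDev(sensitivity, epsilon) / bucketValue;
}

// 10,000 impressions counted (value 1 each) with sensitivity 1 and epsilon 1:
const coarse = relativeError(10000, 1, 1, 1, 1);    // one site-wide bucket
const fine = relativeError(10000, 10000, 1, 1, 1);  // one bucket per unique URL
console.log(coarse.toFixed(5), fine.toFixed(2));    // prints "0.00014 1.41"
```

With one bucket per unique URL, the expected noise exceeds the signal in every bucket, which is the "only noise, no data" outcome for non-aggregatable URLs.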

What use cases are you trying to address by slicing at the URL level?

TheMaskMaker commented 3 years ago

The current business use case is to associate the collected information with the web page. We need to correlate business interest and revenue potential with the content. There is no information in Fledge on an individual user, but there is aggregate information on a web page. This helps us identify the content that is valuable to advertisers, to maximize revenue for the publisher.

michaelkleber commented 3 years ago

Sure, I get that, but that only makes sense if the URL is designed to allow aggregation. Many sites' URLs include lots of parameters that make them unique to a given visit, which means grouping by URL would leave you with only noise, no data.

Perhaps it should be the job of the publisher-trusted agent to look at the real page URL and somehow turn it into an aggregation key. If the URL is sufficiently popular across different people, you could just use that, but if the URL has all sorts of parameters that make it unsuitable for aggregation, the publisher-trusted agent would implement logic to strip parts away, or even replace the URL with a higher-level "section of the publisher site" kind of descriptor, to make useful segments for slicing.
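A publisher-trusted agent's URL-to-key logic could be as simple as dropping the query string and fragment, keeping a path as-is when it is known to be popular, and otherwise mapping it to a site section. A sketch in which `SECTION_PREFIXES` and `POPULAR_PATHS` are hypothetical placeholders that a real agent would derive from the publisher's own URL scheme and traffic.

```typescript
// Hypothetical section map; more specific prefixes must come first.
const SECTION_PREFIXES: ReadonlyArray<[string, string]> = [
  ["/news/politics/", "news/politics"],
  ["/news/", "news"],
  ["/sports/", "sports"],
];

// Hypothetical set of paths known to be visited by many different people.
const POPULAR_PATHS: ReadonlySet<string> = new Set(["/", "/news/", "/sports/"]);

// Turns a real page URL into a key that is coarse enough to aggregate on.
function toAggregationKey(rawUrl: string): string {
  const url = new URL(rawUrl);
  const path = url.pathname; // excludes the query string and fragment
  if (POPULAR_PATHS.has(path)) {
    return `${url.hostname}${path}`;
  }
  for (const [prefix, section] of SECTION_PREFIXES) {
    if (path.startsWith(prefix)) {
      return `${url.hostname}#section:${section}`;
    }
  }
  return `${url.hostname}#section:other`;
}
```

Because prefixes are checked in order, more specific sections (such as `/news/politics/`) must be listed before broader ones (`/news/`).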

TheMaskMaker commented 3 years ago

I agree; in fact I think it makes a lot of sense for the publisher trusted agent, which knows the URL structure, to break it down into something aggregatable and usable. That would work perfectly.

In fact, since you mention it, I wanted to ask about the ability to aggregate on other contextual information as well. Is this possible as long as the numbers are high enough to enable aggregate lookup and prevent individual tracking? For example, correlation to article author is important to publishers. Correlation to external referrer domain (for example, 50% of your audience comes from Google Search) is also useful aggregate information we would like to be able to provide. This and similar information is very useful in aggregate and helps drive publisher decisions. Is this possible?
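Slicing by other publisher-side context, such as article author or referrer domain, works the same way: derive a coarse key from information the publisher already has and aggregate under it. A sketch assuming a hypothetical `ContextualImpression` record; the `minImpressions` cut-off stands in for the "numbers are high enough" condition as a publisher-side reporting choice, and is not how FLEDGE's aggregation protects privacy, which relies on added noise instead.

```typescript
// Hypothetical record: context the publisher already has, plus the winning CPM.
interface ContextualImpression {
  author: string;
  referrer: string; // document.referrer-style value; "" for direct traffic
  winningCpm: number;
}

interface ContextTotals {
  impressions: number;
  cpmSum: number;
}

// Reduces a full referrer URL to its hostname so that keys stay coarse.
function referrerDomain(referrer: string): string {
  if (referrer === "") return "(direct)";
  try {
    return new URL(referrer).hostname;
  } catch {
    return "(invalid)";
  }
}

function aggregateByContext(
  rows: ContextualImpression[],
  minImpressions: number,
): Map<string, ContextTotals> {
  const totals = new Map<string, ContextTotals>();
  for (const r of rows) {
    const key = `author=${r.author}|ref=${referrerDomain(r.referrer)}`;
    const t = totals.get(key) ?? { impressions: 0, cpmSum: 0 };
    t.impressions += 1;
    t.cpmSum += r.winningCpm;
    totals.set(key, t);
  }
  // Report only slices with enough impressions (publisher-side choice).
  const reported = new Map<string, ContextTotals>();
  totals.forEach((t, key) => {
    if (t.impressions >= minImpressions) reported.set(key, t);
  });
  return reported;
}
```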

I was also wondering what the mechanism for creating these aggregate calls will look like on the publisher trusted worklet?

michaelkleber commented 3 years ago

There should be no problem slicing by whatever information the publisher already has.

Obviously the mechanism is TBD, since we've only just introduced the idea of a publisher trusted worklet. But the Multi-Browser Aggregation Service Explainer, and the further work describing Private Histograms, should offer a sense of what it will be built on.