WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/

TurtleDove for Search Ads #39

Closed shaoyu-ms closed 1 year ago

shaoyu-ms commented 4 years ago

Search ads have a very heavy auction stack. Advertisers can bid on tens of thousands of keywords. Each auction consists of several stages, each of which runs its own complex algorithms. These stages include selection, relevance, click prediction, ranking, allocation and placement.

The models used in each algorithm consider signals coming from users’ entire sessions. Given this, there is often a large amount of data that contributes to the ads a user sees in the search engine results page. Based on our current evaluation of Turtledove, we do not see a way that a client-side auction will be able to scale to effectively meet the needs of search ads.

In the search context, the user issues a query and the search engine finds the keywords closest to that query and the ads related to it, based on the keyword match algorithms advertisers have chosen. The auction uses the user’s current and previous queries and click history in deciding the relevance and click probability of the ads. In addition, the auction includes remarketing list membership to either include/exclude sets of users or to modify the bid on the relevant keywords. Turtledove does not address search scenarios or how remarketing would work in that context. Scaling a complex auction with multiple inputs, including remarketing membership, to client-side JS will probably not work well.
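To make the three remarketing behaviours named above concrete (include, exclude, modify the bid), here is a minimal sketch. The `RemarketingRule` type, its `mode` strings, and the `multiplier` field are hypothetical names chosen for illustration; they do not come from any real ads platform or from the TURTLEDOVE proposal.

```python
from dataclasses import dataclass
from typing import AbstractSet, Optional


@dataclass(frozen=True)
class RemarketingRule:
    list_id: str              # hypothetical remarketing list identifier
    mode: str                 # "include", "exclude", or "modify"
    multiplier: float = 1.0   # applied only when mode == "modify"


def effective_bid(base_bid: float,
                  rule: Optional[RemarketingRule],
                  user_lists: AbstractSet[str]) -> Optional[float]:
    """Return the bid to use for this user, or None if the ad is ineligible."""
    if rule is None:
        return base_bid
    is_member = rule.list_id in user_lists
    if rule.mode == "include":
        # Only list members may see the ad.
        return base_bid if is_member else None
    if rule.mode == "exclude":
        # List members must not see the ad.
        return None if is_member else base_bid
    if rule.mode == "modify":
        # List members get an adjusted bid; everyone else keeps the base bid.
        return base_bid * rule.multiplier if is_member else base_bid
    raise ValueError(f"unknown rule mode: {rule.mode}")
```

In today's server-side auctions `user_lists` is known to the server; the question raised in this issue is what happens when only the browser knows it.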

Are there plans for a new API to address this scaling limitation? If not, how is Google planning to adopt Turtledove for search-based scenarios?

Thanks.

appascoe commented 4 years ago

Are you talking about search ads that appear on the search engine's own pages? If so, everything you list is still accomplishable with first-party cookies.

michaelkleber commented 4 years ago

@shaoyu-ms I agree, what I described in TURTLEDOVE was heavily focused on the display ads space, not on search ads.

You say that remarketing list membership is used "to either include/exclude sets of users or to modify the bid on the relevant keywords." So I guess our task is to figure out whether there's any way to accomplish some of those goals in a scenario where the server isn't allowed to know the user's interest groups, but instead the contents returned to the browser can take interest group membership into account in deciding how it renders.

The simplest approach I can think of is something like:

  1. Server-side auction decides that it wants to run N ads, then picks and ranks the top 2*N ads and sends them to the browser.

  2. Each of those 2*N ads comes with a "default priority", and also with a "priority if this person is in some interest group owned by this advertiser." (Or maybe that's too simplified, and it would be more useful for each ad to come with a map from interest groups to priorities.)

  3. Your in-browser JS gets to figure out the actual priorities based on the TURTLEDOVE-style interest group memberships, and re-orders the 2*N ads appropriately. The top N of them get shown.
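The three steps above can be sketched as follows. This is only an illustration, written in Python rather than in-browser JS, using the "map from interest groups to priorities" variant. The names `CandidateAd`, `resolve_priority` and `rerank` are hypothetical, and the rule that a matching group priority overrides the default (taking the highest when several groups match) is an assumption, not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, Dict, List


@dataclass
class CandidateAd:
    ad_id: str
    default_priority: float
    # Hypothetical per-ad map: interest group name -> priority if the user is a member.
    group_priorities: Dict[str, float] = field(default_factory=dict)


def resolve_priority(ad: CandidateAd, memberships: AbstractSet[str]) -> float:
    """Pick the priority for one ad given the browser-only interest group memberships."""
    matched = [p for group, p in ad.group_priorities.items() if group in memberships]
    # Assumption: any matching group overrides the default; highest match wins.
    return max(matched) if matched else ad.default_priority


def rerank(candidates: List[CandidateAd],
           memberships: AbstractSet[str],
           n: int) -> List[CandidateAd]:
    """Re-order the 2*N server-ranked candidates and keep the top N."""
    # sorted() is stable, so ads with equal priority keep the server's order.
    ranked = sorted(candidates,
                    key=lambda ad: resolve_priority(ad, memberships),
                    reverse=True)
    return ranked[:n]
```

For example, with N = 2 the server would send four candidates, and a user in a `"shoes"` interest group could see a lower-ranked ad promoted into the visible two without the server learning why.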

To preserve the privacy requirement that the server can't learn interest group memberships, the reordered ads would have to render inside the same sort of private environment used in the display-ads case, and you would only get aggregate reporting on which ads rendered and in which positions.

Do you think an approach of this sort would be of any use to you? Of course it would take substantial effort to deal with this new model. I guess the basic question is whether the benefit of letting interest group (remarketing list) membership affect ads is worth the cost.

shaoyu-ms commented 4 years ago

Thanks, @appascoe and @michaelkleber, for the proposed remedy for applying TurtleDove to Search Ads. I can see several issues in the suggested approach:

(1) The 2*N ads returned from the server are the winners among hundreds of thousands of ads, having gone through the selection model, relevance model and click prediction model, while the retargeting ads sitting on the client side are prioritized directly against these winners. This gives an advantage to retargeting ads and affects the relevance quality of the search ads results.

(2) In search ads, retargeting ads also bid on potentially thousands of keywords. When a user search query comes in, the query is expanded to a set of keywords by an algorithm (in most cases an NLP model); if those keywords match what a retargeting ad is bidding on, that ad is selected. If the retargeting ads are on the browser side, these thousands of bid keywords need to be stored with the retargeting ads. The user query or expanded keyword set needs to be sent along with the contextual ad. The NLP model or keyword set matching needs to happen in the JS script. All of this would need to be considerably simplified for client-side JS, resulting in lower relevance and click-through rates.
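A rough sketch of what the simplified client-side matching in (2) might look like is given below, in Python for illustration only. It assumes the NLP expansion has already happened on the server and that the browser only does exact matching on normalized keywords; `normalize` and `select_retargeting_ads` are hypothetical helpers, not part of any proposal.

```python
from typing import Dict, Iterable, List


def normalize(keyword: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(keyword.lower().split())


def select_retargeting_ads(expanded_keywords: Iterable[str],
                           stored_ads: Dict[str, Iterable[str]]) -> List[str]:
    """Return ids of stored retargeting ads that bid on any expanded keyword.

    stored_ads maps an ad id to the keywords it bids on (held in the browser).
    """
    query_terms = {normalize(k) for k in expanded_keywords}
    selected = []
    for ad_id, bid_keywords in stored_ads.items():
        # Exact set intersection only: no phrase or broad match, no relevance score.
        if query_terms & {normalize(k) for k in bid_keywords}:
            selected.append(ad_id)
    return selected
```

Even this stripped-down version requires every retargeting ad to carry its full keyword list into the browser, which is the scaling concern raised above.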

(3) Search Ads also does whole-page optimization when selecting the final N winning ads, because the search results page has strict rules on ad space usage. With the opaqueness of retargeting ads and the uncertainty about the final N winners before the client-side auction, this would be a big setback for whole-page optimization.

BTW, Search Ads uses second-price (2nd highest) pricing, which means the server bills the advertiser not at the advertiser’s own bid but at the highest bid among the other advertisers that is lower than this advertiser’s bid. The second-price scheme is also used in many Display/Native ads platforms. Does TurtleDove propose to send the winning ad’s bid to the Chrome Aggregated API service for further processing? If so, it also needs to send the highest losing ad’s bid to the service, as that bid also affects the final pricing.
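The pricing rule described above can be illustrated with a small sketch. It is deliberately simplified: it ranks on raw bids only and ignores quality or relevance adjustments that a real search auction would apply. The function name `second_price_charges` and the `reserve` parameter are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def second_price_charges(bids: Dict[str, float],
                         slots: int,
                         reserve: float = 0.0) -> List[Tuple[str, float]]:
    """Return (advertiser, price) for each winning slot.

    Each winner pays the next-highest bid below it; the last-ranked bidder,
    if it wins a slot, pays the reserve because no lower bid exists.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    charges = []
    for position in range(min(slots, len(ranked))):
        advertiser, _own_bid = ranked[position]
        next_index = position + 1
        price = ranked[next_index][1] if next_index < len(ranked) else reserve
        charges.append((advertiser, price))
    return charges
```

With bids of 5.0, 3.0, 2.0 and 1.0 and two slots, the second winner is charged 2.0, which is the highest losing bid. That is why the losing bid would also have to reach whatever service computes the bill.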

michaelkleber commented 4 years ago

Ah, sorry I was unclear! The idea I was trying to describe above was not a way to get current TURTLEDOVE (with pre-fetched ads) working in a search-ads context. Your objections (1) and (2) are good examples of why that seems hard to me.

Instead, I was thinking about a different problem: What "small amounts of personal information" could the browser keep track of, so that you could do most of your work server-side in a way similar to what you do today, and then in the end only do a small amount of filtering or re-ordering in the browser, based on that information unavailable to the server?

Perhaps the browser contributing only a few bits of information to the outcome (the booleans for membership in some specific remarketing lists) helps with the scale and optimization problems you're describing. Or perhaps it's so little information that it just isn't useful at all!

In either case, I'm happy to explore whether there is a way to balance these against each other to find something useful to you. Even if pre-fetched ads don't make sense for a search use case, on-device-only interest group membership might be of use to you in other ways.

JensenPaul commented 1 year ago

Closing this issue as it represents past design discussion that predates more recent proposals. I believe some of this feedback was incorporated into the Protected Audience (formerly known as FLEDGE) proposal. If you feel further discussion is needed, please feel free to reopen this issue or file a new issue.