ampproject / amphtml

The AMP web component framework.
https://amp.dev
Apache License 2.0

Intent to implement: Ad slot DOM fingerprint #5557

Closed bobcassels closed 7 years ago

bobcassels commented 8 years ago

We propose to implement an ad slot "DOM fingerprint." This will distinguish different ad slots on a single page, based on where they appear in the DOM structure.

For some operations involving ads, including ad selection machine learning, it is sometimes useful to have a rough fingerprint of the DOM surrounding the place an ad is going to appear. This is distinct information from the pixel x/y location of the ad on the rendered page, because this signal generally stays the same across changes to device size or lengths of paragraphs of text.

The DOM fingerprint examines the parent chain of DOM elements. At each level, the id attribute of the element, if any, and the ordinal of elements of that type within the parent, are noted. For example, 'td.1,tr.0,table.0,div/id2.0,div/id1.0' would be the DOM structure information for this:

<div id='id1' ...>    // div/id1.0
  <div id='id2' ...>  // div/id2.0
    <table ...>       // table.0
      <tr>            // tr.0
        <td>...</td>  // td.0
        <td>          // td.1
          <amp-ad ...></amp-ad>
        </td>
      </tr>
      <tr>...</tr>    // tr.1
    </table>
  </div>
</div>

This DOM structure information is collected as a string. That string is then hashed to produce a 32-bit unsigned integer that is sent as the value of the DOM fingerprint. This integer will be added to the context object available to AMP 3p ads. The same integer will be made available to A4A ads.
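To make the path construction concrete, here is a minimal sketch of how the string in the example above could be built. It is not the proposed implementation: `SlotNode`, `node`, `ordinalAmongSameTag`, and `domPath` are hypothetical names, and a plain object tree stands in for the real DOM so the snippet runs outside a browser. It assumes the walk starts at the ad's parent and that ordinals are 0-based, as in the example.

```typescript
// Hypothetical stand-in for a DOM element; real code would walk
// Element.parentElement instead of this hand-built tree.
interface SlotNode {
  tag: string;
  id: string | undefined;
  parent: SlotNode | undefined;
  children: SlotNode[];
}

// Builds a node and wires up the parent links of its children.
function node(tag: string, id?: string, children: SlotNode[] = []): SlotNode {
  const n: SlotNode = { tag, id, parent: undefined, children };
  for (const child of children) {
    child.parent = n;
  }
  return n;
}

// 0-based index of `el` among its siblings that share the same tag name.
function ordinalAmongSameTag(el: SlotNode): number {
  if (!el.parent) {
    return 0;
  }
  return el.parent.children.filter((c) => c.tag === el.tag).indexOf(el);
}

// Walks from the ad's parent up to the root, emitting "tag[/id].ordinal"
// for each level, innermost first.
function domPath(ad: SlotNode): string {
  const parts: string[] = [];
  for (let el = ad.parent; el; el = el.parent) {
    const idPart = el.id ? `/${el.id}` : '';
    parts.push(`${el.tag}${idPart}.${ordinalAmongSameTag(el)}`);
  }
  return parts.join(',');
}

// The structure from the example above.
const ad = node('amp-ad');
node('div', 'id1', [
  node('div', 'id2', [
    node('table', undefined, [
      node('tr', undefined, [node('td'), node('td', undefined, [ad])]),
      node('tr'),
    ]),
  ]),
]);

console.log(domPath(ad)); // td.1,tr.0,table.0,div/id2.0,div/id1.0
```

The resulting string is what would then be hashed down to a 32-bit unsigned integer.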

jridgewell commented 8 years ago

/to @jasti, @lannka. Don't we already have an issue for this?

jasti commented 8 years ago

@jridgewell Not that I know of. @bobcassels discussed this on an email thread.

lannka commented 8 years ago

I remember we talked about this in one of the meetings when you guys were visiting. I still have the question: why doesn't the ordinal of the ad slot work, if all you want is to uniquely identify a slot?

BTW, this would again require the CryptoService to be moved to core for the hashing functionality: #3888

bobcassels commented 8 years ago

@lannka -- for some uses, the ordinal is sufficient. But sometimes a publisher will have an overall template with several positions for ads, and put ads in only some of them. For example, there may be a banner ad, one on the left side of the page, and one on the right side of the page. Compare, for example, the home page of nameadog.com, an article page, and a photo page. We might like to consider the ad slot on the right side of the article page to be equivalent to the (only) slot on the home page. The DOM fingerprint is the same for those, and different for the ad slot on the left side of the article page. And the two slots on the photo page are different from any of the slots on the home page or article page. So we see that the DOM fingerprint matches an intuition about slot "sameness."

The hashing function can be simple. I propose to use something like djb2. We do not need CryptoService for this simple use.
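For reference, a minimal sketch of a djb2-style hash that yields a 32-bit unsigned integer. This is the common textbook form (start at 5381, multiply by 33, add each character). The function name `djb2` and the choice to hash UTF-16 code units are assumptions for illustration, not necessarily what would ship.

```typescript
// djb2 string hash: hash = hash * 33 + c, seeded with 5381.
// (hash << 5) + hash equals hash * 33 modulo 2^32; `>>> 0` reduces the
// running value to an unsigned 32-bit integer after every step.
function djb2(str: string): number {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) + hash + str.charCodeAt(i)) >>> 0;
  }
  return hash;
}

console.log(djb2(''));  // 5381
console.log(djb2('a')); // 177670
```

Nothing here needs a cryptographic primitive; it is plain integer arithmetic over the characters of the path string.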

tdrl commented 8 years ago

Doesn't AMP set / create element IDs in some cases? (It looks to me like AMP will set them if they're not already present, but that's empirical -- I don't know what the real behavior is.) Will this impact your algorithm? In particular, I don't know how stable AMP's algorithm for this assignment is -- if the ID changes for the same slot over multiple runs, that will mess up your hash, won't it?

jridgewell commented 8 years ago

> Doesn't AMP set / create element IDs in some cases?

Only for AMP elements, and we may remove that. They are set according to how they are parsed in the DOM, so they always generate the same ID.

> BTW, this would again require the CryptoService to be moved to core for the hashing functionality

Why can't this be late loaded with ads?

lannka commented 8 years ago

@bobcassels thanks for your good example which does explain the problem!

@jridgewell sure it can. Right now, amp-ad of type doubleclick and adsense rely on amp-analytics to load CryptoService and generate the ad CID.

However, I do think we should remove such a dependency. It's easy to miss. Right now, we don't even tell publishers that they need to have amp-analytics for their doubleclick ad to have an ad CID. The validator doesn't check that either.

cramforce commented 8 years ago

The auto generated ids would be a problem if you want to find the same ad slot in "similar" pages.

dvoytenko commented 8 years ago

I think this all sounds good.

Couple of notes:

  1. As others noted, our auto-generated IDs would cause issues. FYI, we added auto-gen IDs because we hoped we could move the media and sizes attribute implementations to pure CSS. That didn't work. Here we have two options: either we can try to remove our auto-gen IDs, or you can simply ignore them when generating the path.
  2. I'd like to know what the performance is for calculating path hashes on an average-size document. Has anyone done this kind of analysis?
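The second option in note 1 (ignoring auto-generated IDs) could look roughly like the sketch below. The `AMP_<number>` pattern for auto-generated IDs is an assumption here, since this thread does not spell out the format, and `stableId` is a hypothetical helper name.

```typescript
// ASSUMPTION: auto-generated IDs look like "AMP_" followed by digits.
// Adjust the pattern to whatever format the runtime actually uses.
const AUTO_ID_PATTERN = /^AMP_\d+$/;

// Returns the id to record in the path, or '' when the element has no id
// or the id looks auto-generated and should not affect the fingerprint.
function stableId(id: string | null | undefined): string {
  if (!id || AUTO_ID_PATTERN.test(id)) {
    return '';
  }
  return id;
}

console.log(stableId('sidebar')); // sidebar
console.log(stableId('AMP_12'));  // (empty string)
```

With a filter like this, two pages that differ only in their generated IDs would still produce the same path string.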

bobcassels commented 8 years ago

@dvoytenko -- my code contains limits in case docs are huge. On my Mac laptop / Chrome, the maximum time is 1 ms.

dvoytenko commented 8 years ago

Sounds great!