citp / news-disinformation-study

A research project on how web users consume, are exposed to, and share news online.

Handle <a> tags added to the DOM during LinkExposure studies #21

Closed jonathanmayer closed 4 years ago

jonathanmayer commented 4 years ago

This is especially important for social media, where infinite scrolling is common.

A few design options:

  1. When the page loads, and periodically afterward, use querySelectorAll to get a static NodeList of <a> tags. Use expando attributes to keep track of whether an <a> tag has been added to or removed from the DOM. This is probably the place to start. If needed, we might be able to get a performance boost from embedding the domains of interest into the CSS selector.
  2. Same as the previous option, but using getElementsByTagName and a live HTMLCollection. This might be a bit faster, though a live collection changes whenever the DOM changes, which could have risks.
  3. Listen for DOM mutation events. There isn't a way to filter by tag type, so we'd have to go through every single changed node. That could be slow if there are frequent DOM mutation events, and this is probably the most difficult option to implement.
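
Option 1 could be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not the project's implementation: `AnchorLike`, `QueryRoot`, `buildAnchorSelector`, and `collectNewAnchors` are hypothetical names, the structural types stand in for the real DOM typings, and a `WeakSet` is used in place of expando attributes to remember which tags have already been handled.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface AnchorLike {
  href: string;
}
interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<AnchorLike>;
}

// Hypothetical helper: build a selector that only matches anchors whose href
// contains one of the given domains. This is a substring match, so it can
// over-match; an exact hostname check would still be needed afterward.
function buildAnchorSelector(domains: string[]): string {
  if (domains.length === 0) {
    return "a[href]";
  }
  return domains.map((domain) => `a[href*="${domain}"]`).join(", ");
}

// Anchors that were already handled. Assumption: a WeakSet replaces the
// expando attribute mentioned in option 1; entries vanish with their nodes.
const seenAnchors = new WeakSet<AnchorLike>();

// Return only the anchors that were not present in any earlier scan.
function collectNewAnchors(root: QueryRoot, selector: string): AnchorLike[] {
  const fresh: AnchorLike[] = [];
  for (const anchor of Array.from(root.querySelectorAll(selector))) {
    if (!seenAnchors.has(anchor)) {
      seenAnchors.add(anchor);
      fresh.push(anchor);
    }
  }
  return fresh;
}
```

In a content script, `collectNewAnchors` would presumably be called once at load and then on a timer, with a thin wrapper around `document.querySelectorAll` as the root. Detecting removed tags is not covered by this sketch.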

jonathanmayer commented 4 years ago

Followed up on @PranayAnchuri's difficulty with link exposure measurement on Twitter. It looks like the Twitter redesign uses a React architecture where all timeline resources are loaded dynamically. As a result, when the document_idle event fires, no links have been loaded into the DOM yet. We're going to delay link exposure measurement as a short-term stopgap; the long-term solution is addressing this issue.

PranayAnchuri commented 4 years ago

Done. <a> tags that have not been visited previously are fetched every 'n' seconds (2).

biancadanforth commented 4 years ago

You could also consider using the Intersection Observer API in the future. It's designed for cases like this.
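
A rough idea of how that might look, as a TypeScript sketch rather than anything from this repository: `EntryLike`, `ObserverLike`, `ObserverFactory`, and `trackFirstExposure` are hypothetical names, and the observer is injected through a factory so the logic does not depend on DOM typings. The only parts of the real API it relies on are an entry's `isIntersecting` and `target` fields and the observer's `observe` and `unobserve` methods.

```typescript
// Structural stand-ins for IntersectionObserverEntry and IntersectionObserver.
interface EntryLike<T> {
  isIntersecting: boolean;
  target: T;
}
interface ObserverLike<T> {
  observe(target: T): void;
  unobserve(target: T): void;
}
type ObserverFactory<T> = (
  onEntries: (entries: EntryLike<T>[]) => void
) => ObserverLike<T>;

// Hypothetical helper: report each target once, the first time it is visible.
// Assumes the observer delivers entries asynchronously, as browsers do.
function trackFirstExposure<T>(
  makeObserver: ObserverFactory<T>,
  onExposed: (target: T) => void
): ObserverLike<T> {
  const observer: ObserverLike<T> = makeObserver((entries) => {
    for (const entry of entries) {
      if (entry.isIntersecting) {
        observer.unobserve(entry.target); // stop watching once counted
        onExposed(entry.target);
      }
    }
  });
  return observer;
}
```

In a browser, the factory could be something like `(cb) => new IntersectionObserver(cb, { threshold: 0.5 })`, where the threshold value is only an assumption. Newly added <a> tags would still have to be discovered and passed to `observe`, so this would complement the periodic scan rather than replace it.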