citp / news-disinformation-study

A research project on how web users consume, are exposed to, and share news online.

Syncing Background Page and Content Script Measurements #67

Closed by jonathanmayer 3 years ago

jonathanmayer commented 4 years ago

Background

There are two primary execution environments for extensions built with WebExtensions: the background page and content scripts. Some of our measurements (page navigation/attention and social media sharing) are implemented in the background page, because that's where key WebExtensions APIs are available (tabs, windows, idle, webRequest, and webNavigation). Other measurements (text classification, social media account exposure, social media news exposure, and scroll depth) are implemented in content scripts, since the measurements require access to webpage DOMs and associated events.

Because some of the analysis we'd like to conduct involves combining measurements from the two execution environments (e.g., how long did the user pay attention to pages with a certain classification?), we need a way of syncing the two environments. Unfortunately, in the current WebExtensions API, the background page can only identify content scripts by their tab ID (and additional matching heuristics like URL, referrer, and time). There is no unique ID that links a webpage (as seen by the background page) to a webpage (as seen by a content script). As a result, there is a risk of race conditions between the background page and content scripts.
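To make the matching problem concrete, here is a minimal sketch of the kind of heuristic matching described above. It is not the project's implementation. The names `pageVisits`, `recordPageVisit`, and `matchPageVisit` are hypothetical, and the `sender` argument only mirrors the `tab.id` and `url` fields that a `runtime.onMessage` listener receives.

```javascript
// Hypothetical background-page bookkeeping: one record per page visit,
// built only from what the background page can observe.
const pageVisits = [];

// Record a visit when the background page learns of a navigation
// (for example, from a webNavigation.onCommitted listener).
function recordPageVisit(tabId, url, startTime) {
  pageVisits.push({ tabId, url, startTime });
}

// Heuristically match a content script message to a page visit.
// `sender` mirrors the tab.id and url fields of a runtime.onMessage sender;
// `receivedTime` is when the message arrived in the background page.
function matchPageVisit(sender, receivedTime) {
  let best = null;
  for (const visit of pageVisits) {
    if (visit.tabId !== sender.tab.id) continue;   // must be the same tab
    if (visit.url !== sender.url) continue;        // must be the same URL
    if (visit.startTime > receivedTime) continue;  // visit must already have started
    // Prefer the most recent visit that satisfies every check.
    if (best === null || visit.startTime > best.startTime) best = visit;
  }
  return best; // null means no plausible match
}
```

The weakness is visible when a tab visits the same URL twice: a delayed message from the first visit is attributed to the second, because tab ID, URL, and time cannot tell the two pages apart.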

We've been focused on building measurement and utility modules since the fall, so we haven't had much need to evaluate and resolve these possible issues. But now that we're implementing Telemetry pings that integrate various measurements, we can't avoid the issues any longer.

Possible Race Conditions

There appear to be four possible race conditions associated with the two execution environments.

  1. Background Page → Content Script, Content Script Is Ahead. The background page posts a message to a content script on Page A. The message reaches a content script on Page B, however, because the tab navigated from Page A to Page B and the background page didn't catch up before posting the message. I've added a POC for this race condition in the race-condition-test branch.

  2. Background Page → Content Script, Background Page Is Ahead. The background page posts a message to a content script on Page B. The message reaches a content script on Page A, because the tab has not completed navigation from Page A to Page B. This scenario doesn't really concern me, since I assume by the time the background page learns about a committed navigation (at least via the tabs or webNavigation APIs) messages from the background page to content scripts are already going to the new page. I haven't been able to induce this race condition in a POC.

  3. Content Script → Background Page, Content Script Is Ahead. A content script on Page B posts a message to the background page. The background page believes the message is from a content script on Page A, because the tab previously navigated from Page A to Page B and the background page hasn't caught up. I also haven't been able to induce this race condition; perhaps the background page event queue guarantees an event ordering (i.e., tabs.onUpdated and webNavigation.onCommitted always fire for Page B before runtime.onMessage fires for Page B) that makes this race condition impossible?

  4. Content Script → Background Page, Background Page Is Ahead. A content script on Page A posts a message to the background page. The background page believes the message is from a content script on Page B, because the tab navigated to Page B before the message reached the background page. I also haven't been able to induce this race condition; perhaps the background page event queue guarantees an event ordering (i.e., runtime.onMessage always fires for Page A before tabs.onUpdated and webNavigation.onCommitted fire for Page B) that makes this race condition impossible?
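For illustration, the first race condition can be reproduced in a toy model that runs outside the browser. This is not the POC in the race-condition-test branch. `FakeTab` and `FakeBackgroundPage` are hypothetical stand-ins, and the model assumes that a message addressed to a tab goes to whichever page the tab currently holds.

```javascript
// Toy model of a single tab. Navigation takes effect immediately in the tab,
// but the background page only learns about it from a queued event.
class FakeTab {
  constructor(url) {
    this.currentUrl = url;
    this.pendingEvents = []; // navigation events the background page has not processed
  }
  navigate(url) {
    this.currentUrl = url;
    this.pendingEvents.push({ type: "committed", url });
  }
  // Deliver a message to whichever page is loaded right now.
  deliver(message) {
    return { receivedBy: this.currentUrl, message };
  }
}

// Toy model of the background page's view of that tab.
class FakeBackgroundPage {
  constructor(tab) {
    this.tab = tab;
    this.believedUrl = tab.currentUrl;
  }
  // Catch up on queued navigation events.
  processPendingEvents() {
    for (const event of this.tab.pendingEvents) this.believedUrl = event.url;
    this.tab.pendingEvents = [];
  }
  // Post a message intended for the page the background page believes is loaded.
  postMessage(payload) {
    const intendedFor = this.believedUrl;
    const { receivedBy } = this.tab.deliver(payload);
    return { intendedFor, receivedBy, misdelivered: intendedFor !== receivedBy };
  }
}
```

If the tab navigates from Page A to Page B and `postMessage` runs before `processPendingEvents`, the result has `misdelivered: true`. Once the background page has caught up, the same call is delivered to the page it was intended for.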

Possible Solutions

A few directions come to mind.

  1. No Changes. If the third and fourth types of race conditions aren't possible, I think we're OK—we can generate a unique page ID in the background page environment (on tabs.onUpdated or webNavigation.onCommitted) and confidently associate subsequent content script messages with those IDs (until the next tabs.onUpdated or webNavigation.onCommitted for that tab).

Even if those race conditions are possible, we might be able to get away without changes for the initial COVID-19 study. We aren't matching up many measurements, and where we are matching measurements, using heuristics is probably reasonable.

  2. Measure Page Navigation/Attention with Content Scripts. We relocate our page navigation/attention measurements to the content script environment. Specifically:
    • Page Visit Start: instead of using tabs.onUpdated (or webNavigation.onCommitted) in the background page, inject a content script at document_start.
    • Page Visit Stop: instead of using tabs.onUpdated (or webNavigation.onCommitted) and tabs.onRemoved in the background page, use the window's unload event in a content script.
    • Page Attention Start/Stop: track like we currently do and post events to the relevant content scripts.

We could also generate a unique ID for each page in the content script environment. This approach would solve race condition issues for the page navigation measurements and mostly solve them for the attention measurements. The approach also wouldn't require modifying the WebExtensions API and would have a well defined page lifecycle (like our current approach). It would have some drawbacks, though.

  3. Modify the WebExtensions APIs. Changes to the WebExtensions APIs could address these race condition risks, in whole or in part. For example, if various WebExtensions APIs exposed a unique top-level page ID (e.g., innerWindowID), that would make syncing between the two execution environments much easier. I imagine that's unrealistic, though, especially on any near-term timeline.
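As a rough sketch of the content script option (Measure Page Navigation/Attention with Content Scripts), the page ID could be generated in the content script and attached to every message, so the background page never has to infer which page a message came from. All names here (`createPageSession`, `createPageStore`, `defaultMakeId`, and the message types) are hypothetical. `send` stands in for a call such as `browser.runtime.sendMessage`, and in a real content script the session would be created at document_start and `stop` would be called from an unload listener.

```javascript
// Hypothetical content-script side. `send` stands in for a messaging call;
// `makeId` and `now` are injectable so the logic can be tested outside a browser.
function createPageSession(send, makeId = defaultMakeId, now = Date.now) {
  const pageId = makeId();
  let stopped = false;
  // Page Visit Start: reported as soon as the session is created.
  send({ type: "pageVisitStart", pageId, time: now() });
  return {
    pageId,
    // Attach the page ID to any other measurement from this page.
    report(type, data) {
      send({ type, pageId, time: now(), data });
    },
    // Page Visit Stop: reported at most once.
    stop() {
      if (stopped) return;
      stopped = true;
      send({ type: "pageVisitStop", pageId, time: now() });
    },
  };
}

// Assumption: a timestamp plus random suffix is unique enough for a sketch.
// Collisions are unlikely but not impossible.
function defaultMakeId() {
  return Date.now().toString(36) + "-" + Math.random().toString(36).slice(2, 10);
}

// Hypothetical background-page side: group incoming messages by page ID
// instead of by tab ID.
function createPageStore() {
  const pages = new Map();
  return {
    handleMessage(message) {
      if (!pages.has(message.pageId)) pages.set(message.pageId, []);
      pages.get(message.pageId).push(message);
    },
    eventsFor(pageId) {
      return pages.get(pageId) || [];
    },
  };
}
```

Because every message carries its own page ID, the third and fourth race conditions stop mattering for these measurements. Attention events would still originate in the background page, so they would remain exposed to the first race condition.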

Next Steps

Initially I thought we could stick with the current implementation, then I moved toward the content script model, and now I'm back to sticking with the current implementation (especially if the third and fourth race conditions can't happen). Suggestions very much appreciated. This is pretty in-the-weeds detail about event loops and message passing between execution environments.

jonathanmayer commented 3 years ago

Closing this out. After lots of discussion and testing, we settled on the PageManager approach.