Racro / measurements_user-concerns


About "Effective ad-blockers remove the frame objects from the page layout as part of the filtering process." #1

Open gorhill opened 3 weeks ago

gorhill commented 3 weeks ago

Regarding the paper "From User Insights to Actionable Metrics: A User-Focused Evaluation of Privacy-Preserving Browser Extensions", there is this paragraph:

Ads. To evaluate the effectiveness of ad-blocking, we measure the reduction in the number of frames on web pages. Advertisements are often displayed within HTML frames, which display content independent of its container. Many HTML tags that contain ad scripts are rendered as frames and iframes. Effective ad-blockers remove the frame objects from the page layout as part of the filtering process. We use Puppeteer to hook into the web page and calculate the number of frames using the page.metrics() function.

More specifically, this passage:

Effective ad-blockers remove the frame objects from the page layout as part of the filtering process

I looked into the code, and I believe this is where the counting occurs: https://github.com/Racro/measurements/blob/master/effective/ads/frames.js#L90-L93
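For readers unfamiliar with the API, the kind of measurement the paper describes can be sketched roughly as follows. This is not the code from the linked frames.js: `countFrames` and `frameReduction` are hypothetical helper names, and `page` is assumed to be an already-loaded Puppeteer `Page`. The `Frames` field returned by `page.metrics()` is the number of frames in the page; it carries no information about whether a frame is visible.

```javascript
// Hypothetical helper: read the frame count Puppeteer reports for a page.
// `page` is assumed to be a Puppeteer Page that has finished loading.
async function countFrames(page) {
  const metrics = await page.metrics();
  return metrics.Frames; // number of frames in the page
}

// Hypothetical helper: relative drop in frame count between a baseline run
// (no extension) and a run with the extension enabled.
function frameReduction(baselineFrames, extensionFrames) {
  if (baselineFrames <= 0) return 0; // no frames in the baseline, nothing to reduce
  return (baselineFrames - extensionFrames) / baselineFrames;
}
```

With a metric of this shape, a blocked frame that stays in the DOM counts exactly like a frame that was never blocked.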

Counting the number of frame objects to assess how effective a content blocker is at blocking ads leads to a flawed assessment as far as uBO is concerned (the paper assessed uBO as "Subpar" in the "Ads" category). Two reasons:

  1. uBO does not remove frame objects whose URL has been blocked, so as to minimize page breakage. Frame objects that were blocked at the network level are visually collapsed instead.
  2. Furthermore, uBO will create dummy frame objects so as to minimize page breakage and content blocker detection; see https://github.com/gorhill/uBlock/blob/1.59.0/src/web_accessible_resources/googlesyndication_adsbygoogle.js#L35-L41
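The first reason can be illustrated with a small sketch that separates frames still present in the DOM from frames that actually occupy space on the page. It assumes a Puppeteer `page`, and the helper names (`describeFrames`, `isCollapsed`, `countVisibleFrames`) are hypothetical. How uBO collapses a blocked frame is not taken from its source here; the check simply treats `display: none`, `visibility: hidden`, or a zero-sized box as collapsed.

```javascript
// Runs in the page: describe the box and computed style of every frame element.
// `page` is assumed to be a Puppeteer Page; only frames in the top-level
// document are inspected (nested frames and shadow DOM are not covered).
async function describeFrames(page) {
  return page.$$eval('iframe, frame', (elements) =>
    elements.map((el) => {
      const rect = el.getBoundingClientRect();
      const style = window.getComputedStyle(el);
      return {
        src: el.src,
        width: rect.width,
        height: rect.height,
        display: style.display,
        visibility: style.visibility,
      };
    })
  );
}

// Assumption: a frame is "collapsed" if it is hidden or has no area.
function isCollapsed(frame) {
  return (
    frame.display === 'none' ||
    frame.visibility === 'hidden' ||
    frame.width === 0 ||
    frame.height === 0
  );
}

// Count only the frames that still take up space in the layout.
function countVisibleFrames(frames) {
  return frames.filter((frame) => !isCollapsed(frame)).length;
}
```

A count like this would stop penalizing collapsed frames, but it would still count a dummy frame if that frame is given a visible box.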

Racro commented 3 weeks ago

Hi @gorhill! Thanks for taking the time to read the paper and review the code. I transferred the issue here as this is the official repo linked to the paper.

Regarding your feedback, I understand the two points you mentioned, both of which might have increased the number of frames for uBO. Since we treated extensions as black boxes for the performance and effectiveness measurements, we did not look into their source code or the potential causes; hence the 'Subpar' tag. I also understand that, as the developer of the extension, you might find the 'Subpar' assessment misleading to readers. However, the source code was made open so that people can review our methodologies and improve upon them over time.

We chose frames as an optimal proxy for measuring the decline in ads because ads generally appear inside frames and the approach is generalizable. What do you think would be a suitable common methodology for doing this? Some options:

  1. Instead of only calculating the decrease in the number of frames, we could also treat collapsed frames (i.e., frames whose size has been reduced to 0) as absent. This would solve the 1st issue but not the 2nd.
  2. To solve the 2nd issue, we could try to determine the origin of each frame, i.e., which script generated it. This might be a non-trivial problem and would require tools such as PageGraph.
  3. Calculating the drop in network requests and attributing it to ads based on EasyList-style filter lists. This is one solution, but ad resources that are blocked during the rendering phase rather than at the network level would still be counted as loaded.
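Option 3 could be prototyped along these lines. The sketch assumes the URLs of completed requests are recorded for each run through Puppeteer's `requestfinished` page event. It substitutes a plain hostname list for real EasyList matching, which has a much richer filter syntax. The helper names (`recordFinishedRequests`, `isAdRequest`, `countAdRequests`) and any host list passed in are hypothetical.

```javascript
// Hypothetical recorder: collect URLs of requests that completed on `page`.
// Only finished requests are recorded, because whether a blocked request
// still surfaces as a 'request' event depends on how the extension blocks it.
function recordFinishedRequests(page) {
  const urls = [];
  page.on('requestfinished', (request) => urls.push(request.url()));
  return urls; // filled in as the page loads
}

// Simplification: match the hostname (or any subdomain of it) against a
// hand-written host list instead of evaluating real filter-list rules.
function isAdRequest(url, adHosts) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // not a parseable absolute URL
  }
  return adHosts.some(
    (host) => hostname === host || hostname.endsWith('.' + host)
  );
}

// Number of recorded requests attributed to ads under the assumed host list.
function countAdRequests(urls, adHosts) {
  return urls.filter((url) => isAdRequest(url, adHosts)).length;
}
```

Comparing `countAdRequests` for a baseline run and an extension run gives the drop in ad requests. As the option itself notes, ads that are handled at render time rather than at the network level do not change this number.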

Happy to discuss this further and improve the 'Ads' measurement.