brave / adblock-lists

Maintains adblock lists that Brave uses
Mozilla Public License 2.0

LAtimes.com - filter rules to fix new native ads #8

Closed lukemulks closed 7 years ago

lukemulks commented 7 years ago

Three different flavors of new native ads were discovered this morning on latimes.com. They are being tracked and investigated here: https://github.com/brave/browser-laptop/issues/7454

These are new and unrelated to the PR that was merged last week.

Two different versions of "sponsored content" ads were discovered this morning on Windows. The rules below were confirmed to resolve both when tested in about:adblock.

||aggrego.org^$script,image,domain=latimes.com
||adserve.postrelease.com^$script,image,domain=latimes.com
||troncdata.com^$script,image,domain=latimes.com
||polarmobile.com^$script,image,domain=latimes.com
||ntv.io^$script,image,domain=latimes.com
||gigya.com^$script,image,domain=latimes.com
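
For readers unfamiliar with the rule syntax, here is a minimal sketch of how a rule of this shape could be evaluated against a request. It is not Brave's actual matching engine. The function names (`parseRule`, `hostMatches`, `shouldBlock`) and the request object shape are hypothetical, and only the `||host^$type,type,domain=...` form is handled. The sketch does not trim whitespace, so its rule string is written with no spaces inside the option list.

```javascript
// Simplified, illustrative matcher for rules shaped like
// ||host^$script,image,domain=example.com (an assumption-laden sketch).
function parseRule(rule) {
  const [pattern, optionText = ''] = rule.split('$');
  if (!pattern.startsWith('||') || !pattern.endsWith('^')) {
    throw new Error('unsupported rule shape: ' + rule);
  }
  const host = pattern.slice(2, -1); // strip leading "||" and trailing "^"
  const types = [];
  const domains = [];
  for (const option of optionText.split(',')) {
    if (option === '') continue;
    if (option.startsWith('domain=')) {
      domains.push(...option.slice('domain='.length).split('|'));
    } else {
      types.push(option); // resource types such as "script" or "image"
    }
  }
  return { host, types, domains };
}

// True when hostname is the domain itself or one of its subdomains.
function hostMatches(hostname, domain) {
  return hostname === domain || hostname.endsWith('.' + domain);
}

// request: { url, type, pageUrl } (hypothetical shape for this sketch)
function shouldBlock(rule, request) {
  const { host, types, domains } = parseRule(rule);
  const requestHost = new URL(request.url).hostname;
  const pageHost = new URL(request.pageUrl).hostname;
  return hostMatches(requestHost, host)
    && (types.length === 0 || types.includes(request.type))
    && (domains.length === 0 || domains.some((d) => hostMatches(pageHost, d)));
}

const exampleRule = '||troncdata.com^$script,image,domain=latimes.com';
console.log(shouldBlock(exampleRule, {
  url: 'http://recommend.troncdata.com/js/widget.js',
  type: 'script',
  pageUrl: 'http://www.latimes.com/',
})); // true
```

In this sketch, the `domain=` option limits the rule to requests made from latimes.com pages, so the same hosts are left alone on other sites.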

A third issue has been reported on macOS but has not yet been reproduced on Windows. Still investigating that.

I will submit a PR adding the rules above, which resolves two of the three issues, and will follow up here on whether the third is also resolved once I'm able to verify on macOS.

lukemulks commented 7 years ago

PR submitted: https://github.com/brave/adblock-lists/pull/9

lukemulks commented 7 years ago

Confirming that, after testing on iOS, the ads targeted by the rules above no longer display.

The third item, which I was unable to reproduce, was confirmed as an observation made with Shields down, so I'm considering the PR above the fix for the issues reported today.

lukemulks commented 7 years ago

@bbondy confirming that all the rules merged above are functioning as expected in 0.13.5, except for one in which I made an error.

I've corrected the error and submitted this PR to resolve it: https://github.com/brave/adblock-lists/pull/10

Apologies for the oversight. Tested the update with the corrected version in about:adblock, and it fully resolves the issue.

matt2000 commented 7 years ago

Hi, I'm the developer for troncdata.com (an internal service of LATimes / Tronc) and I want to let you know that it does not serve advertising at all, native or otherwise. Can you please remove it from your block list? Thanks.

lukemulks commented 7 years ago

We're going to have to agree to disagree in this instance.

I am going to strongly object to removing troncdata from the blocklist.

I've reverse-engineered the implementation, and troncdata appears to be used at the first-party level with an ad widget. It invokes an infusion script function with nested timing sequences that match page data and user IDs to delayed ad calls, which bind to and replace an HTML <aside> container in the right rail. data-state and other data-* attributes are used in a clever way in this implementation.

Gigya appears to be the user-match/attribution vendor involved in this one. Forbes and others use Gigya, which follows a similar pattern. Other vendors (teads, etc.) appear to be involved at the advertiser level as well, but Gigya appears to correlate directly with the native widget in the right rail (the script contains dispatch references for the widget).

Request 1: http://recommend.troncdata.com/js/widget.js

Response:

trb.runInfuse.push(function() {
var tries = 0;

var related = $('.trb_ar_rail div[data-eg-type=related]');

var isShown = false;
function showRecommendations() {
  related.css('opacity', 1);
  isShown = true;
}

function waitForData() {
  if(trb.hasOwnProperty('data')
     && trb.data.hasOwnProperty('page')
     && trb.data.page.hasOwnProperty('slug')
    ){
    (function($) {

      if (!localStorage.getItem('recsys_override') && location.hash.includes('enable-troncrecs')) {
        localStorage.setItem('recsys_override', 99);
      }

      var userId = localStorage.getItem('recsys_override') || trb.cookie.get('uuid') || localStorage.getItem('recsys_id');
      if (!userId) {
        var userId = Math.random() + Date.now();
        localStorage.setItem('recsys_id', userId);
      }
      var params = [userId, trb.data.page.slug];
      $.ajax('https://recommend.troncdata.com/recommendations/'
              + params.join('/'), {
        'dataType': 'json',
        'crossDomain': true,
        'complete': showRecommendations,
        'success': function(data) { if (data['markup'] && !isShown) related.replaceWith(data['markup']);}
      });
      setTimeout(showRecommendations, 1500);
    })(i$.jQuery);
  } else if (tries < 10) {
    tries = tries + 1;
    setTimeout(waitForData,50);
  } else {
    showRecommendations()
  }
}
waitForData();
});
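
To make the identifier handling in the response above easier to follow, here is a minimal restatement of its user ID fallback chain. This is a sketch, not the site's code: `resolveUserId` is a hypothetical name, a plain object stands in for `localStorage`, a plain argument stands in for `trb.cookie.get('uuid')`, and the `enable-troncrecs` hash override is omitted.

```javascript
// Restates the fallback order in widget.js: override value, then the uuid
// cookie, then a previously stored id, otherwise a newly generated one.
// `storage` is a plain object standing in for localStorage (an assumption).
function resolveUserId(storage, cookieUuid) {
  let userId = storage.recsys_override || cookieUuid || storage.recsys_id;
  if (!userId) {
    // Same generation scheme as the original; localStorage would store it
    // as a string, so it is stringified explicitly here.
    userId = String(Math.random() + Date.now());
    storage.recsys_id = userId; // persisted, so later calls reuse it
  }
  return userId;
}

const exampleStorage = {};
const firstVisitId = resolveUserId(exampleStorage, undefined);
const secondVisitId = resolveUserId(exampleStorage, undefined);
console.log(firstVisitId === secondVisitId); // true
```

Under these assumptions, a visitor with no cookie and no override still gets an identifier that persists across page loads, because the generated value is written back to storage.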

XHR Request: https://recommend.troncdata.com/recommendations/57669011-a2b8-42a1-968a-23695a7422ee/la-me-ln-winter-temperature-records-20170310

Response: {"recsys": "control"}
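
The request URL above carries two path parameters, matching the `params.join('/')` call in widget.js. The sketch below splits such a URL back into its parts. `parseRecommendationUrl` is a hypothetical helper written for illustration, and the UUID check is only a format test on the string.

```javascript
// Splits a /recommendations/<userId>/<slug> URL into its two parameters.
// Returns null for any other path shape.
function parseRecommendationUrl(requestUrl) {
  const segments = new URL(requestUrl).pathname.split('/').filter(Boolean);
  if (segments.length !== 3 || segments[0] !== 'recommendations') {
    return null;
  }
  const [, userId, slug] = segments;
  // Format check only: 8-4-4-4-12 hexadecimal groups.
  const uuidPattern = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
  return { userId, slug, looksLikeUuid: uuidPattern.test(userId) };
}

console.log(parseRecommendationUrl(
  'https://recommend.troncdata.com/recommendations/57669011-a2b8-42a1-968a-23695a7422ee/la-me-ln-winter-temperature-records-20170310'
));
```

For the request shown above, the first parameter is the UUID-formatted user ID and the second is the page slug.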

Gigya script: http://cdn.gigya.com/js/gigya.js

Walkthrough to illustrate the crawl (sans-blocking)

Tribune infusion runs...

[Screenshot: delayloader-data-latimes-03102017]

troncdata passes values from the page, but also sets the slug data as the ad unit IDs for the ad widget toward the bottom of the screencap.

Continuing the crawl reveals the same troncdata widget script with gigya passed through as the mediaConductor, which also contains a site ID assigned to LA Times by the vendor (gigya) within the same <aside> HTML tag for the right rail (matching the <aside id=> value).

[Screenshot: delayloader-data-latimes-2-03102017]

At a minimum, this is working in conjunction with the ad product integration on the page (and likely on other tronc sites). A UUID is passed in the request URL (57669011-a2b8-42a1-968a-23695a7422ee), which data management platforms can match users against. With this blocked as-is, no webcompat issues have been reported in over a week, and the content that does load in the right rail slot doesn't appear to show any issues.

We want to block this imo, cc: @bbondy

bbondy commented 7 years ago

Thanks @lukemulks

matt2000 commented 7 years ago

@lukemulks Your analysis is incorrect, and I'd like to help clarify the purpose of this widget. I wrote widget.js and the backend service at recommend.troncdata.com that it communicates with. I'm happy to be 100% transparent about what's happening here. (As you can see, the script is not even minified, let alone obfuscated.) I work directly for Tronc / LA Times, and have since July 2016. (I.e., I'm not a contractor or third-party service provider. I collaborate with the newsrooms on this project, not ad sales.)

Please let me know if there are any other questions I can answer to set your concerns at ease.