brave / adblock-lists

Maintains adblock lists that Brave uses
Mozilla Public License 2.0

Add rules for bouncers with encoded JSON blobs #880

Open ShivanKaul opened 2 years ago

ShivanKaul commented 2 years ago

Base64 + JSON: https://www.emjcd.com/links-i/?d=eyJzdXJmZXIiOiIxMDA3MDQyOTIzNzA2MDQ1MTg6SU91QlNqanZBMHZjIiwibGFzdENsaWNrTmFtZSI6IkxDTEsiLCJsYXN0Q2xpY2tWYWx1ZSI6ImNqbyF3dnhqLWEwZThkazctd3ppZy1kc25oanZhLWdoaWktbG5jbWJuei13NXRlLWRzNzNwbmUtdzc3Mi12MDRjcTNkIiwiZGVzdGluYXRpb25VcmwiOiJodHRwczovL3d3dy5jYXJoYXJ0dC5jb20vcHJvZHVjdC84MDE5NjYvZm9yY2UtcHJvLTM1bC1iYWNrcGFjayIsInNpZCI6Ii0tLSIsInR5cGUiOiJkbGciLCJwaWQiOjEwMDIyMzY2NCwiZXZlbnRJZCI6IjUwNjYzNTRiNGU0OTExZWM4MzY4MDNkZTBhMWMwZTBiIiwiY2pTZXNzaW9uIjoiYjMyZDQzYjItNjVkMi00NTZiLThjMmItZTY2NmExN2I4OTY0IiwibG95YWx0eUV4cGlyYXRpb24iOjAsInJlZGlyZWN0ZWRUb0xpdmVyYW1wIjpmYWxzZSwiY2pDb25zZW50RW51bSI6Ik5FVkVSX0FTS0VEIn0%3D

What needs to be done:

  1. Extract the d parameter from the query string.
  2. Decode as base64.
  3. Extract the value of destinationUrl in the resulting JSON blob.

Here's what the JSON blob looks like:

{"surfer":"100704292370604518:IOuBSjjvA0vc","lastClickName":"LCLK","lastClickValue":"cjo!wvxj-a0e8dk7-wzig-dsnhjva-ghii-lncmbnz-w5te-ds73pne-w772-v04cq3d","destinationUrl":"https://www.carhartt.com/product/801966/force-pro-35l-backpack","sid":"---","type":"dlg","pid":100223664,"eventId":"5066354b4e4911ec836803de0a1c0e0b","cjSession":"b32d43b2-65d2-456b-8c2b-e666a17b8964","loyaltyExpiration":0,"redirectedToLiveramp":false,"cjConsentEnum":"NEVER_ASKED"}
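
The three steps above can be sketched as follows. This is only an illustration, not Brave's debounce implementation: the function name `extractFromQueryParam` and its parameters are hypothetical, and it assumes a runtime that provides the standard `URL`, `atob`, and `TextDecoder` globals.

```typescript
// Hypothetical helper (not part of Brave): pull a destination URL out of a
// base64-encoded JSON blob stored in a query parameter.
function extractFromQueryParam(
  rawUrl: string,
  param: string,
  jsonKey: string,
): string | null {
  try {
    // Step 1: extract the query parameter. searchParams.get() already
    // percent-decodes the value (e.g. "%3D" becomes "=").
    const encoded = new URL(rawUrl).searchParams.get(param);
    if (encoded === null) return null;

    // Form decoding turns a literal "+" into a space; base64 never contains
    // spaces, so it is safe to map them back.
    const b64 = encoded.replace(/ /g, "+");

    // Step 2: base64-decode to bytes, then interpret the bytes as UTF-8.
    const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
    const blob: unknown = JSON.parse(new TextDecoder().decode(bytes));

    // Step 3: read the requested key from the JSON object.
    if (typeof blob !== "object" || blob === null) return null;
    const value = (blob as Record<string, unknown>)[jsonKey];
    if (typeof value !== "string") return null;

    // Only accept http(s) destinations; anything else is treated as a miss.
    const protocol = new URL(value).protocol;
    return protocol === "https:" || protocol === "http:" ? value : null;
  } catch {
    // Malformed URL, invalid base64, or invalid JSON.
    return null;
  }
}
```

For the link above, the call would be `extractFromQueryParam(url, "d", "destinationUrl")`. Returning `null` on any failure keeps the caller free to fall back to the original navigation.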

fmarier commented 2 years ago

Another example: https://t.dripemail2.com/c/eyJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJkZXRvdXIiLCJpc3MiOiJtb25vbGl0aCIsInN1YiI6ImRldG91cl9saW5rIiwiaWF0IjoxNjYxNDc0MTk2LCJuYmYiOjE2NjE0NzQxOTYsImFjY291bnRfaWQiOiI1NDM5NzU5IiwiZGVsaXZlcnlfaWQiOiIxNnFnNmswMW1jdmxzZm9wOHRuZSIsInVybCI6Imh0dHBzOi8vc2tpcHBlcm90dG8uY29tL3VwZGF0ZXMtb24tb3VyLTUtcGFjaWZpYy1zYWxtb24tc3BlY2llcy8_X19zPXpjcmFhdHNqYnRlc3NzbjJraXlrIn0.g-jZA6mvGwx-soRGq7H8YSIUXOGyb9i5IV5_78aJRkA

In this case, we need to:

  1. Extract the last component of the path.
  2. Split the string on . and take the 2nd component.
  3. Base64-decode that substring (it uses the URL-safe base64url alphabet).
  4. Extract the value of url in the resulting JSON blob.

Here's what the JSON blob looks like:

{"aud":"detour","iss":"monolith","sub":"detour_link","iat":1661474196,"nbf":1661474196,"account_id":"5439759","delivery_id":"16qg6k01mcvlsfop8tne","url":"https://skipperotto.com/updates-on-our-5-pacific-salmon-species/?__s=zcraatsjbtesssn2kiyk"}
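
A minimal sketch of these four steps, assuming the last path component is a JWT-style token (`header.payload.signature`) whose parts use the URL-safe base64 alphabet (`-` and `_`, no padding). The names `base64UrlDecode` and `extractFromTokenPath` are hypothetical, and the signature is not verified because only the payload is needed here.

```typescript
// Hypothetical helper: decode URL-safe base64 (RFC 4648 "base64url") to text.
function base64UrlDecode(input: string): string {
  // Map the URL-safe alphabet back to the standard one and restore padding.
  const b64 = input.replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// Hypothetical helper: read `jsonKey` from the payload of a token that is
// the last path component of `rawUrl`.
function extractFromTokenPath(rawUrl: string, jsonKey: string): string | null {
  try {
    // Step 1: last non-empty component of the path.
    const segments = new URL(rawUrl).pathname.split("/").filter((s) => s !== "");
    const last = segments[segments.length - 1];
    if (last === undefined) return null;

    // Step 2: split on "." and take the 2nd component (the payload).
    const payloadPart = last.split(".")[1];
    if (payloadPart === undefined) return null;

    // Step 3: base64url-decode, then parse the JSON blob.
    const blob: unknown = JSON.parse(base64UrlDecode(payloadPart));
    if (typeof blob !== "object" || blob === null) return null;

    // Step 4: extract the requested key.
    const value = (blob as Record<string, unknown>)[jsonKey];
    return typeof value === "string" ? value : null;
  } catch {
    // Malformed URL, invalid base64, or invalid JSON.
    return null;
  }
}
```

For the link above, the call would be `extractFromTokenPath(url, "url")`. Plain `atob` alone is not enough for this case, since the payload contains `_`, which is outside the standard base64 alphabet.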

The __s parameter will be removed automatically by the query filter once we fix https://github.com/brave/brave-browser/issues/22967.

pes10k commented 2 years ago

Just a thought, but initially when I spec'ed the debounce feature, I described a pipeline of different actions that could be applied to a URL to extract the destination URL. Things like "apply regex", "base64 decode the buffer", "extract JSON key", "extract query param key" and "extract path segment".

If we're going to start targeting these more sophisticated cases (which I think is a great idea), maybe it'd be good to revisit that original idea, so that we can have a smaller number of composable actions instead of a large number of single- and few-use actions?

pes10k commented 2 years ago

This was the original proposal, in case it's of interest:



// This is just a mapping of function names, to functions with the
// `URLSegmentMapper` signature.
const URLSegmentMapperFuncs = {
  atob: (x: string) => atob(x),
  copy: (x: string, prefix = '', suffix = '') => `${prefix}${x}${suffix}`,
  remove: () => "",
  decodeURI: decodeURI,
  decodeURIComponent: decodeURIComponent,
}

enum RewritingStepType {
  // Means that the extracted URL value should be transformed,
  // and the transformed string should be inserted back in the URL
  // in place of the initially targeted value.
  Map = "map",
  // Means that the extracted URL value should be used _in place of_ the
  // original URL (so that a subset of the current URL would be transformed,
  // and then become the complete new current URL).
  Replace = "replace",
}

type TargetPosition = {
  start: number,
  end: number,
}

type TargetResult = {
  wasSuccess: boolean,
  value?: string,
  position?: TargetPosition,
}

// Targets describe parts of URLs (or other values) that should be processed in
// some way.
abstract class Target {
  abstract readonly type: string;
  abstract apply(url: URL): TargetResult;
}

class TargetJSONKey extends Target {
  readonly type: string = "TargetJSONKey";

  // The key of the JSON-encoded value to extract.
  readonly key: string;
}

class TargetQueryParam extends Target {
  readonly type: string = "TargetQueryParam";

  // The query parameter name to extract / target in this step.
  readonly key: string;
  // If provided, and the query param `key` in the target URL is an array,
  // then this number describes which item in that array to choose (zero
  // indexed).
  readonly index: number = 0;
}

class TargetQueryKeyAndParam extends TargetQueryParam {
  readonly type: string = "TargetQueryKeyAndParam";

  // This class does the exact same thing as the parent class, except it
  // intends to capture the query key as well as the value.
  // Given the URL https://example.org?some=value,
  // TargetQueryParam(key="some") would target "value",
  // while TargetQueryKeyAndParam(key="some") would target "some=value".
}

class TargetPath extends Target {
  readonly type: string = "TargetPath";

  // The index of the path segment to choose (e.g., given "/my/sample/path",
  // 0 would return "my", etc).
  readonly index: number;
}

type RewriteStep = {
  // How to identify which part of the URL to extract and map with this step of
  // the pipeline. If omitted, use the entire current URL / buffer.
  target?: Target,

  // How to transform the targeted / identified part of the URL, into a new
  // string.
  func: URLSegmentMapper,

  // What to do with the mapped substring: either put it back in place of the
  // targeted value ("map"; e.g., when targeting "target",
  // https://example.org?target=old might become
  // https://example.org?target=new), or use the mapped value as the new
  // current URL / buffer ("replace").
  type: RewritingStepType,

  // Whether to keep going if anything goes wrong in the targeting step
  // (i.e., determining which part of the URL to target) or the rewriting
  // step (i.e., figuring out how to modify and/or use the targeted URL bit).
  // If `true`, continue as if this step didn't exist; if `false` (the
  // default), stop processing further and return an error.
  continueOnError?: boolean,
}

// Rule definition
type URLRewritingRecipe = {
  // One or more strings, encoding
  // [URLPatterns](https://source.chromium.org/chromium/chromium/src/+/main:extensions/common/url_pattern.h;l=49;bpv=1;bpt=1?q=URLPattern&ss=chromium)
  // Note, that if we need more flexibility, these could be replaced with
  // [adblock-rs](https://www.npmjs.com/package/adblock-rs) format rules.
  // These describe which URLs should be considered by the additional steps
  // for this rule.
  urlPatterns: string[],
  steps: RewriteStep[]
}

// Example 1: https://bad.com?uid=123&destination=https%3A%2F%2Fgood.com
const example1 = {
  urlPatterns: [
    "https://bad.com/*"
  ],
  steps: [
    {
      target: {
        type: "TargetQueryParam",
        key: "destination",
        // index = 0 is assumed, and so will be omitted from future examples.
        index: 0,
      },
      func: "decodeURIComponent",
      type: "replace",
      // False here is assumed, and so will be omitted from the rest of the
      // examples.
      continueOnError: false,
    }
  ]
};

// Example 2: Strip the Facebook click id (fbclid) from all navigation URLs.
const example2 = {
  urlPatterns: [
    "?fbclid=",
    "&fbclid=",
  ],
  steps: [
    {
      target: {
        type: "TargetQueryKeyAndParam",
        key: "fbclid"
      },
      func: "remove",
      type: "map",
    }
  ]
};

// Example 3: Some really mean jerks do something horrible like encode
// redirection instructions, in JSON, base64'ed, in a path parameter.
// Something like:
//
// # First put the instructions in JSON.
// const step1 = JSON.stringify({dest: "https://good.com"});
//
// # Then encode that as base64.
// const step2 = window.btoa(step1);
//
// # Then put that in the path of the bounce tracker's URL.
// const step3 = `https://tracker.com/bounce/${step2}/go`;
//
// Giving the following URL
// https://tracker.com/bounce/eyJkZXN0IjoiaHR0cHM6Ly9nb29kLmNvbSJ9/go
const example3 = {
  urlPatterns: [
    "https://tracker.com/bounce/*",
  ],
  steps: [
    {
      target: {
        type: "TargetPath",
        index: 1,
      },
      func: "copy",
      type: "replace",
    },
    {
      func: "atob",
      type: "replace",
    },
    {
      target: {
        type: "TargetJSONKey",
        key: "dest",
      },
      func: "copy",
      type: "replace",
    }
  ]
};
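
To make the pipeline idea concrete, here is a minimal sketch of an interpreter that could run the steps from examples 1 and 3. It is an assumption-laden simplification, not the proposal itself: the names `SketchTarget`, `SketchStep`, `sketchFuncs`, `applyTarget`, and `runSteps` are hypothetical, only the "replace" step type is handled, and URL pattern matching and `continueOnError` are left out.

```typescript
// Hypothetical, simplified shapes modelled on the proposal's examples.
type SketchTarget =
  | { type: "TargetPath"; index: number }
  | { type: "TargetQueryParam"; key: string }
  | { type: "TargetJSONKey"; key: string };

type SketchStep = {
  target?: SketchTarget;
  func: "atob" | "copy" | "decodeURIComponent";
};

const sketchFuncs: Record<SketchStep["func"], (x: string) => string> = {
  atob: (x) => atob(x),
  copy: (x) => x,
  decodeURIComponent: (x) => decodeURIComponent(x),
};

// Select the part of the current buffer that a step should operate on.
function applyTarget(buffer: string, target: SketchTarget): string | null {
  switch (target.type) {
    case "TargetPath": {
      // "/my/sample/path" with index 0 selects "my".
      const segments = new URL(buffer).pathname.split("/").filter((s) => s !== "");
      return segments[target.index] ?? null;
    }
    case "TargetQueryParam": {
      // Read the raw (still percent-encoded) value so that a later
      // decodeURIComponent step does not decode it twice.
      for (const pair of new URL(buffer).search.slice(1).split("&")) {
        const eq = pair.indexOf("=");
        const name = eq === -1 ? pair : pair.slice(0, eq);
        if (name === target.key) return eq === -1 ? "" : pair.slice(eq + 1);
      }
      return null;
    }
    case "TargetJSONKey": {
      const value = JSON.parse(buffer)[target.key];
      return typeof value === "string" ? value : null;
    }
  }
}

// Run every step in order; each result becomes the buffer for the next step.
function runSteps(url: string, steps: SketchStep[]): string | null {
  let buffer = url;
  for (const step of steps) {
    try {
      const targeted = step.target ? applyTarget(buffer, step.target) : buffer;
      if (targeted === null) return null;
      buffer = sketchFuncs[step.func](targeted);
    } catch {
      return null; // Bad URL, base64, or JSON: stop and report failure.
    }
  }
  return buffer;
}

// The steps from example 3, applied to its sample URL.
const example3Steps: SketchStep[] = [
  { target: { type: "TargetPath", index: 1 }, func: "copy" },
  { func: "atob" },
  { target: { type: "TargetJSONKey", key: "dest" }, func: "copy" },
];
const example3Result = runSteps(
  "https://tracker.com/bounce/eyJkZXN0IjoiaHR0cHM6Ly9nb29kLmNvbSJ9/go",
  example3Steps,
);
console.log(example3Result); // https://good.com
```

The two cases earlier in this issue would fit the same shape, with one more function for URL-safe base64 decoding and a target for the last path segment.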