brave-experiments / ad-block

Ad block engine used in the Brave browser for ABP filter syntax based lists like EasyList.
https://www.brave.com
Mozilla Public License 2.0

Use etld plus one matching for 3p #172

Open pes10k opened 5 years ago

pes10k commented 5 years ago

fixes #171

diracdeltas commented 5 years ago

general question about this approach:

is it possible to just use the etld+1 parsing from Chromium? see https://cs.chromium.org/chromium/src/net/base/registry_controlled_domains/registry_controlled_domain.h?q=getdomainandregistry&dr=CSs

pes10k commented 5 years ago

> general question about this approach:
>
> is it possible to just use the etld+1 parsing from Chromium? see https://cs.chromium.org/chromium/src/net/base/registry_controlled_domains/registry_controlled_domain.h?q=getdomainandregistry&dr=CSs

We could, but then we'd lose the ability to run in node (which has been very valuable for crawling / measurement, getting other folks to use the code, debugging, etc).

fmarier commented 5 years ago

I've got a question about the build process since I don't actually know when the build step takes place: if we pull down the list at build time, is there a reason to have it checked into the repo?

pes10k commented 5 years ago

> I've got a question about the build process since I don't actually know when the build step takes place: if we pull down the list at build time, is there a reason to have it checked into the repo?

You're right, there's no need for this. I removed it from the .gitignore previously, and have now also removed it from the set of tracked files. Should be good now.

pes10k commented 5 years ago

@bbondy my code expects the public suffix list to be in a known location (i.e. at etld/data/<list>.dat), and lazily parses the list on first use. I have no idea if this will work when rolled into the larger browser; I'm just not familiar enough with the build process. Can you double check that aspect?
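For readers unfamiliar with the "parse lazily on first use" pattern mentioned above, here is a minimal sketch. It is not the code from this PR: the class name `LazySuffixRules` and its members are hypothetical, it takes the rule text in memory rather than reading it from a file, and it makes no attempt at thread safety.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical holder for public suffix rules (illustration only,
// not the class used in this PR).
class LazySuffixRules {
 public:
  explicit LazySuffixRules(std::string raw_text) : raw_(std::move(raw_text)) {}

  // Parses the raw text on the first call; later calls reuse the cached
  // result. Not thread-safe: concurrent first calls would race.
  const std::vector<std::string>& Rules() {
    if (!parsed_) {
      parsed_ = Parse(raw_);
      ++parse_count_;
    }
    return *parsed_;
  }

  // Number of times the text was actually parsed (0 or 1).
  int parse_count() const { return parse_count_; }

 private:
  // Keeps one rule per line, skipping blank lines and "//" comments.
  static std::vector<std::string> Parse(const std::string& text) {
    std::vector<std::string> rules;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
      if (!line.empty() && line.back() == '\r') line.pop_back();
      if (line.empty() || line.rfind("//", 0) == 0) continue;
      rules.push_back(line);
    }
    return rules;
  }

  std::string raw_;
  std::optional<std::vector<std::string>> parsed_;
  int parse_count_ = 0;
};
```

Constructing the object is cheap; the parsing cost is only paid by the first caller of `Rules()`, which is why the location of the data at that moment matters for the browser build.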

pes10k commented 5 years ago

@bbondy this is now ready for review again. The ways to enable the eTLD+1 checking (by parsing a public suffix list) are:

1) When using the check.js script, pass the new -P, --public-suffix-rules-path option and point it to a text file containing public suffix rules.
2) Use the JS AdBlockClient.parsePublicSuffixRules method and give it a string containing public suffix rules.
3) Use the C++ AdBlockClient::parsePublicSuffixRules method with a char* / std::string of rules.
4) Use either the C++ or JS deserialize methods with a .dat file that includes public suffix rules data (serializing after doing 1, 2, or 3 will include the public suffix rule data in the .dat).
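As background on what eTLD+1 matching means for third-party checks, the following is a rough sketch of the general technique, assuming the standard public suffix rule forms (plain rules, `*.` wildcards, `!` exceptions). It is not the implementation in this PR: `RegistrableDomain`, `IsThirdParty`, and the helpers are hypothetical names, and the sketch skips case normalization, IDN/punycode handling, IP addresses, and trailing dots.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using RuleSet = std::set<std::string>;

// Splits "a.b.com" into {"a", "b", "com"}.
std::vector<std::string> SplitLabels(const std::string& host) {
  std::vector<std::string> labels;
  std::size_t start = 0;
  while (true) {
    const std::size_t dot = host.find('.', start);
    if (dot == std::string::npos) {
      labels.push_back(host.substr(start));
      return labels;
    }
    labels.push_back(host.substr(start, dot - start));
    start = dot + 1;
  }
}

// Joins labels[from..end) with dots.
std::string JoinFrom(const std::vector<std::string>& labels, std::size_t from) {
  std::string out;
  for (std::size_t i = from; i < labels.size(); ++i) {
    if (i > from) out += '.';
    out += labels[i];
  }
  return out;
}

// Returns the eTLD+1 of `host`, or "" if the host is itself a public suffix.
std::string RegistrableDomain(const std::string& host, const RuleSet& rules) {
  const std::vector<std::string> labels = SplitLabels(host);
  const std::size_t n = labels.size();
  std::size_t suffix_len = 1;  // implicit default rule "*": the last label
  bool matched = false;

  // Exception rules ("!www.ck") win over everything else; the public
  // suffix is then the matched name minus its leftmost label.
  for (std::size_t i = 0; i < n && !matched; ++i) {
    if (rules.count("!" + JoinFrom(labels, i))) {
      suffix_len = n - i - 1;
      matched = true;
    }
  }
  // Otherwise take the longest plain or wildcard rule that matches.
  for (std::size_t i = 0; i < n && !matched; ++i) {
    const bool exact = rules.count(JoinFrom(labels, i)) > 0;
    const bool wildcard =
        i + 1 < n && rules.count("*." + JoinFrom(labels, i + 1)) > 0;
    if (exact || wildcard) {
      suffix_len = n - i;
      matched = true;
    }
  }
  if (n <= suffix_len) return "";
  return JoinFrom(labels, n - suffix_len - 1);
}

// A request is third-party when its eTLD+1 differs from the page's.
// Falls back to comparing full hosts when either has no eTLD+1.
bool IsThirdParty(const std::string& request_host,
                  const std::string& page_host, const RuleSet& rules) {
  const std::string a = RegistrableDomain(request_host, rules);
  const std::string b = RegistrableDomain(page_host, rules);
  if (a.empty() || b.empty()) return request_host != page_host;
  return a != b;
}
```

With a rule for `co.uk` loaded, `a.co.uk` and `b.co.uk` resolve to different registrable domains and are treated as third-party to each other, while `cdn.example.com` and `www.example.com` share `example.com` and are not. A check that only compares the last two labels would get the first case wrong.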