Open pes10k opened 5 years ago
General question about this approach: is it possible to just use the eTLD+1 parsing from Chromium? See https://cs.chromium.org/chromium/src/net/base/registry_controlled_domains/registry_controlled_domain.h?q=getdomainandregistry&dr=CSs
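For context, the eTLD+1 (registrable domain) of a host is its public suffix plus one more label, e.g. `example.co.uk` for `a.b.example.co.uk`. Below is a hypothetical, minimal JavaScript sketch of the matching algorithm published with the public suffix list: an exception rule wins, otherwise the longest matching rule wins, and `*` is the implicit default. It is not Chromium's code or the code in this PR. `parseRules` and `etldPlusOne` are illustrative names, and the sketch ignores IDN/punycode handling and the ICANN/private section split.

```javascript
// Hypothetical sketch: eTLD+1 lookup from public suffix list rules.
// Not Chromium's implementation and not this PR's implementation.

// Parse PSL-formatted text into { exception, labels } rules.
// Blank lines and "//" comments are skipped; "!" marks an exception rule.
function parseRules(text) {
  const rules = [];
  for (const raw of text.split('\n')) {
    const line = raw.trim().split(/\s+/)[0]; // rule is the first token
    if (line === '' || line.startsWith('//')) continue;
    const exception = line.startsWith('!');
    const body = exception ? line.slice(1) : line;
    rules.push({ exception, labels: body.toLowerCase().split('.') });
  }
  return rules;
}

// A rule matches when, comparing labels right to left, every rule label
// equals the corresponding host label or is the wildcard "*".
function ruleMatches(ruleLabels, hostLabels) {
  if (ruleLabels.length > hostLabels.length) return false;
  for (let i = 1; i <= ruleLabels.length; i++) {
    const r = ruleLabels[ruleLabels.length - i];
    if (r !== '*' && r !== hostLabels[hostLabels.length - i]) return false;
  }
  return true;
}

// Return the eTLD+1 for `host`, or '' if the host is itself a public suffix.
function etldPlusOne(host, rules) {
  const labels = host.toLowerCase().split('.');
  let prevailing = null;
  for (const rule of rules) {
    if (!ruleMatches(rule.labels, labels)) continue;
    if (rule.exception) { prevailing = rule; break; } // exceptions always win
    if (!prevailing || rule.labels.length > prevailing.labels.length) {
      prevailing = rule; // otherwise the longest matching rule wins
    }
  }
  // No match means the implicit "*" rule: the public suffix is one label.
  let suffixLen = 1;
  if (prevailing) {
    // An exception rule's public suffix drops its leftmost label.
    suffixLen = prevailing.exception ? prevailing.labels.length - 1
                                     : prevailing.labels.length;
  }
  if (labels.length <= suffixLen) return '';
  return labels.slice(labels.length - suffixLen - 1).join('.');
}

const demoRules = parseRules('// sample\ncom\nco.uk\n*.ck\n!www.ck\n');
console.log(etldPlusOne('www.example.com', demoRules));   // example.com
console.log(etldPlusOne('a.b.example.co.uk', demoRules)); // example.co.uk
console.log(etldPlusOne('www.ck', demoRules));            // www.ck
```

Being plain JavaScript, a lookup like this runs unchanged in Node, which is the trade-off the reply below weighs against reusing Chromium's C++ helper.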
We could, but then we'd lose the ability to run in Node (which has been very valuable for crawling / measurement, getting other folks to use the code, debugging, etc.).
I've got a question about the build process since I don't actually know when the build step takes place: if we pull down the list at build time, is there a reason to have it checked into the repo?
You're right, no need for this. I removed it from the `.gitignore` previously, and now I've also removed it from the set of tracked files. Should be good now.
@bbondy my code expects the public suffix list to be in a known location (i.e. at `etld/data/<list>.dat`), and lazily parses the list on first use. I have no idea if this will work when rolled into the larger browser; I'm just not familiar enough with the build process. Can you double-check that aspect?
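One way to keep "parse lazily on first use" while sidestepping the question of where the file lives in a browser build is to inject the loader rather than hard-code a path. This is a hypothetical sketch, not the PR's code: `lazyRuleList`, `fileLoader`, and `parseLines` are made-up names, and the caller supplies the actual file path.

```javascript
// Hypothetical sketch: load and parse a rule list only on first use.
// `loadText` and `parse` are injected so the same code can read from disk
// in Node or be handed rules by an embedding application.
const fs = require('fs');

function lazyRuleList(loadText, parse) {
  let rules = null; // stays null until the first get()
  return {
    get() {
      if (rules === null) rules = parse(loadText()); // load + parse once
      return rules;
    },
    isLoaded() {
      return rules !== null;
    },
  };
}

// Loader for a fixed on-disk location; `filePath` is supplied by the caller.
function fileLoader(filePath) {
  return () => fs.readFileSync(filePath, 'utf8');
}

// Trivial parser for illustration: one rule per non-comment, non-blank line.
function parseLines(text) {
  return text.split('\n')
    .map((l) => l.trim())
    .filter((l) => l !== '' && !l.startsWith('//'));
}
```

In Node one might call `lazyRuleList(fileLoader(somePath), parseLines)`; an embedder that already holds the rules as a string could pass `() => rulesText` instead, so nothing depends on a fixed on-disk location.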
@bbondy this is now ready for review again. There are four ways to enable eTLD+1 checking (by parsing a public suffix list):
1) When using the `check.js` script, pass the new `-P, --public-suffix-rules-path` option and point it to a text file containing public suffix rules.
2) Call the JS `AdBlockClient.parsePublicSuffixRules` method with a string containing public suffix rules.
3) Call the C++ `AdBlockClient::parsePublicSuffixRules` method with a `char*` / `std::string` of rules.
4) Call either the C++ or JS `deserialize` method with a `.dat` file that includes public suffix rule data (serializing after doing 1, 2 or 3 includes the public suffix rule data in the `.dat`).
fixes #171
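Option 4 relies on serialized data carrying the suffix rules along with the filters. The toy class below illustrates only that round-trip property. It is not the real `AdBlockClient`: the class name, the JSON encoding, and `hasPublicSuffixRules` are invented, method names merely echo the ones listed above for readability, and the real `.dat` layout is not shown here.

```javascript
// Toy stand-in (not the real AdBlockClient): once suffix rules are parsed,
// serializing carries them along, so deserializing alone is enough to
// re-enable eTLD+1 checking. The JSON format is invented for illustration.
class ToyClient {
  constructor() {
    this.filters = [];
    this.suffixRules = []; // empty means eTLD+1 checking is disabled
  }

  // Add one filter per non-blank line.
  parse(filterText) {
    this.filters.push(...filterText.split('\n').filter((l) => l.trim() !== ''));
  }

  // Replace the suffix rules, skipping blank lines and "//" comments.
  parsePublicSuffixRules(text) {
    this.suffixRules = text.split('\n')
      .map((l) => l.trim())
      .filter((l) => l !== '' && !l.startsWith('//'));
  }

  hasPublicSuffixRules() {
    return this.suffixRules.length > 0;
  }

  // Serialized data includes both filters and suffix rules.
  serialize() {
    return JSON.stringify({ filters: this.filters, suffixRules: this.suffixRules });
  }

  deserialize(data) {
    const parsed = JSON.parse(data);
    this.filters = parsed.filters;
    this.suffixRules = parsed.suffixRules || []; // older data may lack rules
  }
}
```

Defaulting `suffixRules` to an empty list on deserialize means data produced before suffix rules existed still loads, just with eTLD+1 checking off.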