TechnikEmpire / DistillNET

DistillNET is a library for matching and filtering HTTP requests and HTML response content using the Adblock Plus Filter format.
Mozilla Public License 2.0

Support for Adblock rules #24

Closed. sudongg closed this issue 2 years ago

sudongg commented 2 years ago

Hi, I tried this library. Although it can read AdBlock rules, it can’t handle some AdBlock rules correctly. Below is the code:

            var parser = new AbpFormatRuleParser();
            UrlFilter urlFilter = (UrlFilter)parser.ParseAbpFormattedRule("||g.abc*.com^", 1);

            NameValueCollection headers = new NameValueCollection(StringComparer.InvariantCultureIgnoreCase)
            {
                ["Accept"] = "text/html, application/xhtml+xml, application/xml; q=0.9, */*; q=0.8",
                ["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; WOW64; WebView/3.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763",
                ["Upgrade-Insecure-Requests"] = "1",
                ["Accept-Language"] = "it-IT, it; q=0.8, en-US; q=0.5, en; q=0.3"
            };

            if (urlFilter.IsMatch(new Uri("http://g.abc0358.com"), headers))
            {
                MessageHelpers.Show("IsMatch");
            }

I would expect this to show the IsMatch message, but it does not. Am I doing something wrong?

TechnikEmpire commented 2 years ago

Yeah, you have a separator indicator (^) at the end of your rule, and there is no separator in the URI you're trying to match. Drop the separator from the rule.

sudongg commented 2 years ago

@TechnikEmpire I tried removing ^, but it made no difference. IsMatch still returns false, at this check:

            if (ApplicableDomains.Count > 0 && !ApplicableDomains.Contains(hostWithoutWww))
            {
                return false;
            }

When I removed || instead, it worked as expected and I received the IsMatch message:

UrlFilter urlFilter = (UrlFilter)parser.ParseAbpFormattedRule("g.abc*.com^", 1);

I found that when || is included, the Parts property of the UrlFilter is AnchoredDomainFragment. After deleting ||, the Parts property is StringFragment, WildcardFragment, StringLiteralFragment, SeparatorFragment, and IsMatch returns true when it reaches the SeparatorFragment.

In rule lists such as EasyList, rules like this all start with ||. How should I set things up so that these filters load and match correctly?
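
For comparison, here is a minimal sketch of how the Adblock Plus filter syntax itself treats ||, * and ^, written independently of DistillNET. It translates a rule into a regular expression. AbpRuleSketch, ToRegex and IsMatch are hypothetical names made up for this illustration, not DistillNET APIs, and the domain-anchor pattern is a simplification. In the Adblock Plus syntax, || anchors at the start of the domain name, * matches any run of characters, and ^ matches any character other than a letter, a digit, or one of _ - . %, with the end of the address also accepted as a separator.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, NOT part of DistillNET: translates a simple ABP URL
// rule into a regular expression following the Adblock Plus filter syntax.
public static class AbpRuleSketch
{
    // ^ : any character other than a letter, a digit, or one of _ - . %,
    // or the end of the address.
    private const string Separator = @"(?:[^A-Za-z0-9_\-.%]|$)";

    // || : the scheme, then optional subdomain labels (simplified assumption).
    private const string DomainAnchor =
        @"^[A-Za-z][A-Za-z0-9+.\-]*:/+(?:[^/?#]*\.)?";

    public static Regex ToRegex(string rule)
    {
        var pattern = new StringBuilder();
        int i = 0;

        if (rule.StartsWith("||", StringComparison.Ordinal))
        {
            pattern.Append(DomainAnchor);   // anchor at the start of the domain
            i = 2;
        }
        else if (rule.StartsWith("|", StringComparison.Ordinal))
        {
            pattern.Append('^');            // anchor at the start of the address
            i = 1;
        }

        for (; i < rule.Length; i++)
        {
            char c = rule[i];
            if (c == '*')
                pattern.Append(".*");       // wildcard
            else if (c == '^')
                pattern.Append(Separator);  // separator placeholder
            else if (c == '|' && i == rule.Length - 1)
                pattern.Append('$');        // anchor at the end of the address
            else
                pattern.Append(Regex.Escape(c.ToString())); // literal character
        }

        return new Regex(pattern.ToString(),
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // Matches against the absolute URI string, which for a bare host such as
    // http://g.abc0358.com includes the trailing "/" that System.Uri adds.
    public static bool IsMatch(string rule, Uri uri) =>
        ToRegex(rule).IsMatch(uri.AbsoluteUri);
}
```

With this sketch, AbpRuleSketch.IsMatch("||g.abc*.com^", new Uri("http://g.abc0358.com")) returns true, because the trailing "/" (or the end of the address) satisfies the separator. It returns false for http://g.abc0358.community/, where the character after ".com" is a letter. This only shows the rule's intended semantics under the Adblock Plus syntax. It does not describe how DistillNET parses or stores the rule.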