essandess / adblock2privoxy

Convert adblock config files to privoxy format
https://hackage.haskell.org/package/adblock2privoxy
GNU General Public License v3.0
easylist go.*. rule breaks many sites #19

Closed wmyrda closed 6 years ago

wmyrda commented 6 years ago

Taking care of double rules is not enough, as there are even single rules which, by using .*., break more sites than intended.

#ab2p-block-request-R1304
{+client-header-tagger{ab2p-block-request-R1304} \
}
# |http://go.$domain=nowvideo.sx (easylist.txt: 46984)
go.*.

This rule sets the header tag for sites such as imasdk.googleapis.com.
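
A rough way to see why the pattern is so broad, using grep -E as a stand-in for Privoxy's matcher. This is only a sketch: it assumes the host pattern is evaluated as an unanchored regular expression, and both the `matches` helper and the host go.example.com are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints "match" if extended regex $1 is found
# anywhere in host name $2, otherwise "no match".
# Assumption: grep -E approximates how the host pattern is evaluated.
matches() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

# A host that starts with "go." (the kind of host the rule is aimed at).
r1=$(matches 'go.*.' 'go.example.com')
# A host that merely contains "go" somewhere in its name.
r2=$(matches 'go.*.' 'imasdk.googleapis.com')

echo "go.example.com:        $r1"
echo "imasdk.googleapis.com: $r2"
```

Under that assumption both hosts are reported as a match, because nothing ties "go" to the start of the name and the dot stands for any character.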

WORKAROUND: Use sed -i -e '/^go\.\*\./s/^/#/' /etc/privoxy/ab2p.action to disable this rule
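
Before editing the real file, the same sed expression can be tried on a scratch copy. The sketch below drops -i so the result goes to standard output; the file content is a made-up sample, not the real ab2p.action.

```shell
#!/bin/sh
# Scratch file standing in for /etc/privoxy/ab2p.action (illustrative content).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{+client-header-tagger{ab2p-block-request-R1304} \
}
go.*.
promo.*.
EOF

# Same expression as the workaround, but without -i: lines that start
# with the literal text "go.*." get a leading "#", everything else is kept.
out=$(sed -e '/^go\.\*\./s/^/#/' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

Only the go.*. line should come back commented out; the promo.*. line is left as it was.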

P.S. The rulesets I created after all the fixes/workarounds so far still use .*. about 1200 times. Almost all of the others actually seem less harmful, with the exception of promo.*., which comes from easylist.txt as well.

wmyrda commented 6 years ago

Trying to fix this issue I did some testing for it and this is what I found out:

||log. - original adblock record
^log.*. - converted with the fix from https://github.com/essandess/adblock2privoxy/issues/23

This is still not right. After the fix it no longer catches names such as blog.mypage.com, but it still catches things like loggingintothepage.mypage.com.

The only proper combination I found was ^log\.(*PRUNE).*? as this would catch log.mypage.com, but not loggingintothepage.mypage.com.
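
The three cases can be compared with a small sketch, again assuming grep -E as a stand-in for the real matcher. POSIX extended regular expressions have no (*PRUNE) verb, so only the effect of the ^ anchor and the escaped dot is checked here; the `check` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints 1 if extended regex $1 matches host $2, else 0.
# Assumption: grep -E approximates the matcher; (*PRUNE) is left out
# because POSIX ERE does not support it.
check() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then echo 1; else echo 0; fi
}

for host in log.mypage.com blog.mypage.com loggingintothepage.mypage.com; do
    printf '%-32s log.*.=%s  ^log.*.=%s  ^log\\.=%s\n' "$host" \
        "$(check 'log.*.' "$host")" \
        "$(check '^log.*.' "$host")" \
        "$(check '^log\.' "$host")"
done
```

In this approximation, the anchored pattern with an escaped dot is the only one of the three that accepts log.mypage.com while rejecting both blog.mypage.com and loggingintothepage.mypage.com.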

The proposed solution is to change all instances of . into \. in hostnames too, not only in patterns as is done now, and to change = lst : "*." into = lst : "(*PRUNE).*?"

While changing the latter was easy in the adblock2privoxy code, not knowing Haskell I am not sure how to change the dots within the code, and I was only able to do so partially afterwards with sed -i -e '/\./{/^\^/s/\./\\./}', which changes a dot into \. but only on lines starting with ^. My attempts to fix this in the code have failed so far, and help fixing it is welcome.
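
To show what "partially" means here, the sed expression can be run on a few sample lines. The input lines are made up for illustration, and a ";" is added before the closing brace, which some sed implementations require; otherwise the expression is the one quoted above.

```shell
#!/bin/sh
# Made-up sample lines in the style of the converted patterns.
input='^log.(*PRUNE).*?
^ads.example.(*PRUNE).*?
.*promo.(*PRUNE).*?'

# On lines that contain a dot and start with "^", replace a dot with "\.".
# The s command has no g flag, so only the first dot on each line changes.
out=$(printf '%s\n' "$input" | sed -e '/\./{/^\^/s/\./\\./;}')
printf '%s\n' "$out"
```

On this sample, the first dot on each ^ line is escaped, later dots on the same line are left alone, and the line that does not start with ^ is untouched.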

wmyrda commented 6 years ago

After a bit of trial and error I came up with this. Not only does it compile, it also seems to work just as expected :) It is combined with the previous patch for https://github.com/essandess/adblock2privoxy/issues/23

diff -Naur adblock2privoxy-9999.old/adblock2privoxy/src/PatternConverter.hs adblock2privoxy-9999/adblock2privoxy/src/PatternConverter.hs
--- adblock2privoxy-9999.old/adblock2privoxy/src/PatternConverter.hs    2018-07-23 14:45:40.829753697 +0200
+++ adblock2privoxy-9999/adblock2privoxy/src/PatternConverter.hs        2018-07-23 14:47:28.325970392 +0200
@@ -34,20 +34,22 @@
             | otherwise = "/"
         host' = case host of
                     "" -> ""
-                    _  -> changeFirst.changeLast $ host
+                    _  -> changeFirst.changeMiddle.changeLast $ host
                     where
                     changeLast []     = []
                     changeLast [lst]
                         | lst == '|' || lst `elem` hostSeparators   =  []
-                        | lst == '*' || lst == '\0'                 =  "*."
-                        | otherwise                                 =  lst : "*."
+                        | lst == '*' || lst == '\0'                 =  "(*PRUNE).*?"
+                        | otherwise                                 =  lst : "(*PRUNE).*?"
                     changeLast (c:cs) = c : changeLast cs

+                    changeMiddle = replace "." "\\."
+
                     changeFirst []    = []
                     changeFirst (first:cs)
                         | first == '*'                       =       '.' :  '*'  : cs
                         | bindStart == Hard || proto /= ""   =             first : cs
-                        | bindStart == Soft                  =       '.' : first : cs
+                        | bindStart == Soft                  =       '^' : first : cs
                         | otherwise                          = '.' : '*' : first : cs

         query' = case query of

essandess commented 6 years ago

@wmyrda I’m honestly still swamped with other projects, but am starting to think about thinking about addressing all the great issues you’ve raised. Rather than work through these linearly, would you please triage what you believe to be the most important issues?

Also, you raised compiler issues in another thread. That one perhaps is the most fundamental because the code refactoring should be done in such a way that it isn’t undone by a version upgrade.

It looks like this may be one of the highest priority issues to address. Would you please weigh in? Note that in markdown you can refer to stuff easily with e.g. #19 links.

wmyrda commented 6 years ago

Please do not feel like I am pushing you to do stuff; you may definitely address these whenever you like. To make it easier to follow what is important, I will create another issue summarizing all open bugs along with my subjective importance (low/medium/severe) and the scope of required work (trivial/normal/high).

For the compiler issue, I think help is coming.

essandess commented 6 years ago

Fixed. See comments in #10.