collinbarrett / FilterLists

:shield: The independent, comprehensive directory of filter and host lists for advertisements, trackers, malware, and annoyances.
https://filterlists.com
MIT License
1.35k stars 117 forks

My New Filterlist #3551

Open thedoggybrad opened 1 year ago

thedoggybrad commented 1 year ago

Raw: https://raw.githubusercontent.com/thedoggybrad/supersecurityfilterlist/main/list.txt

Github: https://github.com/thedoggybrad/supersecurityfilterlist

iam-py-test commented 1 year ago

I have a few questions (not affiliated with this project, just curious). Where do you get the entries for this list? The README mentions Phishing Domain Database and The Big List of Hacked Malware Web Sites. Also, there are a lot of duplicated entries: [screenshot] Thanks!

collinbarrett commented 1 year ago

@iam-py-test, what tool did you use to find those duplicates? Just curious.

gwarser commented 1 year ago

Visible in uBO

[screenshot]

iam-py-test commented 1 year ago

> @iam-py-test, what tool did you use to find those duplicates? Just curious.

I used https://abpvn.com/ruleChecker/redundantRuleChecker.html (DandelionSprout recommends it in the adfilt README; that's how I found it), but @gwarser's method works too (though the checker shows exactly which rules are redundant). I am working on a PR to remove some of the redundant rules, but there are too many to do by hand, and my Python script keeps converting the line endings from CRLF to LF, which makes the diff show that I changed every single line.
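On Linux, a Python script usually rewrites CRLF as LF because text-mode `open()` translates line endings; passing `newline=''` when both reading and writing turns that translation off. Staying with Bash, the language used elsewhere in this thread, a minimal sketch (the function name `dedupe_list` is hypothetical, and this is not the script mentioned above) that drops exact duplicate rules while leaving CRLF endings byte-for-byte intact could look like this:

```bash
#!/usr/bin/env bash
# Remove exact duplicate rules from a filter list in place, keeping the
# first occurrence and the original order. Blank lines and "!" comment
# lines are always kept. awk treats a trailing CR as part of the line,
# so CRLF line endings pass through unchanged.
dedupe_list() {
    local list=$1 tmp
    tmp=$(mktemp) || return 1
    awk '/^\r?$/ || /^!/ || !seen[$0]++' "$list" > "$tmp" &&
        cat "$tmp" > "$list"   # overwrite in place, keeping file permissions
    rm -f "$tmp"
}
```

Usage: `dedupe_list list.txt`. This only catches identical lines; rules made redundant by a broader domain rule need a separate check.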

jarelllama commented 1 year ago

I recently had to deal with this issue on my own blocklist. Here is a snippet of code in Bash to find redundant entries:

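```bash
: > redundant_entries.txt                 # start from an empty output file
while read -r entry; do
    [[ $entry == '||'* ]] || continue     # skip comments and other rule types
    domain=${entry#||}                    # ||example.com^ -> example.com^
    domain=${domain//./\\.}               # escape dots so grep matches them literally
    # Rules for subdomains (of any level) of this domain are redundant
    grep "\.${domain}$" adblock.txt >> redundant_entries.txt
done < adblock.txt

# The output has a high chance of having duplicates
sort -u redundant_entries.txt -o redundant_entries.txt
```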

This assumes your list only has entries in the form of ||example.com^. The code loops through each entry and converts it into a pattern for grep, which looks for other entries that are subdomains (of any level) of the current entry. The whole process is slow: it takes about 45 seconds for my 2,300-rule ABP list.
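Because the loop runs one grep over the whole file for every rule, its cost grows roughly with the square of the list size. A different technique, swapped in here rather than taken from the snippet above, is to load every blocked domain into a Bash associative array once and then, for each rule, look up its parent domains one label at a time. A minimal sketch, assuming Bash 4+ and plain `||domain^` rules (the function name `find_redundant` is hypothetical):

```bash
#!/usr/bin/env bash
# Print rules of the form ||sub.example.com^ that are already covered by a
# broader ||example.com^ rule in the same list. Requires Bash 4+ (declare -A).
find_redundant() {
    local list=$1 line domain parent
    declare -A blocked=()
    local -a domains=()
    # First pass: remember every domain that has its own ||domain^ rule
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%$'\r'}                   # tolerate CRLF line endings
        [[ $line == '||'*'^' ]] || continue  # skip comments and other rule types
        domain=${line#||}
        domain=${domain%^}
        blocked[$domain]=1
        domains+=("$domain")
    done < "$list"
    # Second pass: a rule is redundant if any parent domain is also blocked
    for domain in "${domains[@]}"; do
        parent=$domain
        while [[ $parent == *.* ]]; do
            parent=${parent#*.}              # drop the leftmost label
            if [[ -n ${blocked[$parent]:-} ]]; then
                printf '||%s^\n' "$domain"
                break
            fi
        done
    done
}
```

Usage: `find_redundant adblock.txt > redundant_entries.txt`. Each rule costs only as many hash lookups as it has labels, so the run time grows roughly linearly with the list. Rules with options (for example `||example.com^$third-party`) are skipped, which is a deliberate simplification.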

I'm going to feed the redundant entries file into my list building script so it ignores the entries in the file.
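One way a build step could ignore those entries is a fixed-string, whole-line grep against the redundant entries file. This is a sketch with a hypothetical function name, not the author's build script; it uses only standard grep options (`-v` invert, `-x` whole line, `-F` fixed strings, `-f` patterns from a file):

```bash
#!/usr/bin/env bash
# Print the list without any rule that appears in the redundant-entries file.
# Both files must use the same line endings, or no lines will match.
exclude_redundant() {
    local list=$1 redundant=$2
    grep -vxFf "$redundant" "$list"
}
```

Usage: `exclude_redundant adblock.txt redundant_entries.txt > adblock.clean.txt`. An empty redundant-entries file contains no patterns, so every line of the list is kept.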

thedoggybrad commented 1 year ago

I will fix those duplicates; I had not checked for them.

thedoggybrad commented 1 year ago

> I have a few questions (not affiliated with this project, just curious). Where do you get the entries for this list? The README mentions Phishing Domain Database and The Big List of Hacked Malware Web Sites. Also, there are a lot of duplicated entries: [screenshot] Thanks!

Yes, that's right. I just compiled them.

iam-py-test commented 1 year ago

Also, one small comment on the README. IMO "uBlock" is garbage and shouldn't be recommended as an option to use this list with; it was unmaintained for years, then recently removed its code from GitHub and started pushing updates again. The developer(s) have done shady stuff in the past (tracking users, stealing code), and it doesn't even have a functional options page, so it's not even possible to install any non-default lists in it: [screenshot] It's also blocked as malicious by several blocklists, including uBO's default badware risks list.

thedoggybrad commented 1 year ago

@iam-py-test Thanks for that, I'm removing it from the READMEs of all my filter lists ASAP. (Update: successfully removed from the READMEs of all my filter lists.)

By the way, the duplicate filters are fixed.

thedoggybrad commented 1 year ago

@iam-py-test Thanks for making me aware of what is happening with uBlock now. It used to look almost the same as uBlock Origin. As I understand it, uBlock is the original, but due to a conflict between the two repository owners, the original owner made uBlock Origin. I had previously read recommendations in uBlock Origin's own filter list repository (in its issues) suggesting not to use uBlock. Now the GitHub code for uBlock has been removed; I was surprised to learn that and immediately checked for myself. I am not actually a fan of uBlock either.

By the way, I use uBlock Origin in my web browsers, so I am definitely not testing my filter lists in other ad blockers.