badmojr / 1Hosts

World's most advanced DNS filter-/blocklists!
https://o0.pages.dev
Mozilla Public License 2.0

false positives not showing up #437

Closed: p1r473 closed this issue 2 years ago

p1r473 commented 2 years ago

Hello, I ran wget on all of the files listed in https://raw.githubusercontent.com/badmojr/1Hosts/master/-data/lists/assets.txt and then searched the downloaded files for some of my latest false positives: fbpurity, inktuitive, jmp.sh, trace3, boxknight, expel.io. I could not find any of these strings in any of the files.

Where are all of these false positives coming from, given that the text isn't found in any file listed in assets.txt?

I am using 500 lists, and this list is the only one producing all of these false positives.

badmojr commented 2 years ago

Hi! Try unzipping some of the compressed files before searching.

p1r473 commented 2 years ago

I tried that, still no luck:

unzip "*.zip"

Archive:  BadHosts.unx.zip
   creating: BadHosts.unx/
  inflating: BadHosts.unx/add.2o7Net
  inflating: BadHosts.unx/add.Spam
  inflating: BadHosts.unx/ChildSafe.txt
  inflating: BadHosts.unx/Checksums
  inflating: BadHosts.unx/add.Dead
  inflating: BadHosts.unx/add.Header
 extracting: BadHosts.unx/hosts.lnx.sig
  inflating: BadHosts.unx/add.Casino
  inflating: BadHosts.unx/sums.md5
  inflating: BadHosts.unx/newhosts.sh
  inflating: BadHosts.unx/warning.txt
  inflating: BadHosts.unx/sums.sha256
  inflating: BadHosts.unx/hosts.lnx
  inflating: BadHosts.unx/copying.txt
  inflating: BadHosts.unx/add.Porn
  inflating: BadHosts.unx/add.Risk
 extracting: BadHosts.unx/add.Risk.sig
  inflating: BadHosts.unx/sums.sha1
  inflating: BadHosts.unx/SecureMeccaUpdated.sh
  inflating: BadHosts.unx/AutoHosts.sh
  inflating: BadHosts.unx/main
  inflating: BadHosts.unx/ReadUnix
  inflating: BadHosts.unx/PolicyForBlock.txt

Archive:  master.zip
f7da95f094bd3ca0200aba2c417cbebee9d89304
   creating: domain-list-master/
  inflating: domain-list-master/README.md
  inflating: domain-list-master/ads.txt
  inflating: domain-list-master/affiliate.txt
  inflating: domain-list-master/analytics.txt
  inflating: domain-list-master/enrichments.txt
  inflating: domain-list-master/fake.txt
  inflating: domain-list-master/widgets.txt

2 archives were successfully processed.

I tried to find all of the false positives (FPs) and can't find them. Where are they coming from? They aren't in the source files:

root@Harbormaster:/home/pi/test# grep -Ril "boxknight" .
root@Harbormaster:/home/pi/test# grep -Ril "expel.io" .
root@Harbormaster:/home/pi/test# grep -Ril "trace3" .
root@Harbormaster:/home/pi/test# grep -Ril "fbpurity" .
root@Harbormaster:/home/pi/test# grep -Ril "inktuitive" .
root@Harbormaster:/home/pi/test# grep -Ril "jmp.sh" .
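
The six searches above can be wrapped in one loop. This is only a sketch, assuming all lists were downloaded and unzipped under one directory; `find_terms` is a hypothetical helper, not part of 1Hosts or Pi-hole. It adds `-F` so that the dot in terms such as `expel.io` and `jmp.sh` is matched literally instead of as a regex wildcard, and it prints an explicit line for terms with no hits, so empty output is not ambiguous.

```shell
#!/bin/sh
# find_terms DIR TERM... : report which files under DIR mention each TERM.
# Hypothetical helper for illustration only.
find_terms() {
    dir=$1
    shift
    for term in "$@"; do
        # -R recurse, -i ignore case, -l print file names only,
        # -F treat TERM as a fixed string (so "." is a literal dot)
        hits=$(grep -RilF -e "$term" "$dir")
        if [ -n "$hits" ]; then
            printf '%s:\n%s\n' "$term" "$hits"
        else
            printf '%s: no matches\n' "$term"
        fi
    done
}
```

Usage, from the download directory: `find_terms . boxknight expel.io trace3 fbpurity inktuitive jmp.sh`. Note that a term can only be found if the list containing it was actually downloaded.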

Testing my methodology

grep -Ril "ads.google.com" .

./3.txt
./abpvn.txt
./jp-filters.txt
./block.list
./Ads
./adservers.txt
./hosts.6
./hosts.9
./2.txt
./sr_proxy_banad.conf
./index.html.2
./easylist.txt
./fanboy-ultimate.txt
./user.txt
./hosts.3
./Ads-Blocklist
./ads-tracking
./adplus.txt
./blocked.txt
./dnscrypt-proxy.blacklist.txt
./pi_indo_ads.txt
./hosts.17
./hosts.txt.1
./domain-list-master/ads.txt
./adservers-and-trackers.txt
./serverlist.php?showintro=0
./smartphone-and-general-ads-analytics-regex-blocklist-ftprivacy.txt
./reject.list
./ads-nl.txt
./ads-and-tracking-extended.txt
./domains.txt.2
./hosts.13
./hosts.8
./hosts.16
./Regular Hosts.txt
./hosts.1

p1r473 commented 2 years ago

Perhaps you fixed something in the backend? I can no longer see these domains being blocked by your list, though the issues are still open.

badmojr commented 2 years ago

> Perhaps you fixed something in the backend? I can no longer see these domains being blocked by your list, though the issues are still open.

You are absolutely on point. The backend makes use of some 'trusted' external whitelists. In this case, the domains were delisted because they were listed for removal from https://github.com/p1r473/hosts/raw/master/whitelist.txt .

badmojr commented 2 years ago

> I tried that, still no luck

Try this: https://github.com/badmojr/1Hosts/blob/master/-data/lists/assets.txt#L189

p1r473 commented 2 years ago

@badmojr good find. That one was missed by my wget because a blocklist was blocking AWS. I've opened a ticket to rectify that, of course.
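
A download that fails silently is easy to miss when fetching hundreds of lists. The sketch below records every URL that could not be fetched. It assumes the URL file holds one URL per line (blank lines and `#` comment lines are skipped, which may not match the real layout of assets.txt), and `fetch_all` and the `FETCH` override are hypothetical names used only for illustration.

```shell
#!/bin/sh
# fetch_all URL_FILE OUT_DIR : download every URL listed in URL_FILE into
# OUT_DIR and print the ones that failed, so a blocked host is noticed.
# FETCH is a hypothetical override hook; the default is "wget -q -P".
fetch_all() {
    url_file=$1
    out_dir=$2
    fetch=${FETCH:-wget -q -P}
    failed=0
    while IFS= read -r url; do
        # skip blank lines and comment lines
        case $url in
            ''|'#'*) continue ;;
        esac
        # wget exits non-zero on DNS, network, or HTTP errors
        if ! $fetch "$out_dir" "$url"; then
            printf 'FAILED: %s\n' "$url"
            failed=$((failed + 1))
        fi
    done < "$url_file"
    # exit status is 0 only when every download succeeded
    [ "$failed" -eq 0 ]
}
```

Usage: `fetch_all assets.txt ./lists`. Any `FAILED:` line points at a list that will be missing from later searches.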

You're right, this is where they are coming from. Can we remove this source from the upstream lists, since it is causing a massive number of false positives?

Based on my research, we are blocking the most popular websites by including umbrella-static/top-1m.csv.zip (https://s3-us-west-1.amazonaws.com/umbrella-static/index.html). From that page: "The popularity list contains our most queried domains based on passive DNS usage across our Umbrella global network of more than 100 Billion requests per day with 65 million unique active users, in more than 165 countries. [..] the metric is not based on only browser based 'http' requests from users but rather takes in to account the number of unique client IPs invoking this domain relative to the sum of all requests to all domains."

p1r473 commented 2 years ago

> > Perhaps you fixed something in the backend? I can no longer see these domains being blocked by your list, though the issues are still open.

> You are absolutely on point. The backend makes use of some 'trusted' external whitelists. In this case, the domains were delisted because they were listed for removal from https://github.com/p1r473/hosts/raw/master/whitelist.txt .

I don't recommend relying on my whitelist: my scripts automatically delete items from it once they are no longer on any of the blocklists/ad lists! I didn't realize you were using my list in the backend :) This is circular logic:

  1. I added it to my whitelist
  2. Your list sees it on my whitelist and removes it
  3. My script removes it from the whitelist because it's no longer blocked anywhere
  4. It reappears on your list because I removed it from my whitelist
  5. Return to step 1
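
The loop in the steps above can be reproduced with a toy model of a single domain. This is a sketch under two assumptions: the upstream source keeps listing the domain the whole time, and the blocklist build only consults the current state of the whitelist. `simulate_cycle` is a hypothetical name, not a real script from either project.

```shell
#!/bin/sh
# simulate_cycle ROUNDS : toy model of the feedback loop for one domain.
# State is 1 (listed) or 0 (not listed). Assumes the upstream source
# keeps listing the domain in every round.
simulate_cycle() {
    rounds=$1
    whitelisted=0
    round=1
    while [ "$round" -le "$rounds" ]; do
        # Blocklist build: drop the domain only if it is whitelisted right now.
        if [ "$whitelisted" -eq 1 ]; then blocked=0; else blocked=1; fi
        # Whitelist maintenance: add when blocked, prune when no longer blocked.
        whitelisted=$blocked
        printf 'round %d: blocked=%d whitelisted=%d\n' "$round" "$blocked" "$whitelisted"
        round=$((round + 1))
    done
}
```

Running `simulate_cycle 4` prints `blocked=1 whitelisted=1` for rounds 1 and 3 and `blocked=0 whitelisted=0` for rounds 2 and 4, so the domain never settles.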

If you want to keep using my whitelist in the backend, I can try to add an exception to my whitelist pruning so that it does not remove anything found on your blocklists. Alternatively, you can host your own whitelist and not rely on mine. Let me know how we want to handle this circular logic.

badmojr commented 2 years ago

> If you want to keep using my whitelist in the backend, I can try to add an exception to my whitelist pruning so that it does not remove anything found on your blocklists. Alternatively, you can host your own whitelist and not rely on mine. Let me know how we want to handle this circular logic.

I am aware that you remove entries from your whitelist once they are no longer blocked. That is why I have set up the parsing script to create a new combined whitelist made up of delisted/whitelisted entries from external 'trusted' sources. So worry not! Once a domain is removed, it won't end up on the lists again.
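
One way such a combined whitelist could work is to only ever add entries to it, so that a domain pruned from an external source later stays whitelisted. The sketch below is not the actual 1Hosts parsing script; `update_combined` and `apply_whitelist` are hypothetical names, and it assumes every file holds one plain domain per line rather than hosts-file syntax.

```shell
#!/bin/sh
# update_combined EXTERNAL COMBINED : merge the current external whitelist
# into a persistent combined whitelist. Entries are only ever added, so a
# domain pruned from EXTERNAL later still stays in COMBINED.
update_combined() {
    external=$1
    combined=$2
    touch "$combined"
    sort -u "$external" "$combined" > "$combined.tmp" &&
        mv "$combined.tmp" "$combined"
}

# apply_whitelist BLOCKLIST COMBINED : print BLOCKLIST minus whitelisted domains.
apply_whitelist() {
    # -v invert match, -x whole-line match, -F fixed strings,
    # -f read the patterns (whitelisted domains) from a file
    grep -vxF -f "$2" "$1"
}
```

Usage: run `update_combined whitelist.txt combined.txt` on every build, then `apply_whitelist blocklist.txt combined.txt > filtered.txt`. Because `combined.txt` persists between builds, the back-and-forth described earlier in the thread stops after the first removal.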

crssi commented 2 years ago

@p1r473 would you mind sharing the URL for "Find Blocked Domain In Lists"?

p1r473 commented 2 years ago

> @p1r473 would you mind sharing the URL for "Find Blocked Domain In Lists"?

It's part of Pi-hole: http://pi.hole/admin/queryads.php

crssi commented 2 years ago

Thank you