Closed p1r473 closed 2 years ago
Hi! Try unzipping some of the compressed files before doing so.
I tried that, still no luck.
unzip "*.zip"
Archive: BadHosts.unx.zip
creating: BadHosts.unx/
inflating: BadHosts.unx/add.2o7Net
inflating: BadHosts.unx/add.Spam
inflating: BadHosts.unx/ChildSafe.txt
inflating: BadHosts.unx/Checksums
inflating: BadHosts.unx/add.Dead
inflating: BadHosts.unx/add.Header
extracting: BadHosts.unx/hosts.lnx.sig
inflating: BadHosts.unx/add.Casino
inflating: BadHosts.unx/sums.md5
inflating: BadHosts.unx/newhosts.sh
inflating: BadHosts.unx/warning.txt
inflating: BadHosts.unx/sums.sha256
inflating: BadHosts.unx/hosts.lnx
inflating: BadHosts.unx/copying.txt
inflating: BadHosts.unx/add.Porn
inflating: BadHosts.unx/add.Risk
extracting: BadHosts.unx/add.Risk.sig
inflating: BadHosts.unx/sums.sha1
inflating: BadHosts.unx/SecureMeccaUpdated.sh
inflating: BadHosts.unx/AutoHosts.sh
inflating: BadHosts.unx/main
inflating: BadHosts.unx/ReadUnix
inflating: BadHosts.unx/PolicyForBlock.txt
Archive: master.zip
f7da95f094bd3ca0200aba2c417cbebee9d89304
creating: domain-list-master/
inflating: domain-list-master/README.md
inflating: domain-list-master/ads.txt
inflating: domain-list-master/affiliate.txt
inflating: domain-list-master/analytics.txt
inflating: domain-list-master/enrichments.txt
inflating: domain-list-master/fake.txt
inflating: domain-list-master/widgets.txt
2 archives were successfully processed.
Trying to find all the FPs. Can't find them. Where are they coming from? They aren't in the source files.
root@Harbormaster:/home/pi/test# grep -Ril "boxknight" .
root@Harbormaster:/home/pi/test# grep -Ril "expel.io" .
root@Harbormaster:/home/pi/test# grep -Ril "trace3" .
root@Harbormaster:/home/pi/test# grep -Ril "fbpurity" .
root@Harbormaster:/home/pi/test# grep -Ril "inktuitive" .
root@Harbormaster:/home/pi/test# grep -Ril "jmp.sh" .
Testing my methodology
grep -Ril "ads.google.com" .
./3.txt
./abpvn.txt
./jp-filters.txt
./block.list
./Ads
./adservers.txt
./hosts.6
./hosts.9
./2.txt
./sr_proxy_banad.conf
./index.html.2
./easylist.txt
./fanboy-ultimate.txt
./user.txt
./hosts.3
./Ads-Blocklist
./ads-tracking
./adplus.txt
./blocked.txt
./dnscrypt-proxy.blacklist.txt
./pi_indo_ads.txt
./hosts.17
./hosts.txt.1
./domain-list-master/ads.txt
./adservers-and-trackers.txt
./serverlist.php?showintro=0
./smartphone-and-general-ads-analytics-regex-blocklist-ftprivacy.txt
./reject.list
./ads-nl.txt
./ads-and-tracking-extended.txt
./domains.txt.2
./hosts.13
./hosts.8
./hosts.16
./Regular Hosts.txt
./hosts.1
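One caveat with the searches above: grep treats the pattern as a regular expression, so the `.` in `expel.io` or `jmp.sh` matches any character. A minimal sketch that runs the same recursive search with literal strings and says explicitly when a term has no hit; `find_terms` is a hypothetical helper name, not something from PiHole or 1Hosts.

```shell
# Hypothetical helper: for each search term, list the files under DIR that
# contain it, or report that nothing matched.
find_terms() {
    dir=$1
    shift
    for term in "$@"; do
        # -R recurse, -i ignore case, -l print file names only,
        # -F treat the term as a literal string (the dot is not a wildcard)
        hits=$(grep -RilF -- "$term" "$dir")
        if [ -n "$hits" ]; then
            printf '%s:\n%s\n' "$term" "$hits"
        else
            printf '%s: not found\n' "$term"
        fi
    done
}
```

Usage would be something like `find_terms . boxknight expel.io trace3 fbpurity inktuitive jmp.sh`.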
Perhaps you fixed something in the backend?
Not able to see these domains blocked by you anymore - though the issues are still open
You are absolutely on point. The backend makes use of some 'trusted' external whitelists. In this case, the domains were delisted because they were listed for removal from https://github.com/p1r473/hosts/raw/master/whitelist.txt .
Try this: https://github.com/badmojr/1Hosts/blob/master/-data/lists/assets.txt#L189
@badmojr good find. That one was missed from my wget because a blocklist was blocking AWS. I've opened a ticket to rectify that, of course.
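A download that fails silently like this is easy to miss. Here is a rough sketch of a fetch loop that records every URL it could not retrieve. It assumes the list holds one URL per line; `fetch_all`, the `FETCH_CMD` override and the numbered output file names are all made up for illustration.

```shell
# fetch_all LIST OUTDIR FAILED
# Download every URL in LIST into OUTDIR and write each URL that could not
# be fetched to FAILED, so a blocked host does not go unnoticed.
# FETCH_CMD is a hypothetical hook called as "$FETCH_CMD OUTFILE URL";
# it defaults to wget (quiet, 30 s timeout, explicit output file).
fetch_all() {
    list=$1
    outdir=$2
    failed=$3
    : > "$failed"
    n=0
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines and comment lines
        case $url in
            '' | '#'*) continue ;;
        esac
        n=$((n + 1))
        out="$outdir/$n.txt"
        # Left unquoted on purpose so the default splits into command + flags
        if ! ${FETCH_CMD:-wget -q -T 30 -O} "$out" "$url"; then
            rm -f "$out"    # wget -O leaves an empty file behind on failure
            printf '%s\n' "$url" >> "$failed"
        fi
    done < "$list"
}
```

After something like `fetch_all assets.txt . failed.txt`, a non-empty `failed.txt` shows which sources never made it into the search.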
You're right, this is where they are coming from. Can we remove this from the upstream, as it is causing a massive number of false positives?
Based on my research, we are blocking the most popular websites with umbrella-static/top-1m.csv.zip (https://s3-us-west-1.amazonaws.com/umbrella-static/index.html): "The popularity list contains our most queried domains based on passive DNS usage across our Umbrella global network of more than 100 Billion requests per day with 65 million unique active users, in more than 165 countries. [..] the metric is not based on only browser based 'http' requests from users but rather takes in to account the number of unique client IPs invoking this domain relative to the sum of all requests to all domains."
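To confirm that a given false positive really comes from that popularity list, the unzipped CSV can be searched directly. A small sketch, assuming the extracted file is named `top-1m.csv` and that each line has the form `rank,domain` (both are assumptions, check them against the actual download); `toplist_hits` is a hypothetical name.

```shell
# toplist_hits CSV TERM...
# Print every row of the popularity CSV whose domain column contains TERM.
toplist_hits() {
    csv=$1
    shift
    for term in "$@"; do
        awk -F, -v t="$term" '
            {
                sub(/\r$/, "", $2)   # tolerate CRLF line endings, if any
                # index() is a literal substring match, so dots are not wildcards
                if (index(tolower($2), tolower(t)))
                    print t ": rank " $1 " " $2
            }' "$csv"
    done
}
```

For example: `unzip top-1m.csv.zip && toplist_hits top-1m.csv boxknight expel.io trace3 fbpurity inktuitive jmp.sh`.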
I don't recommend relying on my whitelist: my scripts automatically delete items from my whitelist once they are no longer on any of the block lists/ad lists! I didn't realize you were using my list in the back end :) This is circular logic.
If you want to keep using my whitelist in the backend, I can try to make an exception to my whitelist removal so that it doesn't remove anything found on your blocklists. Alternatively, you can host your own whitelist and not rely on mine. Let me know how we want to handle this circular logic.
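One way the proposed exception could look, as a sketch only and not the actual cleanup script: keep a whitelist entry if it is still on some block list, or if it is on a protected list that is never pruned. `prune_whitelist` and the three file arguments are hypothetical; each file is assumed to hold one bare domain per line.

```shell
# prune_whitelist WHITELIST BLOCKED PROTECTED
# Print the whitelist entries that should survive pruning:
#   - entries still present in BLOCKED (all block lists merged), or
#   - entries present in PROTECTED (domains a downstream list delisted
#     because of this whitelist, so removing them would re-block them).
prune_whitelist() {
    whitelist=$1
    blocked=$2
    protected=$3
    # -F literal strings, -x whole-line match, -f read patterns from a file
    grep -Fx -f "$blocked" -f "$protected" "$whitelist"
}
```

Without the protected file, a domain blocked only by the downstream list would drop off the whitelist as soon as it was delisted there, and then get blocked again.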
I am aware that you remove entries from your whitelist once they are no longer blocked. That's why I have set up the parsing script to create a new combined whitelist made up of delisted/whitelisted entries from external 'trusted' sources. So worry not! Once a domain is removed, it won't end up on the lists.
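The combined-whitelist idea can be sketched roughly as follows. This is not the real 1Hosts parsing script, just an illustration under assumptions: every file holds one bare domain per line, `#` starts a comment, and `apply_whitelists` is a made-up name.

```shell
# apply_whitelists BLOCKLIST WHITELIST...
# Merge all whitelist files into one de-duplicated list, then print the
# blocklist with every whitelisted domain removed.
apply_whitelists() {
    blocklist=$1
    shift
    combined=$(mktemp)
    # Drop comment and blank lines, then de-duplicate
    cat "$@" | grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' | sort -u > "$combined"
    # -v invert, -F literal strings, -x whole-line match
    grep -vFx -f "$combined" "$blocklist"
    rm -f "$combined"
}
```

Because the merged list is rebuilt from all sources on each run, an entry dropped from one external whitelist only comes back onto the blocklist if no other source still carries it.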
@p1r473 would you mind sharing the URL for "Find Blocked Domain In Lists"?
It's a part of Pi-hole: http://pi.hole/admin/queryads.php
Thank you
Hello, I ran a wget on all files in https://raw.githubusercontent.com/badmojr/1Hosts/master/-data/lists/assets.txt and then searched these files for some of my latest false positives: fbpurity, inktuitive, jmp.sh, trace3, boxknight, expel.io. I could not find these strings in any of the files.
Where are all of these false positives coming from, since the text isn't found in the files from assets.txt?
I am using 500 lists, and this list is the only one getting all of these false positives.