Closed pallebone closed 3 years ago
New lists will be available within the next 24 hours; they are so large that Git wasn't able to support them.
@pallebone
If you want to donate your system resources let me know.
What kind of resources?
Mostly CPU, some bandwidth (5GB), and storage (1 GB)
I can host, but CPU is limited on the cloud instance I use. Probably better to just FTP up any files you want hosted and compute locally on your PC. It will be faster.
Hey, it's going to take a couple of hours to update, but that's fine. Hosting isn't the issue; the issue is that generating the list takes too long and uses too many system resources.
This is one of the biggest lists on GitHub that I know of, and I ran out of system resources in a single day.
I have to use Git LFS just to host the files; the more scans I run, the more ad servers I find.
Ok, if you change your mind and need a place to put files in the cloud, let me know and I can set up an FTP account for you.
It's okay, thanks tho.
I don't need storage; I need CPU resources to find more lists and then validate those lists.
Ok. Sorry, I don't have that. Hope you can find a solution.
For a few days, you can use my vServer for this.
It's okay, thank you. I have tried cloud servers and bare-metal servers, but the issue is that they won't be able to handle the load.
Hmm, okay. I still sent the server details to your protonmail.com address; just try it out.
It's okay, thank you, I don't need it.
The problem right now is that I have over 10 million domains on my lists, and it can take up to a week to validate them all one by one, or I can validate them super fast, but that uses a lot of system resources. All I need to do now is figure out a way to validate them super fast without using a lot of system resources, and we'll be fine.
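One common way to trade speed against resource use is to cap how many lookups run at once. The sketch below is only an illustration, not the project's actual code: `resolves` is a hypothetical stand-in check (a plain DNS lookup, which is not the same thing as a registration check), and `validate_all` and `max_workers` are made-up names.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolves(domain: str) -> bool:
    """Hypothetical stand-in check: True if the system resolver finds the name."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; malformed labels
        # can raise UnicodeError during IDNA encoding.
        return False


def validate_all(domains, check=resolves, max_workers=64):
    """Return the domains that pass `check`, with at most `max_workers` checks in flight."""
    domains = list(domains)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in input order.
        results = pool.map(check, domains)
        return [d for d, ok in zip(domains, results) if ok]
```

Lowering `max_workers` reduces memory and socket pressure at the cost of a longer run. With tens of millions of entries it would also make sense to pass in one slice of the list at a time, since `pool.map` queues every task up front.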
Without validation the lists hold over 20 million domains, and a small embedded system cannot support lists that big.
Why all at once in one big list? Couldn't you make several smaller ones and divide them up across several servers? Surely that only has to be done once, and some entries are duplicates anyway.
In the first round, check a file with 1,000,000 entries; take everything that remains, merge it and divide it again, then check and merge.
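The split, check and merge idea can be sketched roughly as follows. This is an illustration under assumptions, not the project's pipeline: `validate_batch` is a hypothetical callable that returns the valid entries of one batch, and the helper names are made up.

```python
from itertools import islice


def unique(domains):
    """Yield each domain once (trimmed, lowercased), keeping first-seen order."""
    seen = set()
    for d in domains:
        d = d.strip().lower()
        if d and d not in seen:
            seen.add(d)
            yield d


def chunks(items, size=1_000_000):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def validate_in_rounds(domains, validate_batch, size=1_000_000):
    """Deduplicate, validate one batch at a time, then merge the survivors."""
    valid = []
    for batch in chunks(unique(domains), size):
        valid.extend(validate_batch(batch))
    return sorted(valid)
```

Each batch is independent, so batches could also be handed to different machines and the results merged afterwards.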
This is one of the options I have thought of, but I know that if I work on it a bit more, I can make a single automated system that will do this every 24 hours.
Once I get this PR in to fix the issue, it will be able to validate 20M domains easily in under 2 hours.
https://github.com/complexorganizations/content-blocker/pull/26
Just FYI, some malicious domains only come online while a scam is running: scammers turn the web server on, then disable it again to reuse it in the future. So validating that domains are online can have the negative consequence of removing domains that are not always online but should still be blocked.
Yeah, I check whether the domain is registered or not; that's what I consider validation.
If there is even 0.01% proof that it's registered, then it's a valid domain. I'm using about 10 different validation methods right now and will add more in the future.
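The "any proof counts" rule amounts to accepting a domain as soon as one method reports evidence. A minimal sketch, assuming each validation method is a callable that takes a domain and returns a boolean (`is_valid` and `checks` are hypothetical names, and the actual methods used are not shown in this thread):

```python
def is_valid(domain, checks):
    """Return True as soon as any check reports evidence of registration."""
    for check in checks:
        try:
            if check(domain):
                return True
        except Exception:
            # A method that errors out counts as "no evidence",
            # not as proof that the domain is invalid.
            continue
    return False
```

Ordering `checks` from cheapest to most expensive keeps the average cost per domain low, because later methods only run when earlier ones found nothing.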
There are 100,000s of domains that are not even registered at the moment and are still on other people's lists; blocking them takes up too many system resources when they are not even registered.
Some guy wrote a random word generator and then used its output as a suffix list, and that alone is like 1 GB of extra invalid domain names. Stuff like example.fdkljfhdlkjfd can't be a valid domain, but it's still in people's lists.
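Entries like example.fdkljfhdlkjfd can be dropped without any network traffic by comparing the last label against a set of real top-level domains. The sketch below uses a tiny hand-written `KNOWN_TLDS` sample purely for illustration; a real run would load the full list that IANA publishes.

```python
# Tiny illustrative sample; not a complete TLD list.
KNOWN_TLDS = {"com", "org", "net", "info", "tv"}


def has_known_tld(domain, tlds=KNOWN_TLDS):
    """True if the name has at least two non-empty labels and a known last label."""
    labels = domain.strip().rstrip(".").lower().split(".")
    return len(labels) >= 2 and all(labels) and labels[-1] in tlds
```

A cheap syntactic pass like this shrinks the list before any expensive per-domain lookups start.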
Ok makes sense.
StevenBlack's hosts is one of the most popular lists I'm aware of, and it contains a handful of domains that don't make any sense.
https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
0.0.0.0 castoola.tv.lan
Even if it isn't a genuine domain, it is still on the list.
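For reference, pulling the hostnames out of hosts-format lines such as the one above takes only a few lines. This is a sketch and `parse_hosts_line` is a hypothetical helper name:

```python
def parse_hosts_line(line):
    """Return the hostnames on an 'IP hostname [hostname ...]' line, else an empty list."""
    # Drop trailing comments, then split on whitespace.
    parts = line.split("#", 1)[0].split()
    if len(parts) < 2:
        return []
    # parts[0] is the sink address (for example 0.0.0.0); the rest are hostnames.
    return [name.lower() for name in parts[1:]]
```

The extracted names can then be passed to whatever validation step decides whether an entry is worth keeping.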
r1.sn-o097znlr.a1.googlevideo
r2.sn-o097znlr.a1.googlevideo
r3.sn-o097znlr.a1.googlevideo
r4.sn-o097znlr.a1.googlevideo
r5.sn-o097znlr.a1.googlevideo
r6.sn-o097znlr.a1.googlevideo
r7.sn-o097znlr.a1.googlevideo
r8.sn-o097znlr.a1.googlevideo
r9.sn-o097znlr.a1.googlevideo
r10.sn-o097znlr.a1.googlevideo
r11.sn-o097znlr.a1.googlevideo
r12.sn-o097znlr.a1.googlevideo
r13.sn-o097znlr.a1.googlevideo
r14.sn-o097znlr.a1.googlevideo
r15.sn-o097znlr.a1.googlevideo
r16.sn-o097znlr.a1.googlevideo
r17.sn-o097znlr.a1.googlevideo
r18.sn-o097znlr.a1.googlevideo
r19.sn-o097znlr.a1.googlevideo
r20.sn-o097znlr.a1.googlevideo
https://raw.githubusercontent.com/kboghdady/youTube_ads_4_pi-hole/master/youtubelist.txt Even if they aren't genuine domains, they're nonetheless on the lists.
You can use your DNS server to block these domains, but what's the point of wasting your own system resources on blocking them when they are not even legitimate domain names?
Last example
0.0.0.0 7cloudtech-vps.info
From https://raw.githubusercontent.com/blocklistproject/Lists/master/fraud.txt. It hasn't even been registered, yet it is still banned.
https://domains.google.com/registrar/search?searchTerm=7cloudtech-vps.info
What are individuals trying to accomplish when they try to generate random domain names and then forecast whether or not an attack would utilize them in the future?
As no further discussion is warranted, I am closing this issue to tidy up.
@pallebone I am going to push unvalidated data and then work on this, and once it's ready I will push the validated data.
@pallebone @Saugjunkie
The temp lists are ready. These are the biggest lists on GitHub that I know of; the official lists contain over 5 million valid domains, but these are a temporary update.
Yeah, very nice. Can we test this now? With the coredns branch from wireguard it didn't work.
For now, import it into uBlock Origin; I will push a fix for coredns.
Unfortunately, I didn't understand that, even with Google Translate.
My lads, I finally got it working on a DigitalOcean VPS with 64 GB of memory and 32 cores. I'll send out the entire list tonight, and an automated tool by tomorrow.
good job. sounds expensive though :(
It costs around $1 per hour, and updating the list takes about an hour, so over the course of a month, it costs about $30, which is good.
I'll gladly pay $30 to ensure that all of the domains are legitimate and operating.
Guys, it works now and pushes an update every 24 hours.
Note: Every single domain is active and validated. I am 99% sure this is one of the biggest lists on GitHub; there is one bigger than this, but that's not validated data.
Good News
THX @Prajwal-Koirala
Hi,
I was testing these lists but today they are all 404.
EG: https://raw.githubusercontent.com/complexorganizations/content-blocker/main/configs/hosts
No longer exists?
Kind regards Pete