StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License

Add support for Little Snitch Rule Group Subscriptions format #739

Open StevenBlack opened 6 years ago

StevenBlack commented 6 years ago

See Little Snitch Rule Group Subscriptions format.

See also Little Snitch at obdev.at (MacOS only)

StevenBlack commented 6 years ago

Hey this is cool. It's been done! https://github.com/naveednajam/Little-Snitch---Rule-Groups

funilrys commented 6 years ago

May I ask if 10 000 entries is the limit ?


StevenBlack commented 6 years ago

Ya Nissar @funilrys under current circumstances, the hosts file must be broken into chunks no larger than 10,000 entries.

Seems like a bogus limitation to me; I'm puzzled why Little Snitch imposes it.

funilrys commented 6 years ago

Ouch ... That's not funny ...

StevenBlack commented 6 years ago

@funilrys it just makes integrating into Little Snitch probably not worth the hassle.

funilrys commented 6 years ago

Having 7 files just for the unified hosts file may be too much ... And if the list grows, it may become unmanageable ...

But yeah we can think about integrating that if it is used :smile_cat:

claysimps commented 6 years ago

Hi guys, I'm totally new to this stuff, how do I use the python script and where should I save it? I ran the script and it downloaded the lsrules but I can't seem to import them.

Thanks in advance!

mickaphd commented 6 years ago

I think 10K is not the limit anymore, no? Could be awesome to be able to subscribe to a HOSTS group! Peter Lowe's Block List works pretty well, but I imagine the HOSTS one would block much more.

funilrys commented 5 years ago

@spirillen are you out of scope? The script is written in python and I don't see any sed nor awk ...

notDavid commented 5 years ago

"As of Little Snitch 4.3 (5264) max domains per rule increased to 200,000"

https://github.com/naveednajam/Little-Snitch---Rule-Groups/commit/6592507dbbfa0fd0e6b88295370938f8cde9e56c

funilrys commented 5 years ago

Now it's interesting to code that feature Steve @StevenBlack 😁

Can you tell me how you imagine it Steve?

StevenBlack commented 5 years ago

That's a good question, Nissar @funilrys.

In my implementations, Little Snitch is completely out of control. Its rules management system isn't up to the task.

For example, many times, I want to review rules added in the past few minutes. Can I sort rules by date and time added? No, no I can't. I can only see rules added in the past 24 hours, within controlling applications, in a tree. I hate this missing sorting functionality.

So in my mind, I would like to delete ALL my Little Snitch rules, then apply the hosts file for "All Applications".

What's your thought?

scrossan commented 5 years ago

@StevenBlack I agree with your criticisms of Little Snitch's UI for viewing rules, there could certainly be some improvements there, but the way that LS rule groups are managed now has been greatly improved - especially now that the limit is 200,000 rules. In addition to that there is now support for rule groups that simply block a group of host names as any of your hosts files would be used for via the denied-remote-domains key (1). Using rules groups formatted like this leads to much reduced UI lag, in my experience, when compared to using 5-6 files each containing 10,000 rules. Until better support for blocklists was implemented in LS 4.2 I had actually stopped using the rule groups generated by https://github.com/naveednajam/Little-Snitch---Rule-Groups because the UI was basically unusable.

So in my mind, I would like to delete ALL my Little Snitch rules, then apply the hosts file for "All Applications".

To this point, rules that are contained within imported rule groups are automatically given a higher priority than rules a user has added, apply to all users and also all processes if the rule group uses the denied-remote-domains format. If a user already has rules blocking some of the hostnames contained in an imported rule group then LS will flag those as duplicates and allow a user to delete them specifically under Suggestions > Redundant Rules.

Also, if the user has the "Mark new rules as unapproved" preference enabled under Preferences > Advanced, whenever there's an update to an imported rule group LS displays the changes under Unapproved Rules.

I forked @naveednajam's repo and updated the code to use the blocklist format since it's much simpler: https://github.com/scrossan/Little-Snitch---Rule-Groups

(1): https://help.obdev.at/littlesnitch/ref-lsrules-file-format
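
For reference, here is a minimal sketch (in Python) of the conversion @scrossan describes: turning `0.0.0.0 example.com`-style hosts lines into a blocklist-format rule group. The top-level keys follow the .lsrules documentation linked above; the function name and metadata values are illustrative, not part of any repo:

```python
import json

def hosts_to_lsrules(hosts_text, name="Unified hosts blocklist"):
    """Convert '0.0.0.0 domain' hosts lines into a .lsrules-style dict."""
    domains = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split()
        # keep entries that map a hostname to the unspecified/loopback address,
        # skipping the standard local names
        if (len(parts) == 2 and parts[0] in ("0.0.0.0", "127.0.0.1")
                and parts[1] not in ("localhost", "localhost.localdomain",
                                     "broadcasthost")):
            domains.append(parts[1].lower())
    return {
        "name": name,
        "description": "Generated from a hosts file",
        "denied-remote-domains": sorted(set(domains)),
    }

sample = """# comment
0.0.0.0 ads.example.com
0.0.0.0 tracker.example.net  # inline comment
127.0.0.1 localhost
"""
print(json.dumps(hosts_to_lsrules(sample), indent=2))
```

Writing the resulting JSON to a `.lsrules` file served over HTTPS is what makes it subscribable.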

StevenBlack commented 5 years ago

Thank you for this insight, Steven @scrossan. I think we'll move in this direction very soon.

I have another thought: domains are, of course, only useful for HTTP requests that route via domain names.

I wonder if we should find the IP addresses behind all our listed domains and add those to the Little Snitch rules, so actors that use direct IP addresses are blocked as well.

notDavid commented 5 years ago

@StevenBlack Fyi, coincidentally i was looking into this as well. Perhaps this is useful, it should return all Facebook IPs, by ASN:

whois -h whois.radb.net '!gAS32934'  #  | tr ' ' '\n' | awk '!/[[:alpha:]]/'

I was trying to figure out a way to optimise this list, like for example "ipset" optimises it to:

Addresses before CIDR optimization: 106
Addresses after CIDR optimization:  15
...
74.119.76.0/22
31.13.64.0/18
185.60.216.0/22
173.252.64.0/18
66.220.144.0/20
157.240.0.0/16
69.63.176.0/20
204.15.20.0/22
69.171.224.0/19
179.60.192.0/22
129.134.0.0/16
45.64.40.0/22
103.4.96.0/22
31.13.24.0/21
199.201.64.0/22

which would then have to be converted into Little Snitch rules format.
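
For what it's worth, the CIDR optimization step above can be done with Python's standard library alone; a small sketch, assuming IPv4 inputs (the sample networks here are made up to show the merge, not taken from the Facebook ASN):

```python
import ipaddress

def optimize(cidrs):
    """Collapse a list of IPv4 networks into a minimal set of CIDR blocks."""
    nets = [ipaddress.ip_network(c, strict=False) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# four adjacent /24s that merge into one /22, plus an unrelated /24
raw = ["31.13.64.0/24", "31.13.65.0/24", "31.13.66.0/24", "31.13.67.0/24",
       "157.240.1.0/24"]
print(optimize(raw))  # → ['31.13.64.0/22', '157.240.1.0/24']
```

`ipaddress.collapse_addresses` both merges adjacent blocks and removes blocks already covered by a larger one, which is exactly the "106 → 15" reduction ipset reports.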

StevenBlack commented 5 years ago

@notDavid you know, I hadn't even thought about IP ranges.

Because, let's be honest, it's the IP address we ultimately care about.

So here's another angle: sites that share IP addresses. A server can serve many, maybe very many, domains. Some will be unrelated to the malware, and be hurt by IP address blocking.

EDIT: What I mean is, in the Little Snitch context, IP address blocking is very powerful and ultimately, most hermetic.

notDavid commented 5 years ago

Some will be unrelated to the malware, and be hurt by IP address blocking

Not exactly sure what you mean, but ultimately a firewall always blocks ip-addresses. Even if you specify a hostname.

So if something.facebook.com resolves to 1.2.3.4 and someotherwebsite.com also resolves to 1.2.3.4, blocking something.facebook.com will also block someotherwebsite.com. It makes no difference whether you specify the hostname or the IP in your blocklist, the result is the same.

StevenBlack commented 5 years ago

...ultimately a firewall always blocks ip-addresses. Even if you specify a hostname.

@notDavid how would we know this is true for Little Snitch?

For example, in Little Snitch, if for app Spotify we Deny outgoing connections to domain googletagmanager.com, I'm not seeing 172.217.11.8 anywhere in the data tied to app Spotify.

How would we know if application Spotify calls 172.217.11.8 directly, that this will be blocked by Little Snitch?

It seems wildly implausible that Little Snitch would resolve IP addresses based on domains, in real time, for everything.

It's more plausible that, given just a domain, Little Snitch intercepts calls on Port 53 (DNS) and mangles either the request or response for whatever configured apps requesting to, say, googletagmanager.com, leaving googletagmanager.com open for apps that don't have this constraint.

I'm just guessing here...

notDavid commented 5 years ago

It seems wildly implausible that Little Snitch would resolve IP addresses based on domains, in real time, for everything.

Good question, I only know that i read this is how the OSX packet filter (pfctl) works: all fully qualified domain names will be resolved via DNS when the ruleset is loaded. All resulting IP addresses will be substituted into the rule.

I will have a look later if i can test in Little Snitch / check their docs.

notDavid commented 5 years ago

So a quick test confirms you are right; I simply tested:

New Rule -> Any Process -> Deny -> To ->

then i configured one of these and tested them 1 by 1:

  1. To -> Domain = httpbin.org
  2. To -> Hostname = httpbin.org
  3. To -> Ip Addresses = httpbin.org (this will resolve to the ips in the gui)
  4. To -> Ip Addresses = 3.223.234.9, 52.22.188.80

So for each of these rules i tested in Terminal:

Indeed only 3 and 4 will block everything. 1 and 2 will block the domain but not the ip (curl http://52.22.188.80/).

Weird! I didn't expect that...

StevenBlack commented 5 years ago

@notDavid, Dan @dnmTX, Nissar @funilrys, @ScriptTiger, @anudeepND, Tomasz @FadeMind, and all others...

Here's an interesting engineering question.

How would you suggest doing this reliably and respectfully, and in an automated way?

I presume we'd want to do this in a rate-limited way, so this isn't a blind and mindless process.

Would you run the IP matching on our published hosts files, or would you pre-load the IP address matches by iterating the sources in our ./data and ./extensions folders, so amalgamated hosts files and associated IP addresses can be released simultaneously? I'd love to release all products together.

Would you add this functionality to this repo, or create a separate project that would coordinate and execute IP address verification and make automated PRs to this repo? In other words, do we bolt-on all that's required for new products, or do we make creating non-hosts products a separate and independent responsibility?

It's a lot to think about.
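
On the rate-limiting point, one common approach is a token bucket in front of the resolver. A minimal, deterministic sketch (the class and parameter names are hypothetical, and the clock is injected so the behaviour is testable without real sleeps):

```python
class TokenBucket:
    """Allow at most `rate` lookups per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        # refill proportionally to elapsed time, capped at capacity
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# fake clock so the demo is deterministic
t = [0.0]
bucket = TokenBucket(rate=2, capacity=2, clock=lambda: t[0])
print([bucket.allow() for _ in range(3)])  # burst of 2 allowed, third denied
t[0] += 0.5                                # half a second later: one token back
print(bucket.allow())
```

In a real pipeline the resolver would call `allow()` before each DNS query and sleep briefly when it returns False, keeping the whole run polite toward whichever resolver it targets.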

naveednajam commented 5 years ago

I think the most effective way to get the associated IP addresses in a continuous and efficient manner is to use the TTL value of the domain. This requires setting up a database that allows querying domains/subdomains by TTL value, then refreshing the IP addresses associated with those domains and moving each domain to the appropriate list if its new TTL differs from the existing value. With a database approach, inserting a record can be implemented so that the IP address and TTL are resolved at the time the entry is added.

dig can be used to get the TTL value and the associated IP address. To avoid abusing any public DNS, it is more efficient to set up a local DNS server, which also caches results and provides faster responses during each update cycle.

dig -q google.com -t A +noall +answer +ttl

; <<>> DiG 9.10.6 <<>> -q google.com -t A +noall +answer +ttl
;; global options: +cmd
google.com.        300     IN      A       172.217.166.142

dig -q microsoft.com -t A +noall +answer +ttl

; <<>> DiG 9.10.6 <<>> -q microsoft.com -t A +noall +answer +ttl
;; global options: +cmd
microsoft.com.     3600    IN      A       40.113.200.201
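
The TTL-driven refresh above boils down to a scheduling question: which cached records have outlived their TTL? A tiny illustrative sketch (the record structure and function name are hypothetical; the cached values mirror the two dig answers):

```python
import time

def domains_due(records, now=None):
    """Return domains whose cached A record has outlived its TTL.

    `records` maps domain -> (ip, ttl_seconds, resolved_at_unix_time).
    """
    now = time.time() if now is None else now
    return [d for d, (ip, ttl, at) in records.items() if now - at >= ttl]

cache = {
    "google.com":    ("172.217.166.142", 300,  1000.0),
    "microsoft.com": ("40.113.200.201",  3600, 1000.0),
}
print(domains_due(cache, now=1400.0))  # → ['google.com'] (300s TTL expired)
```

Each update cycle would re-resolve only the due domains, which is what keeps the load on the resolver proportional to churn rather than to list size.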

StevenBlack commented 5 years ago

Excellent point about ttl, @naveednajam.

And yes, a database, definitely. Maybe a distributed one, maybe publicly accessible for read ops, so forks could share the data. Just thinking out loud, here.

naveednajam commented 5 years ago

maybe publicly accessible for read ops, so forks could share the data. Just thinking out loud, here.

yes, more precisely via RESTful API.

naveednajam commented 5 years ago

@scrossan

I forked @naveednajam's repo and updated the code to use the blocklist format since it's much simpler: https://github.com/scrossan/Little-Snitch---Rule-Groups

Wouldn't it be better to use denied-remote-hosts instead of denied-remote-domains? Most of the time, ads content is served by a specific host within a domain, unless we want to block the whole domain. It would be nicer to have three lists: one using denied-remote-domains where the top-level domain is blocked, a second using denied-remote-hosts for specific hosts, and a third using denied-remote-addresses for the IP addresses we need to block.
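
A rough sketch of that three-way split: classify each blocklist entry as an IP address, a bare registrable domain, or a specific host. This uses a naive "two labels = whole domain" heuristic for illustration only; correct handling of TLDs like `co.uk` would need the Public Suffix List:

```python
import ipaddress

def classify(entries):
    """Bucket entries for the three .lsrules deny keys."""
    rules = {"denied-remote-addresses": [],
             "denied-remote-domains": [],
             "denied-remote-hosts": []}
    for e in entries:
        try:
            ipaddress.ip_address(e)  # raises ValueError if not an IP
            rules["denied-remote-addresses"].append(e)
            continue
        except ValueError:
            pass
        # naive heuristic: two labels = whole domain, more = specific host
        key = "denied-remote-domains" if e.count(".") == 1 else "denied-remote-hosts"
        rules[key].append(e)
    return rules

print(classify(["example.com", "ads.example.com", "93.184.216.34"]))
```

Merging the three buckets into one rule-group file would give a single subscription that covers domains, hosts, and raw IPs at once.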

rjhancock commented 5 years ago

As far as creating the LS rule groups goes, I wrote a service that compiles all of @StevenBlack's sources and converts them into a set of LS rules files, updated weekly, that can be used inside Little Snitch with auto-updating. The site was slapped together just to get the rules out (and is in desperate need of a proper design), as I plan on using them in an iOS app for content blocking (given away for free).

If interested, I can share the hostname to let people subscribe directly.

StevenBlack commented 5 years ago

I would love that, Richard @rjhancock.

Presently I have Little Snitch disabled because managing its rules is such a mess.

If this were data-driven and automated, wow, that's a game changer.

rjhancock commented 5 years ago

https://hostblocker.app is where the list is shown. Below is what it looks like in my LittleSnitch.

Every week the backend API downloads the hosts files, parses them, updates, and pushes out new static files to reduce load.

The links are also setup to be auto-added into LittleSnitch.

[Screenshot: subscribed HostBlocker rule lists in Little Snitch]

StevenBlack commented 5 years ago

@rjhancock man, that's beauty, right there. This is how Little Snitch should have always been.

I'm going to dive into this as soon as I can come up for air.

rjhancock commented 5 years ago

Best part, they survive macOS Catalina updates (which replaces the Hosts files for even patch releases).

mickaphd commented 5 years ago

Sounds awesome man, gonna try that very soon. What's your app on iOS? I'm currently using Lockdown, which is great too but does not (yet!) block all of these.

rjhancock commented 5 years ago

It's called HostBlocker. Uses these same lists to block them in Safari and the corresponding web component. Not released yet (lack of time among other things). Should have a beta out by year end.

The issue with Content Blocking on iOS is you either generate a list with a cap of 50k entries or you create a local DNS server to handle all the resolution and potentially expose it to the company.

There are APIs to do it at a system level, but they are limited to managed devices only (which may be an extra feature for those who wish to "manage" their own devices).

rjhancock commented 4 years ago

An update on LittleSnitch, it's updated every week like clockwork adding/removing domains as the lists change.

And thanks for pointing me towards Lockdown @mickaphd. I copied the repo onto my GitLab instance and started going through their code and trimming the fat (removing the iAP, Excessive logging, VPN, etc). Thus far it's looking like I'll have a working version of mine by end of week using the lists here and opening it up to also have a better ad blocker using CSS rules for Safari when the hosts fail.

Thanks for the great work @StevenBlack, as none of what I'm doing would be possible without it.

cLupus commented 4 years ago

I've made a similar solution (unaware of @rjhancock's solution), which is a more direct translation of @StevenBlack host files to Little Snitch. It may be of interest.

naveednajam commented 4 years ago

It's great to see and meet like-minded people. I moved forward with the original idea of translating StevenBlack's hosts files into Little Snitch rules (Little-Snitch--RuleGroups) and took it to the next level with RuleGroupsV2, since I feel Little Snitch has a lot more potential than simple hosts files. No doubt @StevenBlack was the main source of motivation.

StevenBlack commented 4 years ago

@cLupus and Naveed @naveednajam this is awesome.

Reliable and dynamic Little Snitch rule groups by subscription is the ultimate solution on MacOS, I feel. This may be the direction I take, next. I'm going to try this. Little Snitch is too bothersome and intrusive, otherwise. But it's SO good at what it does....

Everyone: if you use macOS and haven't investigated Little Snitch, you should.

ghost commented 2 years ago

Hi guys, Just seeing this a little late; any updates? I went ahead and subscribed to several lists by HostBlocker (thanks @rjhancock), although I could not find the unified hosts list by @StevenBlack, just several extensions. Am I missing something? And are these lists updated regularly? Thanks! Tommie

rjhancock commented 2 years ago

It's the individual lists broken up. I'm going to need to update the site a bit more and probably update the file formats as I couldn't get them to work with the latest Little Snitch format.

ghost commented 2 years ago

Thanks for the quick reply, @rjhancock. So I take it the lists have not been updated in a while? Therefore, does it make more sense to use @naveednajam's script to convert @StevenBlack's lists manually? Not as handy as a subscription, however.

rjhancock commented 2 years ago

Give me a few days as I'll work on it today and give it a lot of under the hood updates (the backend is a few versions out of date). I'll post back when it's done.

StevenBlack commented 2 years ago

@TommieWong I would love, love, LOVE to have a reliable connection between our hosts and Little Snitch.

This would be a perfect security scenario for MacOS. I would be the most avid user, and proponent, of this.

It's precisely because Little Snitch lacks (or lacked) strong third-party support that I don't use it anymore. Without a solid add-on story, Little Snitch becomes a constant source of interruption as it asks whether connecting to domain xyz.abc is OK, and even I don't know the answer in the moment.

At least, that's my recollection of the state of Little Snitch about three years ago, when I decided to uninstall it from all my Macs.

Richard @rjhancock what's the latest on Little Snitch? I'd be grateful for your insight.

rjhancock commented 2 years ago

I use it on my M1 as I have a PiHole at home now using these lists. It still runs fine and with these subscriptions in place, rarely get a notification on anything.

Once I get my Hostblocker site updated, I think the pairing will be perfect.

StevenBlack commented 2 years ago

@rjhancock same setup here Richard. It's a near-perfect computing environment, the best I've ever had (except for Windows VM support on M1 — that part sucks.)

If I had Little Snitch protecting all the non-hosts areas without too many interruptions, man, I'd be over-the-moon!

ghost commented 2 years ago

Thanks a lot for the answers, @StevenBlack and @rjhancock. Indeed, a reliable link between the hosts blocklist and LS would be awesome. I currently just have Peter Lowe's and it's already helpful. As for the constant interruptions, I gave up on browser allow/deny rules, so it's allow all + uBlock origin. So LS is mostly to completely cut off some apps from the internet.

rjhancock commented 2 years ago

I'd like to report that the service has been updated, lists updated, and confirmed working in latest version of Little Snitch on my M1. Feel free to pull updates weekly.

I suggest setting to auto approve (if you trust me that is) and to auto update weekly.

It's not the unified list but each source individually so you can customize as you see fit. I'm showing just under 91k total rules.

rjhancock commented 2 years ago

I see I'm missing a few and have a few outdated links. Updating now and new lists should be generated within the hour.

rjhancock commented 2 years ago

Showing roughly 110k now.

I use this setup when I'm not at home and I get near 0 interruptions for it (get one about every month or so).

If @StevenBlack can keep me updated on changes to what lists they pull from, I can keep this site up and running with those updates.

ghost commented 2 years ago

Thanks a lot for the quick work, @rjhancock! However, I can no longer see the links on hostblocker.app, so I can only update the lists that I had already subscribed to. Tried from a different browser and same result. Also, out of curiosity, what are the interruptions you get (or that @StevenBlack was talking about)? On my end, LS mostly completely blocks application from the internet, or is set to allow just connections to update servers. Same for system services. Then, there remains the case of the browser which is set to allow all (and then the filter takes place in the browser itself); is this where our setups differ? Can I ask what your LS settings are?

rjhancock commented 2 years ago

The interruptions are when LS asks permission for a connection from an app. On a clean slate, it gets quite annoying seeing all the different sites that various pieces of software try to connect to, so it quickly becomes a matter of allowing everything and going back later to block.

The issue on the site is a CORS issue that isn't appearing in Safari (errors show but the site still loads). Will adjust accordingly.

rjhancock commented 2 years ago

Fixed. I just have the default settings setup for mine.