OrchidTechnologies / orchid

Orchid: VPN, Personal Firewall
https://www.orchid.com/
GNU Affero General Public License v3.0

Add Node Location Selection #86

Closed cl0vrfi3ld closed 3 years ago

cl0vrfi3ld commented 3 years ago

Is your feature request related to a problem? Please describe: Nope

Describe the solution you'd like: Pretty much all (d)VPNs have an option to select connection locations. I was poking through the GitHub/blog and found out that Orchid already stores metadata about nodes, which can/does contain location information. I have a very basic understanding of the marketplace contracts, but based on said metadata, I'd assume it'd be easy to allow the client to filter through nodes and only post tickets to the nodes that are in the selected region.

Describe alternatives you've considered: Not really sure what to put here. I would've tried implementing this myself but most of the tech here is way out of my ballpark at the moment. There are also other dVPNs, but so far I've enjoyed my Orchid experience and I'd like to stick with them for now.

Additional context: Nah

saurik commented 3 years ago

So, there are a number of core issues here to consider.

One key issue is that, in a decentralized system, it is hard to rely on reported metrics. The "metadata about nodes" I'm assuming you are referring to (I do not read Orchid's marketing blog) is our decentralized service directory, which contains only the minimal amount of information required to contact a provider: in particular, it does not contain the "location" of the provider (I use the term "location" in a few places, but it is a virtual address, not a physical place). If the provider were to self-report a location, it would be expected--as this is all a game of incentives--that they will report something that improves the likelihood that people will connect through their node... and, in fact, "other dVPN projects" (which, FWIW, I do not consider Orchid to be one of: it is a marketplace for bandwidth and other Internet services, and it is possible to build a VPN on top of it, and we have been building out what I feel is an interesting privacy stack on top of this... but as merely a VPN I honestly think it would be difficult to compete with a centralized provider) have this issue: I've sat around on the Sentinel Telegram channel and often noted people commenting that large blocks of their nodes claim to be in one place and are actually in another.

Additionally, the concept of "location" is frankly fuzzy: what are you really optimizing for? Even among the small number of nodes Orchid currently needs to operate, we've seen situations where a node is definitely located in one country, but has been marked in various databases as being in another country. This issue is actually endemic to VPN-like systems, as a lot of services attempt to gradually learn the physical location of IP addresses by inferring it from user data (which includes GPS and locale settings and such) over time: a number of Tor exit nodes, for example, have been marked by various services as being located in the Middle East, despite being firmly in the United States or Europe. You thereby really end up dealing not in an underlying ground truth--even if we could trust what was being reported--but a situation where what you really care about is where Google, or Netflix, or MaxMind consider you to be located... and these are all really separate answers! This issue also means that there isn't really any kind of objective "test" for "where is a node located": it is entirely use case specific; what you really end up doing is kind of getting some vague hint from the metadata, and then hopping around between servers until you find one that works... and then awkwardly "sticking to it" (which undermines the decentralized system).

It also must be understood that nothing is really ever in a single location... and this is endemic to the entire premise of how the Internet operates. We list in our system the address of a provider... but that provider is trying to compete with other providers on metrics such as latency, availability, bandwidth, and price. This means that they are incentivized to provide a multi-region fail-over strategy for their provisioned service, with numerous machines all being backed by their provider record. This improves the reliability of any third-party information you might obtain on the system, and helps prevent cases where people think they are selecting two providers but accidentally select a single provider that happens to have multiple entries. The result is that a provider doesn't really ever have a location: the best providers probably kind of exist everywhere? Even if it were possible for there to be an objective test of "what is the location of my IP address?", the answer would be not only in the eye of the beholder, but would change over time... enforcing that providers exist "at a location" kind of breaks some of our model assumptions for how to build more powerful primitives on top of this platform :(.

It also makes it really difficult for providers to get into "advanced routing", which is something that I really want to feel like "the norm": the mental model of your connection to an Orchid node is that you are talking to a small datacenter provider, on which you are able to run a tiny execution context... and one of the really awesome features of competing services such as GCP is the idea that, once you are inside of a provider's network, your egress from that network should be able to take the fastest path to your destination. You thereby can't really assume that you have any specific IP address at all while using Orchid: an individual socket would (usually) be bound to some external location, but in a more general sense it isn't clear to me that connections would be? Cloudflare actually does this for WARP+, which is their "not a VPN" VPN-like service (that also doesn't support the concept of "choose a location"): when you connect to a website through WARP+, it routes you via their CDN to the closest Cloudflare server to the destination via their backbone... this kind of routing would have to be disallowed at that level if we enforced that each provider has a specific location. To the extent to which I'd look at centralized companies to compare Orchid with (our "competitors"), it wouldn't be something like NordVPN or PIA: it would be AWS/GCP or Cloudflare WARP+ / Google One. I really liked this tweet from Cloudflare's CEO:

https://twitter.com/eastdakota/status/1176991175364030473

We don’t really intend to compare with other VPNs. If you have a VPN you’re happy with, stick with it. It undoubtedly does things WARP never will. If you never installed a VPN because they seem like more of a pain than the[y're] worth, WARP is for you.

And then, at the end of the day, it feels important to look at "why are you trying to obtain this feature?" and almost always it is "to be able to trick a remote service--often a streaming video company such as Netflix or Disney+--into giving me access to a different catalog of results"... and I just want to make sure you understand that Orchid is never going to be good at this, and I thereby do not have it as part of my model to figure out how to support this use case: a centralized VPN is always going to be able to do this better than Orchid, so it would be foolish to waste too much time attacking it. The core problem is that these content providers can and do coordinate on VPN block lists, and Orchid--as well as anyone trying to build a "dVPN"--is at a fundamental disadvantage with respect to bypassing these kinds of blocks: as a decentralized system, by and large our node pool is public... and to the extent to which any of it is hidden, it is specifically because of how I've abstracted out location (like, hiding the nodes a bit also hides their location a bit). This is why Tor nodes are blocked all the time: because there is a giant list of them that anyone can query... including the people who provide real-time blacklists (which the people who do location blocking tend to subscribe to).

It is actually even worse than that, though: content providers attempt to watch the usage of various IP addresses and automatically flag systems that "feel strange": if they see multiple accounts being used all at the same time from an address--and that address isn't also well-understood to be carrier-grade NAT (which tends to be very expensive cellular bandwidth)--that address is going to get burned. This creates a tragedy of the commons scenario, where any system that allows people to "willy-nilly" self-select nodes ends up with all their addresses burned... you even see this phenomenon happen with a lot of the centralized VPNs; however, centralized VPNs have a super-power here, which is that they don't have to have centralized node lists (though many are built in a way where they stupidly do) and they can have the equivalent of locks on their addresses to prevent them from being burned. They can even do this on a per-content company basis. A great example of this is NordVPN: they take connections to Disney+ and route them through what--as far as people have been able to tell--is a network of grey market IP addresses. To quote myself from another issue #68 where this same comment came up (and which I recommend you also read):

One VPN company that actually seems to do "well" at this is NordVPN: they've even managed to provide access to Disney+! Someone did a deep analysis of how this worked a while back (an article which has since been deleted, weirdly, but a copy can be found on the Internet Archive). They are "linked closely with a Lithuanian data mining company called Tesonet" which also runs Oxylabs, which in turn advertises itself to have "32M+ residential proxies…100% anonymous proxies from all over the globe with zero IP blocking", which the author of that analysis believes is how NordVPN is originating their traffic... and how did they get all of those IP addresses? The contention was that they seem to be stealing them, convincing random products to embed malware that attaches them to the Oxylabs essentially-a-botnet.

https://news.ycombinator.com/item?id=21664692

http://web.archive.org/web/20191128170008/https://medium.com/@derek./how-is-nordvpn-unblocking-disney-6c51045dbc30
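To make the "multiple accounts at the same time from one address" heuristic a bit more concrete, here is a minimal sketch of the kind of check a content provider might plausibly run. Everything in it is an assumption for illustration: the function name `flag_burned_addresses`, the event shape, the window size, and the threshold are all hypothetical, and real detection systems are not public and are surely more involved.

```python
from collections import defaultdict


def flag_burned_addresses(events, window_seconds=300, max_accounts=3,
                          known_cgnat=frozenset()):
    """Return the set of IPs that look like shared VPN/proxy exits.

    events: iterable of (timestamp_seconds, ip, account_id) tuples.
    An IP is flagged when more than `max_accounts` distinct accounts are
    active from it within any `window_seconds` span, unless the IP is in
    `known_cgnat` (addresses already understood to be carrier-grade NAT).
    All thresholds here are made-up illustrative values.
    """
    by_ip = defaultdict(list)
    for ts, ip, account in events:
        by_ip[ip].append((ts, account))

    burned = set()
    for ip, seen in by_ip.items():
        if ip in known_cgnat:
            continue  # many accounts behind one address is expected here
        seen.sort()
        start = 0
        active = defaultdict(int)  # account -> events inside the window
        for ts, account in seen:
            active[account] += 1
            # Drop events that have fallen out of the sliding window.
            while seen[start][0] < ts - window_seconds:
                old = seen[start][1]
                active[old] -= 1
                if active[old] == 0:
                    del active[old]
                start += 1
            if len(active) > max_accounts:
                burned.add(ip)
                break
    return burned
```

The point of the sketch is only that this check is cheap to run per address, which is why letting users pile onto a hand-picked node tends to burn it for everyone.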

Finally, the entire premise of the tokenomics of Orchid comes down to "what can users do if they want random providers--ones that compete at best on fundamental properties like performance and stability--as opposed to on more basal properties of their physical existence?"... to the extent to which a user is actively interested in specific subsets of nodes (with the worst case being "this is the one node I've found that works right now for me to access Disney+") it isn't really using our tech stack and, worse, bypasses our attempts at figuring out the incentive structure for our staking directory :(. The Orchid token supposedly has value because, by staking money in a directory, you get a randomly-selected subset of traffic; this works because the providers in Orchid are supposed to be, by and large, a "commodity product"... as soon as providers are being judged on bespoke functionality--which "being able to access Disney+" absolutely is--the value of that staking mechanism begins to crumble... a user probably should instead attempt to go through the full directory (without a stake-weighted random selection), figure out which providers do what they want, potentially share that information with other people on random websites (or hoard it for themselves: a working region block bypass is something best kept private ;P) and then ignore the value of the stakes.
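As a rough illustration of what "a randomly-selected subset of traffic" proportional to stake means, here is a minimal sketch of stake-weighted random selection over an in-memory list. This is not Orchid's actual directory code or contract interface: the `Provider` type, the `pick_provider` function, and the stake numbers are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    address: str  # hypothetical identifier for a directory entry
    stake: int    # hypothetical amount staked on that entry


def pick_provider(directory, rng=random):
    """Pick one provider with probability proportional to its stake."""
    total = sum(p.stake for p in directory)
    if total <= 0:
        raise ValueError("directory has no stake")
    point = rng.randrange(total)  # uniform integer in [0, total)
    # Walk the entries until the random point falls inside one's stake.
    for p in directory:
        if point < p.stake:
            return p
        point -= p.stake
    raise AssertionError("unreachable: point is always below total stake")


# Made-up example directory: provider-a holds 60% of the stake, so it
# should receive roughly 60% of randomly selected connections.
directory = [
    Provider("provider-a", 600),
    Provider("provider-b", 300),
    Provider("provider-c", 100),
]
```

A user who instead walks the full directory and pins the one entry that happens to work for their use case skips `pick_provider` entirely, which is exactly the behavior that makes the stakes stop mattering.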

Thereby, the use cases that I'm really focused on--improving the Internet with a payment network and network protocol designed for deep programmability, to obtain better privacy, performance, and possibility for the user--are all undermined by attempts to simultaneously fight a losing battle (both in terms of practical engineering as well as legal standing) to help users bypass copyright region blocks :(. Now, that isn't to say Orchid fundamentally can't help with some subset of those cases, but I personally think it would be a travesty if it came in the form of "select a location" in the Orchid UI, as that isn't even the direct solution for the problem in a world where that was obviously possible (for some subset of the reasons I already described). Orchid thereby isn't looking as much at figuring out how to defeat region blocking, but how to undermine the market of content providers: instead of figuring out how to defeat websites with network blocks--which ironically includes Wikipedia!--our mission statement delves into "figure out how to replace Wikipedia" (which is how it was phrased to me by one of Orchid's initial big investors back in 2017--I honestly do not remember which one, as it was at a giant dinner party that they were all at--when I had disclosed all of this complexity).

I hope that this makes sense? I'm going to close this issue, as I don't expect this kind of functionality to end up in Orchid any time soon (and don't really want to accept it as a potential task) or possibly ever (though that isn't to say that it is entirely off the table; certainly, as mentioned, specific use cases for this are likely to be supported by various clients and browser extensions that use Orchid, and I could see there being some big shift in the economic model of the directory--or merely a v2 directory--at some point unlocking new potentials for the existing system).

cl0vrfi3ld commented 3 years ago

Wow ok. Yeah, thanks for the response and for clarifying the internals; apologies for my misunderstandings. So basically, under the hood, Orchid is a marketplace for bandwidth, not a VPN itself, and the nodes/providers are solely competing on quality, so throwing in location would throw that all outta whack. The Orchid GUI then uses the Orchid protocol to provide a VPN client-like experience; the app is a privacy shield/performance proxy (if that makes sense?) more than a VPN.