
Shoalsteed / I2P-Secure-Design-Collective


Tech Intro #86

Closed luciewho closed 1 year ago

luciewho commented 3 years ago

Tech Intro.pdf

Not all the text currently fits on the Tech Intro wireframe. See: https://geti2p.net/en/docs/how/tech-intro

Notes from meeting regarding Tech Intro:

Similar Systems section can be moved to Comparisons page
Move Appendix section
Everything from Similar Systems all the way down to the bottom can be removed
Streaming library can go into Tech Docs > App Layer API and Protocols
Application section to be moved into the Application Overview and guide

Shoalsteed commented 3 years ago

I am going to edit things piece by piece so that it is easier for IDK to compare and add or change things as well. We need to ensure that we call the software "the software" and the router "the router", and decide whether we are calling the network a network or a protocol. In this first edit I have tried to make appropriate changes so that not everything is just called "I2P".

The I2P network is a scalable, self-organizing, resilient, packet-switched anonymous network layer, upon which any number of different anonymity- or security-conscious applications can operate. Each of these applications may make its own anonymity, latency, and throughput tradeoffs without worrying about the proper implementation of a free-route mixnet, allowing it to blend its activity with the larger anonymity set of users already running on top of I2P.

The applications bundled with the software provide the full range of typical Internet activities. These include static site hosting, file sharing, email, and AddressBook, a naming system similar in structure to DNS. I2P network content is accessible from any browser that can be configured to use a proxy.

Unlike content hosted within networks like Freenet or GNUnet, the services hosted on the I2P network are fully interactive. Traditional web-style search engines, bulletin boards, blogs you can comment on, database-driven sites, and bridges to query static systems like Freenet are all possible on the I2P network.

In its application layer, the I2P network acts as message-oriented middleware: an application sends data to a cryptographic identifier (a "destination"), and the network takes care of making sure it gets there securely and anonymously.

I2P also provides a simple streaming library that lets I2P's anonymous, best-effort messages be transferred as reliable, in-order streams, transparently offering a TCP-based congestion control algorithm tuned for the network's high bandwidth-delay product. While several simple SOCKS proxies have been available to tie existing applications into the network, their value has been limited, because nearly every application routinely exposes information that is sensitive in an anonymous context. The only safe approach is to fully audit an application to ensure it behaves properly; to assist with that, we provide a series of APIs in various languages that can be used to make the most of the network.

Shoalsteed commented 3 years ago

The I2P project is an engineering effort to provide a sufficient level of anonymity to those who need it. It has been in active development since early 2001. All of the work done on I2P is open source, with the majority of the code released outright into the public domain, though a few cryptographic routines are used under BSD-style licenses. The people working on I2P core development do not control the licenses under which people release client applications, and there are several GPL-licensed applications available (I2PTunnel, susimail, I2PSnark, I2P-Bote, I2Phex, and others). Funding for I2P comes entirely from donations.

Shoalsteed commented 3 years ago

CHANGE "OPERATION"

I2P Network Overview

I2P makes a strict separation between the software participating in the network (a "router") and the anonymous endpoints ("destinations") associated with individual applications. The fact that someone is running I2P is not usually a secret. What is hidden is their traffic, as well as which router a particular destination is connected to. When using the I2P software, several local destinations can be seen in the router sidebar. These could be for proxying into IRC servers, hosting a webserver (an "I2P Site"), or torrenting.

Tunnels

Next, we introduce what I2P calls tunnels. A tunnel is a directed path through an explicitly selected list of routers. Layered encryption is used, so each of the routers can only decrypt a single layer. The decrypted information contains the IP of the next router, along with the encrypted information to be forwarded. Each tunnel has a starting point (the first router, also known as the "gateway") and an end point. Messages can travel in only one direction; to send messages back, another tunnel is required.

INFOGRAPHIC

Two types of tunnels exist: "outbound" tunnels send messages away from the tunnel creator, while "inbound" tunnels bring messages to the tunnel creator. Combining these two tunnels allows users to send messages to each other. The sender ("Alice" in the above image) sets up an outbound tunnel, while the receiver ("Bob" in the above image) creates an inbound tunnel. The gateway of an inbound tunnel can receive messages from any other user and passes them along the tunnel to the endpoint ("Bob"). The endpoint of the outbound tunnel needs to send the message on to the gateway of the inbound tunnel. To do this, the sender ("Alice") adds instructions to her encrypted message. Once the endpoint of the outbound tunnel decrypts the message, it will have instructions to forward the message to the correct inbound gateway (the gateway to "Bob").
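To make the composition concrete, here is a minimal Python sketch of how Alice's outbound tunnel and Bob's inbound tunnel join into one path. It assumes routers are plain string names; `Tunnel`, `handoff`, and `message_path` are hypothetical helpers, and encryption and real message formats are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Tunnel:
    """A one-way tunnel: router names in forwarding order, gateway first."""
    hops: list

    @property
    def gateway(self) -> str:
        return self.hops[0]

    @property
    def endpoint(self) -> str:
        return self.hops[-1]


def handoff(outbound: Tunnel, inbound: Tunnel) -> tuple:
    """The single link between the two tunnels: outbound endpoint -> inbound gateway.

    Alice's encrypted delivery instructions tell her outbound endpoint which
    inbound gateway of Bob's to forward the message to.
    """
    return (outbound.endpoint, inbound.gateway)


def message_path(outbound: Tunnel, inbound: Tunnel) -> list:
    """Every router a message visits, from Alice (outbound gateway) to Bob (inbound endpoint)."""
    return outbound.hops + inbound.hops


alice_out = Tunnel(["Alice", "A1", "A2"])
bob_in = Tunnel(["B1", "B2", "Bob"])
print(message_path(alice_out, bob_in))  # ['Alice', 'A1', 'A2', 'B1', 'B2', 'Bob']
print(handoff(alice_out, bob_in))       # ('A2', 'B1')
```

Only the outbound endpoint and the inbound gateway ever see each other; neither learns who sits at the far ends.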

Shoalsteed commented 3 years ago

I2P network database (netDb)

I2P's "network database" (or "netDb") is a pair of algorithms used to share network metadata. The two types of metadata carried are "routerInfo" and "leaseSets."

The routerInfo gives routers the data necessary for contacting a particular router (its public keys, transport addresses, etc.). The leaseSet gives routers the information necessary for contacting a particular destination. A leaseSet contains a number of "leases". Each of these leases specifies a tunnel gateway, which allows reaching a specific destination. The full information contained in a lease is:

The inbound gateway for a tunnel that allows reaching a specific destination.
The time when the tunnel expires.
A pair of public keys used to encrypt messages (to send through the tunnel and reach the destination).

Routers themselves send their routerInfo to the netDb directly, while leaseSets are sent through outbound tunnels (leaseSets need to be sent anonymously, to avoid correlating a router with its leaseSets).
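A small Python sketch of the lease data described above, showing why the expiry time matters: a sender should only use leases whose tunnels are still alive. The keys are left out, and `tunnel_id` is an assumed field (a gateway router can host many tunnels); the class names are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lease:
    gateway: str    # inbound tunnel gateway (a router identity; a plain name here)
    tunnel_id: int  # which tunnel on that gateway (assumed field)
    expires: float  # Unix time at which the tunnel expires


@dataclass
class LeaseSet:
    destination: str
    leases: List[Lease] = field(default_factory=list)

    def usable_leases(self, now: Optional[float] = None) -> List[Lease]:
        """Leases whose tunnels have not expired yet; only these are worth sending to."""
        now = time.time() if now is None else now
        return [lease for lease in self.leases if lease.expires > now]


bob = LeaseSet("bob", [Lease("gw1", 7, expires=100.0), Lease("gw2", 9, expires=200.0)])
print([lease.gateway for lease in bob.usable_leases(now=150.0)])  # ['gw2']
```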

We can combine the above concepts to build successful connections in the network.

To build up her own inbound and outbound tunnels, Alice does a lookup in the netDb to collect routerInfo. This way, she gathers lists of peers she can use as hops in her tunnels. She can then send a build message to the first hop, requesting the construction of a tunnel and asking that router to send the construction message onward, until the tunnel has been constructed.

INFOGRAPHIC

When Alice wants to send a message to Bob, she first does a lookup in the netDb to find Bob's leaseSet, giving her his current inbound tunnel gateways. She then picks one of her outbound tunnels and sends the message down it with instructions for the outbound tunnel's endpoint to forward the message on to one of Bob's inbound tunnel gateways. When the outbound tunnel endpoint receives those instructions, it forwards the message as requested, and when Bob's inbound tunnel gateway receives it, it is forwarded down the tunnel to Bob's router. If Alice wants Bob to be able to reply to the message, she needs to transmit her own destination explicitly as part of the message itself. This is handled by a higher-level layer, the streaming library. Alice may also cut down on the response time by bundling her most recent leaseSet with the message so that Bob doesn't need to do a netDb lookup for it when he wants to reply, but this is optional.

INFOGRAPHIC

While the tunnels themselves have layered encryption to prevent unauthorized disclosure to peers inside the network (as the transport layer itself does to prevent unauthorized disclosure to peers outside the network), it is necessary to add an additional end-to-end layer of encryption to hide the message from the outbound tunnel endpoint and the inbound tunnel gateway. This "garlic encryption" lets Alice's router wrap up multiple messages into a single "garlic message", encrypted to a particular public key so that intermediary peers cannot determine how many messages are within the garlic, what those messages say, or where those individual cloves (the messages inside the garlic) are destined. For typical end-to-end communication between Alice and Bob, the garlic will be encrypted to the public key published in Bob's leaseSet, allowing the message to be encrypted without giving out the public key of Bob's own router.

Another important fact to keep in mind is that I2P is entirely message-based and that some messages may be lost along the way. Applications using I2P can use the message-oriented interfaces and take care of their own congestion control and reliability needs, but most would be best served by reusing the provided streaming library to view I2P as a stream-based network.

Shoalsteed commented 3 years ago

How Tunnels Work

Both inbound and outbound tunnels work along similar principles. The tunnel gateway accumulates a number of tunnel messages, eventually preprocessing them into something for tunnel delivery. Next, the gateway encrypts that preprocessed data and forwards it to the first hop. That peer and each subsequent tunnel participant verify that the message isn't a duplicate, add a layer of encryption, and forward it to the next peer. Eventually, the message arrives at the endpoint, where the messages are split out again and forwarded on as requested. The difference arises in what the tunnel's creator does: for inbound tunnels, the creator is the endpoint and simply decrypts all of the layers added, while for outbound tunnels, the creator is the gateway and pre-decrypts all of the layers, so that after all of the layers of per-hop encryption are added, the message arrives in the clear at the tunnel endpoint.
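The pre-decryption trick can be checked with a toy Python sketch. It deliberately swaps I2P's AES256 layers for a throwaway cipher (add a SHA-256-derived keystream byte-wise, mod 256) so that "encrypt" and "decrypt" are genuinely different operations; it is not secure and ignores IVs, padding, and preprocessing. All function names are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256(key || counter). NOT secure; for illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])


def add_layer(key: bytes, data: bytes) -> bytes:
    """A hop's 'encryption': add the keystream byte-wise, mod 256."""
    return bytes((d + k) % 256 for d, k in zip(data, keystream(key, len(data))))


def remove_layer(key: bytes, data: bytes) -> bytes:
    """The inverse operation: subtract the keystream byte-wise, mod 256."""
    return bytes((d - k) % 256 for d, k in zip(data, keystream(key, len(data))))


def through_tunnel(hop_keys: list, data: bytes) -> bytes:
    """Each participant adds its own layer, in tunnel order."""
    for key in hop_keys:
        data = add_layer(key, data)
    return data


def strip_layers(hop_keys: list, data: bytes) -> bytes:
    """Undo every hop's layer, last hop first.

    An inbound tunnel's creator (the endpoint) does this after the message has
    crossed the tunnel; an outbound tunnel's creator (the gateway) does it
    before sending, so that the hops' layers cancel out at the endpoint.
    """
    for key in reversed(hop_keys):
        data = remove_layer(key, data)
    return data


hops = [b"hop1-key", b"hop2-key", b"hop3-key"]
message = b"hello, endpoint"
# Outbound: pre-decrypt, then let the hops add their layers.
assert through_tunnel(hops, strip_layers(hops, message)) == message
# Inbound: hops add their layers, then the creator strips them.
assert strip_layers(hops, through_tunnel(hops, message)) == message
```

The same `strip_layers` routine serves both tunnel directions; only when the creator runs it changes.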

The choice of specific peers to pass on messages as well as their particular ordering is important to understanding both I2P's anonymity and performance characteristics. While the network database (below) has its own criteria for picking what peers to query and store entries on, tunnel creators may use any peers in the network in any order (and even any number of times) in a single tunnel. If perfect latency and capacity data were globally known, selection and ordering would be driven by the particular needs of the client in tandem with their threat model. Unfortunately, latency and capacity data is not trivial to gather anonymously, and depending upon untrusted peers to provide this information has its own serious anonymity implications.

From an anonymity perspective, the simplest technique would be to pick peers randomly from the entire network, order them randomly and use those peers in that order for all eternity. From a performance perspective, the simplest technique would be to pick the fastest peers with the necessary spare capacity, spreading the load across different peers to handle transparent failover, and to rebuild the tunnel whenever capacity information changes. While the former is both brittle and inefficient, the latter requires inaccessible information and offers insufficient anonymity. I2P is instead working on offering a range of peer selection strategies, coupled with anonymity-aware measurement code to organize the peers by their profiles.

As a base, I2P constantly profiles the peers with which it interacts by measuring their indirect behavior. For instance, when a peer responds to a netDb lookup in 1.3 seconds, that round-trip latency is recorded in the profiles for all of the routers involved in the two tunnels (inbound and outbound) through which the request and response passed, as well as in the queried peer's profile. Direct measurement, such as transport layer latency or congestion, is not used as part of the profile, as it can be manipulated and associated with the measuring router, exposing it to trivial attacks. While gathering these profiles, a series of calculations is run on each to summarize the peer's performance: its latency, its capacity to handle lots of activity, whether it is currently overloaded, and how well integrated into the network it seems to be. These calculations are then compared across active peers to organize the routers into four tiers: fast and high capacity, high capacity, not failing, and failing. The thresholds for those tiers are determined dynamically, and while they currently use fairly simple algorithms, alternatives exist.
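A rough Python sketch of the four-tier organization. The speed and capacity scores and the median-based cutoffs are assumptions standing in for "thresholds determined dynamically"; I2P's real profile calculations are more involved. Tiers are inclusive (a fast peer is also high capacity and not failing), matching the note that the "not failing" tier includes routers in better tiers. Client tunnels would draw from `tiers["fast"]`, exploratory tunnels from `tiers["not_failing"]`.

```python
import random
from dataclasses import dataclass
from statistics import median


@dataclass
class PeerProfile:
    name: str
    speed: float     # hypothetical summary score, higher is better
    capacity: float  # hypothetical summary score, higher is better
    failing: bool


def classify(peers: list) -> dict:
    """Sort peers into the four tiers; better tiers are also members of worse ones."""
    active = [p for p in peers if not p.failing]
    # Hypothetical dynamic thresholds: the median score of the active peers.
    speed_cut = median(p.speed for p in active) if active else 0.0
    capacity_cut = median(p.capacity for p in active) if active else 0.0
    tiers = {"fast": [], "high_capacity": [], "not_failing": [], "failing": []}
    for p in peers:
        if p.failing:
            tiers["failing"].append(p.name)
            continue
        tiers["not_failing"].append(p.name)
        if p.capacity >= capacity_cut:
            tiers["high_capacity"].append(p.name)
            if p.speed >= speed_cut:
                tiers["fast"].append(p.name)
    return tiers


def pick_hops(tier: list, count: int, rng: random.Random) -> list:
    """Pick `count` distinct peers uniformly at random from one tier."""
    return rng.sample(tier, count)
```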

Using this profile data, the simplest reasonable peer selection strategy is to pick peers randomly from the top tier (fast and high capacity), and this is currently deployed for client tunnels. Exploratory tunnels (used for netDb and tunnel management) pick peers randomly from the "not failing" tier (which includes routers in 'better' tiers as well), allowing the router to sample peers more widely, in effect optimizing peer selection through randomized hill climbing. On their own, however, these strategies leak information about the peers in the router's top tier through predecessor and netDb harvesting attacks. In turn, several alternatives exist which, while not balancing the load as evenly, address the attacks mounted by particular classes of adversaries.

By picking a random key and ordering the peers according to their XOR distance from it, the information leaked is reduced in predecessor and harvesting attacks according to the peers' failure rate and the tier's churn. Another simple strategy for dealing with netDb harvesting attacks is to simply fix the inbound tunnel gateway(s) yet randomize the peers further on in the tunnels. To deal with predecessor attacks for adversaries which the client contacts, the outbound tunnel endpoints would also remain fixed. The selection of which peer to fix on the most exposed point would of course need to have a limit to the duration, as all peers fail eventually, so it could either be reactively adjusted or proactively avoided to mimic a measured mean time between failures of other routers. These two strategies can in turn be combined, using a fixed exposed peer and an XOR based ordering within the tunnels themselves. A more rigid strategy would fix the exact peers and ordering of a potential tunnel, only using individual peers if all of them agree to participate in the same way each time. This varies from the XOR based ordering in that the predecessor and successor of each peer is always the same, while the XOR only makes sure their order doesn't change.
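The XOR-based ordering from the paragraph above fits in a few lines of Python. This sketch assumes peers are identified by 32-byte hashes (here, hypothetically, SHA-256 of a name) and that the random ordering key is chosen once and then kept, which is what makes the order stable from one tunnel to the next.

```python
import hashlib
import secrets


def router_hash(name: str) -> bytes:
    """Stand-in for a router identity hash (hypothetical: SHA-256 of a name)."""
    return hashlib.sha256(name.encode()).digest()


def xor_distance(a: bytes, b: bytes) -> int:
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def order_by_xor(peers: list, ordering_key: bytes) -> list:
    """Order peer hashes by XOR distance to a random key chosen once and then kept.

    Because the key does not change, any two peers always appear in the same
    relative order, whichever subset is selected for a given tunnel.
    """
    return sorted(peers, key=lambda h: xor_distance(h, ordering_key))


ordering_key = secrets.token_bytes(32)  # picked once per tunnel pool (assumption)
chosen = [router_hash(n) for n in ("r1", "r2", "r3")]
tunnel_order = order_by_xor(chosen, ordering_key)
```

Distinct hashes always have distinct XOR distances to the key, so the order never has ties.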

A more detailed discussion of the mechanics involved in tunnel operation, management, and peer selection can be found in the tunnel spec.

Shoalsteed commented 3 years ago

The I2P Network Database

I2P's netDb works to share the network's metadata. This is detailed in the network database page, but a basic explanation is available below.

A percentage of I2P users are appointed as 'floodfill peers'. Currently, I2P installations that have a lot of bandwidth and are fast enough will appoint themselves as floodfill as soon as the number of existing floodfill routers drops too low.

Other I2P routers store and look up their data by sending simple 'store' and 'lookup' queries to the floodfills. If a floodfill router receives a 'store' query, it spreads the information to other floodfill routers using the Kademlia algorithm. 'Lookup' queries currently work differently, to avoid an important security issue: when a lookup is done, the floodfill router does not forward the lookup to other peers, but always answers by itself (if it has the requested data).
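A simplified Python model of the store/lookup asymmetry described above: stores are flooded to the floodfills closest (by XOR distance) to the key, while lookups are answered only from local data. The `flood_to` count and the class name are hypothetical, and the real netDb also mixes the current date into the routing key so the keyspace rotates daily; that rotation is omitted here.

```python
import hashlib
from typing import Optional


def xor_to_key(router_hash: bytes, key: bytes) -> int:
    """Kademlia-style XOR distance between a router hash and SHA-256(key)."""
    target = hashlib.sha256(key).digest()
    return int.from_bytes(router_hash, "big") ^ int.from_bytes(target, "big")


class Floodfill:
    def __init__(self, name: str):
        self.name = name
        self.router_hash = hashlib.sha256(name.encode()).digest()
        self.db = {}     # key -> signed entry (opaque bytes here)
        self.peers = []  # the other floodfills this one knows about

    def store(self, key: bytes, value: bytes, flood_to: int = 2) -> None:
        """Keep the entry and flood it to the `flood_to` peers closest to the key."""
        self.db[key] = value
        closest = sorted(self.peers, key=lambda p: xor_to_key(p.router_hash, key))
        for peer in closest[:flood_to]:
            peer.db[key] = value  # flooded copies are stored, not re-flooded

    def lookup(self, key: bytes) -> Optional[bytes]:
        """Answer from the local store only; lookups are never forwarded."""
        return self.db.get(key)


floodfills = [Floodfill(n) for n in ("ff-a", "ff-b", "ff-c", "ff-d")]
for ff in floodfills:
    ff.peers = [other for other in floodfills if other is not ff]
floodfills[0].store(b"leaseSet-key", b"signed leaseSet bytes")
```

With four floodfills and `flood_to=2`, one floodfill never receives the entry and simply answers `None`, which is the lookup-scalability concern raised further down.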

Two types of information are stored in the network database.

A RouterInfo stores information on a specific I2P router and how to contact it.
A LeaseSet stores information on a specific destination (e.g. an I2P website, an e-mail server...).

All of this information is signed by the publishing party and verified by any I2P router using or storing the information. In addition, the data contains timing information, to avoid storage of old entries and possible attacks. This is also why I2P bundles the necessary code for maintaining the correct time, occasionally querying some SNTP servers (the pool.ntp.org round robin by default) and detecting skew between routers at the transport layer.
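The link between timestamps and clock synchronization can be illustrated with a freshness check. This Python sketch assumes two hypothetical limits, a maximum entry age and a tolerated clock skew; signature verification is omitted. A router whose clock drifts past the skew limit would start rejecting valid entries, which is why keeping the correct time matters.

```python
import time
from typing import Optional

MAX_AGE_SECONDS = 60 * 60  # hypothetical: drop entries published over an hour ago
MAX_SKEW_SECONDS = 60      # hypothetical: tolerated clock difference


def is_fresh(published: float, now: Optional[float] = None) -> bool:
    """Accept an entry only if its timestamp is neither stale nor in the future."""
    now = time.time() if now is None else now
    if published > now + MAX_SKEW_SECONDS:
        return False  # "from the future": one of the clocks is wrong
    return now - published <= MAX_AGE_SECONDS
```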

Additional Information

Unpublished and encrypted leaseSets: You may want only specific people to be able to reach a destination. This is possible by not publishing the destination in the netDb; you will, however, have to transmit the destination by other means. An alternative is to use 'encrypted leaseSets', which can only be decoded by people with access to the decryption key.

Bootstrapping: Bootstrapping the netDb is quite simple. Once a router manages to receive a single routerInfo of a reachable peer, it can query that router for references to other routers in the network. Currently, a number of users post their routerInfo files to a website to make this information available. I2P automatically connects to one of these websites to gather routerInfo files and bootstrap.

Lookup scalability: Lookups in the I2P network are not forwarded to other netDb routers. Currently, this is not a major problem, since the network is not very large. However, as the network grows, not all routerInfo and leaseSet files will be present on each netDb router, which will lower the percentage of successful lookups. Because of this, refinements to the netDb will be made in upcoming releases.

Shoalsteed commented 3 years ago

Transport Protocols

Communication between routers needs to provide confidentiality and integrity against external adversaries while authenticating that the router contacted is the one who should receive a given message. The particulars of how routers communicate with other routers aren't critical - three separate protocols have been used at different points to provide those bare necessities.

I2P started with a TCP-based protocol, which has since been disabled. Then, to accommodate the need for high-degree communication (as a number of routers will end up speaking with many others), I2P moved from a TCP-based transport to a UDP-based one: "Secure Semireliable UDP", or "SSU".

As described in the SSU spec:

The goal of this protocol is to provide secure, authenticated, semireliable and unordered message delivery, exposing only a minimal amount of data easily discernible to third parties. It should support high degree communication as well as TCP-friendly congestion control and may include PMTU detection. It should be capable of efficiently moving bulk data at rates sufficient for home users. In addition, it should support techniques for addressing network obstacles, like most NATs or firewalls.

Following the introduction of SSU, after issues with congestion collapse appeared, a new NIO-based TCP transport called NTCP was implemented. It is enabled by default for outbound connections only. Those who configure their NAT/firewall to allow inbound connections and specify the external host and port (dyndns/etc. is ok) on /config.jsp can receive inbound connections. As NTCP is NIO-based, it doesn't suffer from the one-thread-per-connection issues of the old TCP transport.

I2P supports multiple transports simultaneously. A particular transport for an outbound connection is selected with "bids". Each transport bids for the connection and the relative value of these bids assigns the priority. Transports may reply with different bids, depending on whether there is already an established connection to the peer.
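A minimal Python sketch of bid-based transport selection. It assumes that a lower bid value means a more preferred transport and that a transport which cannot reach the peer declines to bid; the numeric bid values and the `bid_ntcp`/`bid_ssu` helpers are hypothetical, chosen only so that NTCP wins when no connection exists yet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    transport: str
    value: int  # assumption: a lower bid means a more preferred transport


def bid_ntcp(connected: bool, has_address: bool = True) -> Optional[Bid]:
    if not has_address:
        return None  # cannot reach the peer over this transport: no bid
    return Bid("NTCP", 25 if connected else 70)  # hypothetical values


def bid_ssu(connected: bool, has_address: bool = True) -> Optional[Bid]:
    if not has_address:
        return None
    return Bid("SSU", 10 if connected else 80)  # hypothetical values


def choose_transport(bids: List[Optional[Bid]]) -> Optional[Bid]:
    """Pick the best (lowest) bid; None if no transport bid at all."""
    valid = [b for b in bids if b is not None]
    return min(valid, key=lambda b: b.value) if valid else None
```

An already-established connection lowers a transport's bid, so an existing SSU session can outbid a fresh NTCP connection.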

The current implementation ranks NTCP as the highest-priority transport for outbound connections in most situations. SSU is enabled for both outbound and inbound connections. To receive inbound NTCP connections, both your firewall and your I2P router must be configured to allow them. For further information see the NTCP page.

Shoalsteed commented 3 years ago

Cryptography

A bare minimum set of cryptographic primitives is combined to provide I2P's layered defenses against a variety of adversaries. At the lowest level, inter-router communication is protected by transport layer security: SSU encrypts each packet with AES256/CBC with both an explicit IV and a MAC (HMAC-MD5-128) after agreeing upon an ephemeral session key through a 2048-bit Diffie-Hellman exchange and station-to-station authentication with the other router's DSA key, plus each network message has its own hash for local integrity checking. Tunnel messages passed over the transports have their own layered AES256/CBC encryption with an explicit IV and are verified at the tunnel endpoint with an additional SHA256 hash. Various other messages are passed along inside "garlic messages", which are encrypted with ElGamal/AES+SessionTags (explained below).
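The MAC part of per-packet protection can be shown with Python's standard `hmac` module. This is a simplified sketch: it MACs only IV || ciphertext, whereas SSU's real MAC input has its own layout, and the AES256/CBC step is left out because AES is not in the standard library. The point it illustrates is that the 128-bit HMAC-MD5 tag is checked, in constant time, before any decryption is attempted.

```python
import hashlib
import hmac


def packet_mac(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """HMAC-MD5 over IV || ciphertext; MD5 output is 16 bytes, i.e. 128 bits."""
    return hmac.new(mac_key, iv + ciphertext, hashlib.md5).digest()


def accept_packet(mac_key: bytes, iv: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Check the MAC in constant time *before* attempting any decryption."""
    return hmac.compare_digest(packet_mac(mac_key, iv, ciphertext), tag)


mac_key = b"\x11" * 32
iv = b"\x00" * 16
ciphertext = b"...AES256/CBC output would go here..."
tag = packet_mac(mac_key, iv, ciphertext)
```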

Garlic messages

Garlic messages are an extension of "onion" layered encryption, allowing the contents of a single message to contain multiple "cloves": fully formed messages alongside their own instructions for delivery. Messages are wrapped into a garlic message whenever the message would otherwise be passing in cleartext through a peer who should not have access to the information. For instance, when a router wants to ask another router to participate in a tunnel, it wraps the request inside a garlic, encrypts that garlic to the receiving router's 2048-bit ElGamal public key, and forwards it through a tunnel. Another example is when a client wants to send a message to a destination: the sender's router will wrap up that data message (alongside some other messages) into a garlic, encrypt that garlic to the 2048-bit ElGamal public key published in the recipient's leaseSet, and forward it through the appropriate tunnels.

The "instructions" attached to each clove inside the encryption layer include the ability to request that the clove be forwarded locally, to a remote router, or to a remote tunnel on a remote router. There are fields in those instructions allowing a peer to request that the delivery be delayed until a certain time or condition has been met, though these won't be honored until nontrivial delays are deployed. It is possible to explicitly route garlic messages any number of hops without building tunnels, or even to reroute tunnel messages by wrapping them in garlic messages and forwarding them a number of hops before delivering them to the next hop in the tunnel, but those techniques are not currently used in the existing implementation.
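The garlic/clove structure and the three delivery options above can be modeled as plain data in Python. The class and field names are hypothetical; I2P's own delivery instructions also define a "destination" delivery type and the delay fields, both omitted here to match the three options the text describes. Encryption of the garlic as a whole is not shown.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class DeliveryType(Enum):
    LOCAL = "local"    # hand to the router that decrypted the garlic
    ROUTER = "router"  # forward to a remote router
    TUNNEL = "tunnel"  # forward to a tunnel on a remote router


@dataclass
class DeliveryInstructions:
    kind: DeliveryType
    router: Optional[str] = None     # target router for ROUTER and TUNNEL
    tunnel_id: Optional[int] = None  # target tunnel for TUNNEL


@dataclass
class Clove:
    payload: bytes
    instructions: DeliveryInstructions


@dataclass
class Garlic:
    cloves: list = field(default_factory=list)


def route_clove(clove: Clove) -> str:
    """Describe what the decrypting router would do with one clove."""
    i = clove.instructions
    if i.kind is DeliveryType.LOCAL:
        return "deliver locally"
    if i.kind is DeliveryType.ROUTER:
        return f"send to router {i.router}"
    return f"send to tunnel {i.tunnel_id} on router {i.router}"
```

A data clove and a delivery status clove, for example, can travel in one garlic yet go to entirely different places once unwrapped.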

Session tags

As an unreliable, unordered, message-based system, I2P uses a simple combination of asymmetric and symmetric encryption algorithms to provide data confidentiality and integrity to garlic messages. As a whole, the combination is referred to as ElGamal/AES+SessionTags, but that is an excessively verbose way to describe the simple use of 2048-bit ElGamal, AES256, SHA256 and 32-byte nonces.

The first time a router wants to encrypt a garlic message to another router, it encrypts the keying material for an AES256 session key with ElGamal and appends the AES256/CBC-encrypted payload after that encrypted ElGamal block. In addition to the encrypted payload, the AES-encrypted section contains the payload length, the SHA256 hash of the unencrypted payload, and a number of "session tags" (random 32-byte nonces). The next time the sender wants to encrypt a garlic message to the same router, rather than ElGamal-encrypting a new session key, it simply picks one of the previously delivered session tags and AES-encrypts the payload as before, using the session key associated with that session tag, prepended with the session tag itself. When a router receives a garlic-encrypted message, it checks the first 32 bytes to see whether they match an available session tag: if they do, it simply AES-decrypts the message; if not, it ElGamal-decrypts the first block.

Each session tag can be used only once, so as to prevent internal adversaries from unnecessarily correlating different messages as being between the same routers. The sender of an ElGamal/AES+SessionTags-encrypted message chooses when and how many tags to deliver, prestocking the recipient with enough tags to cover a volley of messages. The sender can detect successful tag delivery by bundling a small additional message as a clove (a "delivery status message"): when the garlic message arrives at the intended recipient and is decrypted successfully, this small delivery status message is one of the cloves exposed, and it has instructions for the recipient to send the clove back to the original sender (through an inbound tunnel, of course). When the original sender receives this delivery status message, it knows that the session tags bundled in the garlic message were successfully delivered.

Session tags themselves have a very short lifetime, after which they are discarded if not used. In addition, the quantity stored for each key is limited, as is the number of keys themselves; if too many arrive, either new or old messages may be dropped. The sender keeps track of whether messages using session tags are getting through, and if there isn't sufficient communication, it may drop the tags previously assumed to be properly delivered, reverting to the full, expensive ElGamal encryption.
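The receiver-side rules from the last three paragraphs (check the first 32 bytes, single use, short lifetime, per-key cap) can be sketched in Python. The lifetime and cap values are hypothetical, this version drops the newest tags when the cap is hit, and the actual AES and ElGamal decryption is replaced by a string naming which path would be taken.

```python
from typing import Iterable, Optional

TAG_LIFETIME = 15 * 60  # hypothetical tag lifetime, in seconds
MAX_TAGS_PER_KEY = 100  # hypothetical cap on stored tags per session key


class SessionTagStore:
    """Receiver side: maps each 32-byte tag to the AES session key it unlocks."""

    def __init__(self):
        self._tags = {}  # tag -> (session_key, expires_at)

    def add_tags(self, session_key: bytes, tags: Iterable[bytes], now: float) -> None:
        """Store delivered tags, respecting the per-key cap (extras are dropped)."""
        held = sum(1 for key, _ in self._tags.values() if key == session_key)
        room = max(0, MAX_TAGS_PER_KEY - held)
        for tag in list(tags)[:room]:
            self._tags[tag] = (session_key, now + TAG_LIFETIME)

    def consume(self, tag: bytes, now: float) -> Optional[bytes]:
        """Return the session key for `tag` and forget the tag: each tag works once."""
        entry = self._tags.pop(tag, None)
        if entry is None or entry[1] < now:
            return None  # unknown, already used, or expired (expired ones are discarded)
        return entry[0]


def decrypt_path(store: SessionTagStore, message: bytes, now: float) -> str:
    """Which decryption a receiver would attempt for an incoming garlic message."""
    if store.consume(message[:32], now) is not None:
        return "AES with the tagged session key"
    return "ElGamal on the first block"
```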

One alternative is to transmit only a single session tag, and from that, seed a deterministic PRNG for determining what tags to use or expect. By keeping this PRNG roughly synchronized between the sender and recipient (the recipient precomputes a window of, e.g., the next 50 tags), the overhead of periodically bundling a large number of tags is removed, allowing more options in the space/time tradeoff, and perhaps reducing the number of ElGamal encryptions necessary. However, it would depend upon the strength of the PRNG to provide the necessary cover against internal adversaries, though perhaps by limiting the number of times each PRNG is used, any weaknesses can be minimized. At the moment, there are no immediate plans to move towards these synchronized PRNGs.
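The synchronized-PRNG alternative (which, as noted, is not planned) can be sketched in Python. The choice of HMAC-SHA256 over a counter as the "deterministic PRNG" is an assumption for illustration; the sender would send tag `tag_at(seed, n)` for its n-th message, while the receiver keeps a sliding window of the next 50 expected tags.

```python
import hashlib
import hmac


def tag_at(seed: bytes, index: int) -> bytes:
    """Deterministic 32-byte tag number `index`, derived from the shared seed."""
    return hmac.new(seed, index.to_bytes(8, "big"), hashlib.sha256).digest()


class TagWindow:
    """Receiver side: precomputes the next `size` tags and slides forward on use."""

    def __init__(self, seed: bytes, size: int = 50):
        self.seed = seed
        self.size = size
        self.high = size - 1  # highest index precomputed so far
        self.expected = {tag_at(seed, i): i for i in range(size)}

    def accept(self, tag: bytes) -> bool:
        index = self.expected.pop(tag, None)  # pop: each tag is usable once
        if index is None:
            return False
        # Keep `size` tags precomputed beyond the newest index seen. Skipped,
        # older tags stay valid (delivery is unordered); a real design would
        # also expire them.
        while self.high < index + self.size:
            self.high += 1
            self.expected[tag_at(self.seed, self.high)] = self.high
        return True
```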

Shoalsteed commented 3 years ago

I would remove all text from the page beginning with "Future". Instead, if we wish to, links to the Roadmap, Proposals, and Git can be placed at the end of the page.

Shoalsteed commented 3 years ago

Application Layer

I2P (?????? which part?) itself doesn't really do much: it simply sends messages to remote destinations and receives messages targeting local destinations. By itself, I2P could be seen as an anonymous and secure IP layer, and the bundled streaming library as an implementation of an anonymous and secure TCP layer on top of it. Beyond that, I2PTunnel exposes a generic TCP proxying system for either getting into or out of the I2P network, and a variety of network applications provide further functionality for end users.

Streaming library

The I2P streaming library can be viewed as a generic streaming interface (mirroring TCP sockets), and the implementation supports a sliding window protocol with several optimizations to take into account the high delay over I2P. Individual streams may adjust the maximum packet size and other options, though the default of 4KB compressed seems a reasonable tradeoff between the bandwidth cost of retransmitting lost messages and the latency of multiple messages.

In addition, in consideration of the relatively high cost of subsequent messages, the streaming library's protocol for scheduling and delivering messages has been optimized so that each individual message carries as much information as is available. For instance, a small HTTP transaction proxied through the streaming library can be completed in a single round trip: the first message bundles a SYN, FIN and the small payload (an HTTP request typically fits), and the reply bundles the SYN, FIN, ACK and the small payload (many HTTP responses fit). While an additional ACK must be transmitted to tell the HTTP server that the SYN/FIN/ACK has been received, the local HTTP proxy can deliver the full response to the browser immediately.
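The single-round-trip exchange above, written out as packets in Python. The flag bit values and class names are hypothetical (the streaming protocol defines its own flag layout); what the sketch shows is that the client has its complete response after the second packet, and the trailing ACK is not waited on.

```python
from dataclasses import dataclass
from enum import IntFlag


class Flag(IntFlag):
    # Hypothetical bit values; the real protocol defines its own flag layout.
    SYN = 1
    ACK = 2
    FIN = 4
    RST = 8


@dataclass
class Packet:
    flags: Flag
    payload: bytes = b""


def small_http_exchange(request: bytes, response: bytes) -> list:
    """The three packets of a request/response that each fit in one message."""
    return [
        Packet(Flag.SYN | Flag.FIN, request),              # client -> server
        Packet(Flag.SYN | Flag.FIN | Flag.ACK, response),  # server -> client: one round trip
        Packet(Flag.ACK),                                  # client -> server: not waited on
    ]
```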

On the whole, however, the streaming library bears much resemblance to an abstraction of TCP, with its sliding windows, congestion control algorithms (both slow start and congestion avoidance), and general packet behavior (ACK, SYN, FIN, RST, etc).
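For readers less familiar with the TCP behavior referred to above, here is a textbook-style congestion window in Python (slow start, then congestion avoidance, reset on timeout). It is a generic sketch: the initial window, threshold, and cap are hypothetical and are not the streaming library's actual constants or exact algorithm.

```python
class CongestionWindow:
    """Textbook TCP-style window, counted in messages. Parameters are hypothetical."""

    def __init__(self, initial: float = 1.0, ssthresh: float = 16.0, max_window: float = 128.0):
        self.cwnd = initial
        self.ssthresh = ssthresh
        self.max_window = max_window

    def on_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start: +1 per ACK, roughly doubling each RTT
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: roughly +1 per RTT
        self.cwnd = min(self.cwnd, self.max_window)

    def on_timeout(self) -> None:
        self.ssthresh = max(self.cwnd / 2.0, 1.0)
        self.cwnd = 1.0                   # restart from slow start after a loss timeout

    def can_send(self, in_flight: int) -> int:
        """How many more messages may be sent right now."""
        return max(int(self.cwnd) - in_flight, 0)
```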

Naming library and address book

For more information see the Naming and Address Book page.

Developed by: mihi, Ragnarok

Naming within I2P has been an oft-debated topic since the very beginning, with advocates across the spectrum of possibilities. However, given I2P's inherent demand for secure communication and decentralized operation, the traditional DNS-style naming system is clearly out, as are "majority rules" voting systems. Instead, I2P ships with a generic naming library and a base implementation designed to work off a local name-to-destination mapping, as well as an optional add-on application called the "Address Book". The address book is a web-of-trust-driven, secure, distributed, and human-readable naming system, sacrificing only the requirement that all human-readable names be globally unique by mandating only local uniqueness. While all messages in I2P are cryptographically addressed by their destination, different people can have local address book entries for "Alice" which refer to different destinations. People can still discover new names by importing published address books of peers specified in their web of trust, by adding in the entries provided through a third party, or (if some people organize a series of published address books using a first-come, first-served registration system) by choosing to treat these address books as name servers, emulating traditional DNS.

I2P does not promote the use of DNS-like services, though, as the damage done by hijacking a site can be tremendous, and insecure destinations have no value. DNSSEC itself still falls back on registrars and certificate authorities, while with I2P, requests sent to a destination cannot be intercepted or the reply spoofed, as they are encrypted to the destination's public keys, and a destination itself is just a pair of public keys and a certificate. DNS-style systems, on the other hand, allow any of the name servers on the lookup path to mount simple denial-of-service and spoofing attacks. Adding a certificate authenticating the responses as signed by some centralized certificate authority would address many of the hostile nameserver issues but would leave open replay attacks as well as hostile certificate authority attacks.

Voting-style naming is dangerous as well, especially given the effectiveness of Sybil attacks in anonymous systems: the attacker can simply create an arbitrarily high number of peers and "vote" with each to take over a given name. Proof-of-work methods can be used to make identity non-free, but as the network grows, the load required to contact everyone to conduct online voting becomes implausible, or, if the full network is not queried, different sets of answers may be reachable.

As with the Internet, however, I2P is keeping the design and operation of a naming system out of the (IP-like) communication layer. The bundled naming library includes a simple service provider interface which alternate naming systems can plug into, allowing end users to drive what sort of naming tradeoffs they prefer.
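A Python sketch of the naming ideas from this section: a local address book where names only need to be unique locally, import of a trusted peer's published address book, and a pluggable interface that chains naming services. The interface and class names are hypothetical, not the bundled naming library's API, and the rule that imports never overwrite an existing local entry is an assumption of this sketch. Destinations are placeholder strings.

```python
from typing import Dict, Iterable, Optional, Protocol


class NamingService(Protocol):
    """Hypothetical plug-in interface: resolve a name to a destination, or None."""

    def lookup(self, name: str) -> Optional[str]: ...


class AddressBook:
    """Local name -> destination map; names only need to be unique locally."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def add(self, name: str, destination: str) -> bool:
        if name in self.entries:
            return False  # never silently replace an existing local entry
        self.entries[name] = destination
        return True

    def import_published(self, published: Dict[str, str]) -> int:
        """Merge a trusted peer's published address book; returns entries added."""
        return sum(self.add(name, dest) for name, dest in published.items())

    def lookup(self, name: str) -> Optional[str]:
        return self.entries.get(name)


class ChainedNaming:
    """Ask each plugged-in naming service in turn until one knows the name."""

    def __init__(self, services: Iterable[NamingService]) -> None:
        self.services = list(services)

    def lookup(self, name: str) -> Optional[str]:
        for service in self.services:
            destination = service.lookup(name)
            if destination is not None:
                return destination
        return None
```

Because each user's book is local, two users can map "alice.i2p" to different destinations without either lookup being wrong from that user's point of view.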

Shoalsteed commented 3 years ago

The references to I2PTunnel, mail, etc.: move these to the SOFTWARE Guide if they are included in the software.

Shoalsteed commented 3 years ago

The Tech Intro should be limited to the Network, Protocol, and Application layers: how they interact and how they create private, anonymized traffic.

eyedeekay commented 2 years ago

Bumping so github will send me a notification. Will review soon.

Shoalsteed commented 1 year ago

updated