Handle security concerns

Schmavery commented 7 years ago

There are a number of security concerns that might arise over the course of this project. So far there seem to be at least two known attacks on a DHT. One is the Sybil attack, which I don't yet fully understand; it involves attackers creating many fake identities to "outvote" regular users in various consensus-based activities (spam reports, reputation systems, consensus algorithms).

The other is the ID mapping attack, in which an attacker picks their own ID in the DHT. This lets the attacker gain control over targeted resources whose keys fall near their node ID. To combat this, I believe we need to force nodes to take an ID that is out of their control. This ID must be externally verifiable by other peers so that an attacker can't just lie about it: peers need to be able to detect a node that is using a false ID and disconnect from it.
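To make the ID mapping attack concrete, here is a minimal sketch assuming Kademlia's XOR distance metric, 160-bit SHA-1 IDs, and a replication factor of k = 3. The helper names (`key_id`, `xor_distance`, `closest_nodes`) are hypothetical. An attacker who may choose their own IDs simply claims the IDs nearest the target key and becomes every node responsible for it.

```python
import hashlib
import os

ID_BYTES = 20  # 160-bit IDs, as in Kademlia (assumption for this sketch)

def key_id(key: bytes) -> int:
    """Map a resource key into the ID space (SHA-1, as Kademlia does)."""
    return int.from_bytes(hashlib.sha1(key).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric between two IDs."""
    return a ^ b

def closest_nodes(node_ids, target: int, k: int):
    """The k nodes responsible for `target`: those at the smallest XOR distance."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]

# Honest nodes end up with IDs that nobody chose on purpose.
honest = [int.from_bytes(os.urandom(ID_BYTES), "big") for _ in range(1000)]
target = key_id(b"some-resource")

# With unconstrained ID selection, the attacker simply claims IDs right next to the target.
attacker = [target ^ d for d in (1, 2, 3)]

responsible = closest_nodes(honest + attacker, target, k=3)
print(all(n in attacker for n in responsible))  # True, barring an astronomically unlikely honest ID
```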

Initially I had imagined using a hash of the node's IP address as its ID, a solution proposed by a number of papers I've read. In this scenario, other nodes can see the attacker's IP address and verify that the attacker's ID matches it. However, I'm not sure this is viable with WebRTC, since WebRTC connects peers through ICE candidates, which may be local, NAT-mapped, or relayed addresses. The next idea was to use a hash of the peer's public key as its ID. This was quickly shot down by the ID mapping attack paper below.
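A minimal sketch of the IP-hash approach, assuming SHA-256 as the hash and the IP address as seen by the verifying peer as its input; `id_from_ip` and `verify_peer` are hypothetical names. The point is that the verifier recomputes the ID from something it observes itself rather than trusting what the peer claims.

```python
import hashlib

def id_from_ip(ip: str) -> str:
    """Derive a node ID from an IP address (SHA-256 is an assumption)."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()

def verify_peer(claimed_id: str, observed_ip: str) -> bool:
    """Accept a peer only if its claimed ID matches the hash of the IP we actually see."""
    return claimed_id == id_from_ip(observed_ip)

honest_id = id_from_ip("203.0.113.7")
print(verify_peer(honest_id, "203.0.113.7"))  # True
print(verify_peer("00" * 32, "203.0.113.7"))  # False: a chosen ID doesn't match the address
```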

Reading material:

[1] ID Mapping Attacks in P2P Networks http://www.davidecerri.it/wp-content/uploads/2016/02/art-id_mapping_attacks-globecom05.pdf

This describes an attack on a DHT with an unconstrained ID selection mechanism (such as Kademlia, on which the paper is based). It considers using the hash of a public key as an ID to defend against ID mapping attacks, and gives fairly convincing evidence that this isn't secure (the paper is pretty straightforward; read it for details). Its proposed solution is to constrain the ID by making it a hash of the IP address and port number of the connection.

[2] S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing http://www.tm.uka.de/doc/SKademlia_2007.pdf

This proposes a solution similar to, but slightly more complex than, my solution below. The paper admits that this isn't actually a good solution (citing "Secure routing for structured peer-to-peer overlay networks" [4]), but argues that it is the best we have.

[3] SybilLimit: A Near-Optimal Social Network Defense Against Sybil Attacks http://www.pittsburgh.intel-research.net/projects/sybil-defenses/p885-yu.pdf

[4] Secure routing for structured peer-to-peer overlay networks https://pdfs.semanticscholar.org/1cfa/21fd43a8154bbb0e01acf7d52d520749444d.pdf

[5] A DDoS Attack by Flooding Normal Control Messages in Kad P2P Networks http://www.icact.org/upload/2012/0156/20120156_finalpaper.pdf

[6] Misusing Kademlia Protocol to Perform DDoS Attacks https://www.researchgate.net/publication/224362886_Misusing_Kademlia_Protocol_to_Perform_DDoS_Attacks

Schmavery commented 7 years ago

I'm revising my original assertion that you can't use a public IP as an identifier, after doing some more reading about networks, P2P connections, and routers. It seems a peer could select which ICE candidate to connect with (as long as it's available), and in this way we could always pick the public IP. However, for a peer behind a symmetric NAT, the public IP is not a viable way of contacting it, and its public address would not stay constant across connections anyway, so it would be useless as an identifier.
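Picking the public address out of a peer's ICE candidates could look like the sketch below. It assumes the `candidate:` string format that WebRTC exposes (the candidate-attribute grammar from RFC 8839) and treats the server-reflexive (`srflx`) candidate, the address a STUN server observed, as the peer's public IP. `parse_candidate` and `public_address` are hypothetical names.

```python
from typing import Iterable, Optional, Tuple

def parse_candidate(line: str) -> dict:
    """Parse an ICE candidate: foundation component transport priority address port typ <type> ..."""
    line = line.strip()
    if line.startswith("a="):  # tolerate raw SDP attribute lines too
        line = line[2:]
    if line.startswith("candidate:"):
        line = line[len("candidate:"):]
    parts = line.split()
    if len(parts) < 8 or parts[6] != "typ":
        raise ValueError(f"not an ICE candidate: {line!r}")
    return {"address": parts[4], "port": int(parts[5]), "type": parts[7]}

def public_address(candidates: Iterable[str]) -> Optional[Tuple[str, int]]:
    """First server-reflexive (srflx) address, i.e. the public IP:port a STUN server saw.
    Returns None if the peer only offered host or relay candidates."""
    for cand in map(parse_candidate, candidates):
        if cand["type"] == "srflx":
            return cand["address"], cand["port"]
    return None

candidates = [
    "candidate:842163049 1 udp 2122260223 192.168.1.2 54400 typ host generation 0",
    "candidate:1467250027 1 udp 1686052607 203.0.113.5 46154 typ srflx raddr 192.168.1.2 rport 54400 generation 0",
]
print(public_address(candidates))  # ('203.0.113.5', 46154)
```

Behind a symmetric NAT the mapping (at least the port, sometimes the IP) differs per destination, so the `srflx` address found here won't match what another peer observes, which is why it fails as an identifier in that case.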

Note: interestingly, in the absence of this ID mapping problem, it seems we could fairly trivially avoid needing TURN servers by letting two peers (both stuck behind symmetric NATs) relay messages through an intermediary third peer. This is apparently not an uncommon trick.

The takeaway is that we could technically use an IP-based ID restriction scheme for a subset of peers.

That leaves only a small number of peers excluded from the system (those behind a symmetric NAT, whose ID we can't verify). It seems a reasonable tradeoff to make their experience slightly worse in exchange for still letting them connect securely. In this case, ID generation could be made a deliberately slow process, i.e. a proof of work, so that peers can prove their ID is valid even from behind a symmetric NAT. You can imagine a scheme that works as follows (adapted from paper [2]):
- Generate a key pair <pri, pub>
- Take an agreed-upon hash function H
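- Find a nonce n (an arbitrary value) such that H(pub + n) has enough leading zeros; the required number of zeros sets how hard the nonce is to find.
- Your ID is then H(pub + n + n) (or something similar). It can't be H(pub + n) itself, since every valid puzzle solution starts with the required zeros and all IDs would cluster at one end of the ID space. It also can't be the hash of the public key alone, because an attacker could generate keypairs until H(pub) landed where they wanted, then do the proof of work only once.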
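The steps above can be sketched as follows, assuming SHA-256 for H, byte concatenation for +, an 8-byte big-endian nonce, and random bytes standing in for a real public key. The difficulty of 12 leading zero bits is only for illustration; a real deployment would have to tune it.

```python
import hashlib
import os

def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def find_nonce(pub: bytes, difficulty: int) -> int:
    """Try nonces until H(pub + n) has at least `difficulty` leading zero bits."""
    n = 0
    while True:
        digest = hashlib.sha256(pub + n.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return n
        n += 1

def node_id(pub: bytes, n: int) -> bytes:
    """ID = H(pub + n + n): a different hash from the puzzle, so IDs aren't all near zero."""
    nb = n.to_bytes(8, "big")
    return hashlib.sha256(pub + nb + nb).digest()

pub = os.urandom(32)         # stand-in for a real public key (assumption)
nonce = find_nonce(pub, 12)  # about 2**12 = 4096 hashes on average
my_id = node_id(pub, nonce)
print(my_id.hex())
```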

When using an ID verified by proof of work, you always send your public key and nonce along with your requests as proof of your ID. Anyone wanting to verify your ID can redo the hash calculations to check that your nonce is valid, then send you a challenge encrypted with your public key, which you return decrypted to prove that you actually own the corresponding private key.
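Verification could look like this. `verify_pow_id` redoes the puzzle hashing (assuming SHA-256, byte concatenation, and an 8-byte big-endian nonce). For the key-ownership check, textbook RSA with tiny hard-coded primes stands in for a real public-key library so the example stays self-contained; it only illustrates the challenge/response flow and is trivially breakable. All names here are hypothetical.

```python
import hashlib
import secrets

DIFFICULTY = 12  # network-wide puzzle difficulty (assumed value)

def leading_zero_bits(digest: bytes) -> int:
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()

def id_for(pub: bytes, n: int) -> bytes:
    nb = n.to_bytes(8, "big")
    return hashlib.sha256(pub + nb + nb).digest()

def verify_pow_id(pub: bytes, n: int, claimed_id: bytes) -> bool:
    """Redo the hashing: n must solve the puzzle and the claimed ID must be H(pub + n + n)."""
    puzzle = hashlib.sha256(pub + n.to_bytes(8, "big")).digest()
    return leading_zero_bits(puzzle) >= DIFFICULTY and id_for(pub, n) == claimed_id

# Toy textbook RSA with tiny primes: illustrates the challenge flow only, NOT secure.
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to the key owner

def owner_decrypt(ciphertext: int) -> int:
    """Runs on the node being verified."""
    return pow(ciphertext, D, N)

def challenge_owner(decrypt) -> bool:
    """Verifier: encrypt a random challenge with the public key (N, E); expect it back decrypted."""
    m = secrets.randbelow(N - 2) + 2
    return decrypt(pow(m, E, N)) == m

# What arrives with a request: public key bytes (encoding N and E), nonce, and claimed ID.
pub = N.to_bytes(4, "big") + E.to_bytes(4, "big")
n = next(i for i in range(1 << 32)
         if leading_zero_bits(hashlib.sha256(pub + i.to_bytes(8, "big")).digest()) >= DIFFICULTY)
claimed = id_for(pub, n)

print(verify_pow_id(pub, n, claimed) and challenge_owner(owner_decrypt))  # True
```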

IP Spoofing:

There is some question as to how IP spoofing works, and whether it would be a viable attack on a system that relies on nodes having little control over their IP address.

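Update: we should be safe from this, since a spoofer never receives the replies sent to the spoofed address and so can't complete a two-way handshake: https://security.stackexchange.com/questions/55279/how-easy-is-it-really-to-do-ip-spoofing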

Outstanding concerns:

According to "ID Mapping Attacks in P2P Networks" [1], the number of ID-generation attempts needed to gain control of a particular resource scales linearly with the number of nodes and with the replication factor (a constant). It is unclear to me whether proof of work makes this sufficiently expensive, since most of the cases I've seen it used in have very large search spaces.

We want key generation to be fast enough for account creation (ideally 1 minute at most), but then finding a good attacking nonce still takes only on the order of n minutes (n = number of active nodes). That doesn't seem great. A potential workaround is to rotate IDs periodically, as suggested by paper [1], though this is a large inefficiency and it's unclear whether we could even rotate fast enough for it to work (especially since the brute-forcing is easily parallelized).
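A back-of-the-envelope version of this estimate, as a sketch. The expected proof-of-work cost for d leading zero bits is 2**d hashes, so an honest node with a 1-minute budget picks the largest d that fits. The attack-time formula assumes, following the linear scaling described above, that finding a well-placed ID takes about `attempts_per_node * n` ID generations; `attempts_per_node` is a hypothetical constant set to 1 here, and the hash rate of 10**6 per second is an arbitrary example.

```python
import math

def difficulty_for_budget(hashes_per_second: float, budget_seconds: float) -> int:
    """Largest d whose expected 2**d hashes fit in an honest node's time budget."""
    return math.floor(math.log2(hashes_per_second * budget_seconds))

def attack_minutes(active_nodes: int, minutes_per_id: float,
                   workers: int = 1, attempts_per_node: float = 1.0) -> float:
    """Rough time to brute-force a well-placed ID, assuming attempts grow linearly with n."""
    return attempts_per_node * active_nodes * minutes_per_id / workers

print(difficulty_for_budget(1e6, 60))            # 25
print(attack_minutes(10_000, 1.0))               # 10000.0 minutes, roughly a week
print(attack_minutes(10_000, 1.0, workers=100))  # 100.0 minutes with 100 machines
```

The last line is the parallelization problem in numbers: the honest cost stays fixed at about a minute, while the attacker's wall-clock cost divides by however many machines they rent.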

There are also DDoS concerns (see the last two papers in the reading list), which I haven't read as much about yet.

Schmavery commented 6 years ago

An idea we had a couple of months ago was to only allow nodes that are at least t days old to store data, and then to rotate the positions of data every t days (or months). If the rotation happened predictably (based on the date, for instance), this wouldn't help, but if the randomness were somehow chosen by consensus, it might make it harder to target a given node to hijack it. Of course, nodes are still cheap to create, so someone could just make more nodes than exist on the rest of the network and wait t days, at which point they could hijack any data they want with high likelihood.
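Here is a sketch of age-gated storage with seeded rotation. It assumes XOR distance over 256-bit IDs, SHA-256, and that some consensus process (not shown, and the hard part) produces an unpredictable `epoch_seed` each period; `Node`, `placement_point`, and `storers` are hypothetical names. If the seed were just the date, anyone could compute future placements in advance, which is why predictable rotation wouldn't help.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int     # 256-bit ID
    joined_day: int  # day the node joined the network

def placement_point(data_key: bytes, epoch_seed: bytes) -> int:
    """Where an item lives this epoch. Changing the seed moves every item."""
    return int.from_bytes(hashlib.sha256(epoch_seed + data_key).digest(), "big")

def storers(nodes, data_key: bytes, epoch_seed: bytes, today: int, t_days: int, k: int = 3):
    """The k nodes at least t_days old that are closest (XOR) to the item's placement point."""
    point = placement_point(data_key, epoch_seed)
    eligible = [n for n in nodes if today - n.joined_day >= t_days]
    return sorted(eligible, key=lambda n: n.node_id ^ point)[:k]

# 50 nodes that joined on days 0..49; on day 60 with t = 30, only days 0..30 qualify.
nodes = [Node(int.from_bytes(hashlib.sha256(bytes([i])).digest(), "big"), joined_day=i)
         for i in range(50)]
chosen = storers(nodes, b"photo.jpg", b"seed-agreed-this-epoch", today=60, t_days=30)
print([n.joined_day for n in chosen])
```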

Ideally, instead of relying on time, we would let any node help replicate data but make sure that each piece of data is on at least one "trusted" node. Perhaps a node becomes more "trusted" as it stores more and more data for the network? Then operating "trusted" nodes would cost an attacker money. Something like this seems the most likely to work, and there has been research into this sort of thing:

https://en.wikipedia.org/wiki/Proof-of-space
https://bitcointalk.org/index.php?topic=310323.0 (Proof of storage)
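One way to express the "at least one trusted replica" rule, as a sketch: trust here is simply bytes stored for the network compared against a hypothetical `TRUST_THRESHOLD`, and replicas are otherwise the k closest peers by XOR distance. How storage would actually be proven, so that an attacker can't just claim a large `bytes_stored`, is where proof of space or proof of storage would come in; it is left out here.

```python
from dataclasses import dataclass

TRUST_THRESHOLD = 10 * 2**30  # hypothetical: 10 GiB stored on behalf of the network

@dataclass
class Peer:
    node_id: int
    bytes_stored: int

    @property
    def trusted(self) -> bool:
        return self.bytes_stored >= TRUST_THRESHOLD

def pick_replicas(peers, point: int, k: int):
    """k closest peers by XOR distance; if none of them is trusted, swap the farthest
    one for the closest trusted peer outside the set (when one exists)."""
    if k < 1:
        return []
    by_distance = sorted(peers, key=lambda p: p.node_id ^ point)
    replicas = by_distance[:k]
    if not any(p.trusted for p in replicas):
        spare = next((p for p in by_distance[k:] if p.trusted), None)
        if spare is not None:
            replicas[-1] = spare
    return replicas

peers = [Peer(i, 0) for i in range(1, 5)] + [Peer(5, TRUST_THRESHOLD)]
print([p.node_id for p in pick_replicas(peers, point=0, k=3)])  # [1, 2, 5]
```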

Schmavery / raccoon