Expunge any added "invalid" nodes from k-buckets

joshuakarp commented 3 years ago

Specification

Any node that is added and later turns out to be invalid should be expunged from all state in the nodes domain.

We need a means of handling any nodes that are found to be invalid.
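Here is a minimal sketch of what expunging could mean for the k-buckets. The hypothetical in-memory `NodeGraphSketch` below files each node ID into the bucket given by the highest set bit of its XOR distance from our own ID (standard Kademlia bucketing). `expungeNode` removes the entry and drops the bucket if it becomes empty. The class name, hex-string node IDs and map-based storage are assumptions for illustration only. The real buckets live in a database, and other state (such as open connections) would also need clearing, which this sketch does not model.

```typescript
// Hypothetical in-memory model of the buckets; the real Polykey node graph
// is database-backed and its API may differ.
type NodeId = string; // hex-encoded node ID (assumption)

interface NodeAddress {
  host: string;
  port: number;
}

// Kademlia bucket index: position of the highest set bit of the XOR
// distance between our own ID and the other node's ID.
function bucketIndex(ownId: NodeId, otherId: NodeId): number {
  const distance = BigInt("0x" + ownId) ^ BigInt("0x" + otherId);
  if (distance === 0n) throw new Error("cannot bucket our own node ID");
  return distance.toString(2).length - 1;
}

class NodeGraphSketch {
  private buckets = new Map<number, Map<NodeId, NodeAddress>>();

  constructor(private readonly ownId: NodeId) {}

  setNode(nodeId: NodeId, address: NodeAddress): void {
    const index = bucketIndex(this.ownId, nodeId);
    let bucket = this.buckets.get(index);
    if (bucket === undefined) {
      bucket = new Map();
      this.buckets.set(index, bucket);
    }
    bucket.set(nodeId, address);
  }

  getNode(nodeId: NodeId): NodeAddress | undefined {
    return this.buckets.get(bucketIndex(this.ownId, nodeId))?.get(nodeId);
  }

  // Expunge a node found to be invalid; returns false if it was not present.
  // Empty buckets are removed so no trace of the node remains.
  expungeNode(nodeId: NodeId): boolean {
    const index = bucketIndex(this.ownId, nodeId);
    const bucket = this.buckets.get(index);
    if (bucket === undefined || !bucket.delete(nodeId)) return false;
    if (bucket.size === 0) this.buckets.delete(index);
    return true;
  }
}
```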

I make the initial assumption that an "invalid" keynode can only be introduced by manual insertion into the buckets database. Therefore, an "invalid" node can only be inserted through the nodes add CLI call, or through the seed nodes specified when we instantiate a keynode. I'm of the opinion that nodes added from discovery (through Kademlia) should be expected to be correct. However, note that any node added by nodes add can subsequently be discovered through Kademlia. Thus, we are currently in a state where "invalid" nodes could be discovered.

From my perspective, there are a couple of ways a node can be deemed "invalid":

invalid host/port: any malformed IP is caught at the CLI level, but a well-formed IP and port that are nonetheless wrong cannot easily be detected (a connection timeout may simply mean that the node is offline).
invalid node ID: malformed node IDs are caught at the CLI level, and we also check the node ID against the certificate on connection.
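To illustrate why these checks cannot catch a well-formed but wrong address, here is a sketch of the kind of syntactic host/port check the CLI can perform, using Node's `net.isIP`. The function name `isWellFormedAddress` and the choice to accept only IP literals with ports 1 to 65535 are assumptions, not Polykey's actual validation code.

```typescript
import { isIP } from "node:net";

// Syntactic check only (hypothetical helper): passing it says nothing about
// whether a keynode is actually listening at this address.
function isWellFormedAddress(host: string, port: number): boolean {
  // isIP returns 4 or 6 for valid IP literals, 0 otherwise (including hostnames).
  const validHost = isIP(host) !== 0;
  // Assumption: only ports 1..65535 are meaningful for a remote keynode.
  const validPort = Number.isInteger(port) && port >= 1 && port <= 65535;
  return validHost && validPort;
}
```

An address in the 192.0.2.0/24 documentation range passes this check even though no keynode will ever answer there. Only a connection attempt can reveal that, and a failed attempt looks the same as an offline node.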

As such, we don't currently have an easy way of determining whether a node is "invalid".

I believe the best approach is to place appropriate restrictions on how and what we add to the buckets database. See the additional commentary in the section below.

Additional context

This was originally brought up in review of the nodes CLI refactoring.

When discussing the nodes add command, I made the initial assumption that we would need to create a connection to the other node before we could successfully add it to the database. This would ensure that the node details are correct.
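The originally proposed policy, only adding a node after a successful connection, could be sketched as follows. `Probe`, `addNodeIfReachable` and the 5-second default timeout are hypothetical. The probe stands in for opening a connection and checking the node ID against the certificate, and the node graph is reduced to a plain `Map`.

```typescript
type NodeId = string;

interface NodeAddress {
  host: string;
  port: number;
}

// Hypothetical connectivity probe: resolves true if a connection (including
// the certificate node ID check) succeeded.
type Probe = (nodeId: NodeId, address: NodeAddress) => Promise<boolean>;

// Resolve to false if the probe rejects or does not settle within timeoutMs.
function withTimeout(p: Promise<boolean>, timeoutMs: number): Promise<boolean> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    p.then(
      (ok) => {
        clearTimeout(timer);
        resolve(ok);
      },
      () => {
        clearTimeout(timer);
        resolve(false);
      },
    );
  });
}

// Connect-before-add: the node is stored only if the probe succeeds in time.
async function addNodeIfReachable(
  graph: Map<NodeId, NodeAddress>,
  nodeId: NodeId,
  address: NodeAddress,
  probe: Probe,
  timeoutMs = 5000,
): Promise<boolean> {
  const ok = await withTimeout(probe(nodeId, address), timeoutMs);
  if (ok) graph.set(nodeId, address);
  return ok;
}
```

The drawback raised in review is visible here: a node that is valid but merely offline when the probe runs is rejected exactly like an invalid one.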

However, as pointed out in review, this is not necessarily the case (https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/198#note_641488589):

I'm not sure about this constraint. PK nodes should be able to be offline at random times, and it's a fairly lazy network. I reckon you should be able to add nodes into the graph without an initial connection. The problem with Kademlia you mentioned is known in the literature. It's called Kademlia poisoning, and variants include Sybil attacks. We haven't investigated mitigations against it yet, and the Kademlia system can be susceptible to DDoS attacks as well. The problem you describe can also occur when someone simply subverts their own Kademlia database, so the mitigation cannot rest on the assumption that all nodes are friendly. It requires a different approach, but we can address this after the release. For now, when you do discovery, you might want to avoid adding a node into your search if that node is offline or cannot be accessed (you may try to discard such nodes early), so that you don't keep dead nodes around in your node graph. Of course, you don't want to delete dead nodes if they are part of your gestalt. We will need to go through a proper design overview to work through these issues.
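The suggestion to discard dead nodes early during discovery, while never deleting gestalt members, could look roughly like this. `filterCandidates`, the `IsReachable` callback and the gestalt set are hypothetical stand-ins, and a probe that throws is treated the same as an unreachable node.

```typescript
type NodeId = string;

// Hypothetical reachability check used during discovery.
type IsReachable = (nodeId: NodeId) => Promise<boolean>;

// Split discovery candidates into nodes to keep in the node graph and nodes
// to discard. Unreachable nodes are discarded early, except gestalt members,
// which are kept even while offline.
async function filterCandidates(
  candidates: NodeId[],
  isReachable: IsReachable,
  gestalt: ReadonlySet<NodeId>,
): Promise<{ keep: NodeId[]; discard: NodeId[] }> {
  // Probe all candidates concurrently; Promise.all preserves input order.
  const results = await Promise.all(
    candidates.map(async (id) => ({
      id,
      ok: await isReachable(id).catch(() => false),
    })),
  );
  const keep: NodeId[] = [];
  const discard: NodeId[] = [];
  for (const { id, ok } of results) {
    if (ok || gestalt.has(id)) {
      keep.push(id);
    } else {
      discard.push(id);
    }
  }
  return { keep, discard };
}
```

This only addresses dead nodes. As the review notes, it does nothing against Kademlia poisoning, since a malicious node can be perfectly reachable.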