DavidKeller / kademlia

Dead simple C++11 kademlia distributed hash table library
http://dev.litchis.fr/projects/kademlia

Routing Table maintenance #10

Open aleks-f opened 3 years ago

aleks-f commented 3 years ago

I have some questions related to the routing table:

  1. There are a lot of messages exchanged on the initial peer connection to the bootstrap node, and multiple entries for the peer are inserted into the routing table. Is this necessary, or is one entry per peer enough?

  2. Since every peer generates a new ID for itself on every run, when a peer with a different ID but the same IP:PORT as an existing routing table entry (i.e. a restarted peer) shows up, shouldn't the previous entry for that IP:PORT be purged? Otherwise, the table will grow infinitely.

  3. Stale peers should obviously be purged at some point. Since this is UDP, the only time we know a peer exists is when it broadcasts a new entry. Is a peer required/expected to ping all peers at a certain interval to prevent being purged from the routing tables of its peers?

Thanks!

DavidKeller commented 3 years ago

Hello Aleksandar,

1/ From the paper,

To join the network, a node u must have a contact to an already participating node w. u inserts w into the appropriate k-bucket. u then performs a node lookup for its own node ID. Finally, u refreshes all k-buckets further away than its closest neighbor. During the refreshes, u both populates its own k-buckets and inserts itself into other nodes' k-buckets as necessary.

My understanding of it is that messages have to be sent in order for the node to become known on the network. Each peer should have a single entry within the routing table, because of the routing_table::push() check for uniqueness. Have you seen duplicates when debugging?
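
To make the uniqueness point concrete, here is a minimal sketch of an id-only check on push. This is illustrative only, not the library's routing_table code; the names (push_unique_by_id, endpoint, peer) are made up:

```cpp
// Illustrative sketch only -- not the library's routing_table code.
// Uniqueness is keyed on the peer id, so two entries sharing the same
// IP:PORT but carrying different ids would both be accepted.
#include <list>
#include <string>
#include <utility>

using id = std::string;                        // stand-in for the 160-bit id
struct endpoint { std::string ip; unsigned short port; };
using peer = std::pair<id, endpoint>;

bool push_unique_by_id(std::list<peer>& bucket, peer const& p)
{
    for (auto const& existing : bucket)
        if (existing.first == p.first)         // only the id is compared
            return false;                      // duplicate id: rejected
    bucket.push_back(p);                       // same endpoint, new id: accepted
    return true;
}
```

With a check like this, a restarted node that picks a fresh id slips past the test, which is exactly the situation discussed in the replies below.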

2/ Yes, when a response is received with a random <request id> to an <old node id>, the response_router should not only check that the response's <request id> has been previously generated, but also ensure the responder matches the <old node id>, and remove the <old node id> from the routing_table if it does not! (The <new node id> will already have been inserted by engine::handle_new_message().)
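
A rough sketch of that check, with made-up names and stub types (the real response_router / routing_table interfaces may differ):

```cpp
// Rough sketch with made-up names -- not the actual response_router API.
#include <set>
#include <string>

using id = std::string;                        // stand-in for the 160-bit id

struct routing_table_stub
{
    std::set<id> ids;
    void remove(id const& i) { ids.erase(i); } // purge a stale node id
};

// Returns true if the response may be dispatched further.
bool validate_response(std::set<id> const& pending_request_ids,
                       id const& request_id,
                       id const& responder_id,
                       id const& expected_node_id,
                       routing_table_stub& table)
{
    if (pending_request_ids.count(request_id) == 0)
        return false;                          // unknown request id: drop it

    if (responder_id != expected_node_id)
        // The node behind this endpoint restarted with a new id:
        // evict the stale entry; the new id is inserted when the
        // incoming message itself is handled.
        table.remove(expected_node_id);

    return true;
}
```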

3/ My understanding of the paper is that peers are lazily purged: when a lookup is performed, peers are contacted, and those that fail to respond in a timely manner are purged at that moment. This is not implemented ATM but isn't a big deal. See this TODO.

I believe the sweet spot would be lookup_task::flag_candidate_as_invalid(): a reference to the routing_table must be added as a lookup_task member, and routing_table::remove() called from there.
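
Something like this, reusing the stub types from the sketch above (again, only illustrative; the real lookup_task differs):

```cpp
// Only illustrative: assumes lookup_task holds a reference to the routing
// table so an unresponsive candidate can be evicted on the spot.
class lookup_task_sketch
{
public:
    explicit lookup_task_sketch(routing_table_stub& table)
        : routing_table_(table)
    { }

    // Called when a contacted peer fails to answer in time.
    void flag_candidate_as_invalid(id const& candidate_id)
    { routing_table_.remove(candidate_id); }   // lazy purge on lookup timeout

private:
    routing_table_stub& routing_table_;
};
```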

aleks-f commented 3 years ago

Hi David, thanks for answering.

Please see my replies below.

1/ ... Have you seen duplicates when debugging?

Yes. Since the IDs sent are not the real peer ID, and only IDs (not IP:PORT entries) are checked for uniqueness, the ID uniqueness check finds no duplicates during the discovery process (then again, what sense would it make to send multiple real peer IDs anyway?). So the bootstrap node ends up with a bunch of peers that have different (non-existent) IDs but the same IP:PORT. I could be missing something important, but I can't make sense of the discovery process logic.

2/ Yes, when a response is received with a random <request id> to an <old node id>, the response_router should not only check that the response's <request id> has been previously generated, but also ensure the responder matches the <old node id>, and remove the <old node id> from the routing_table if it does not! (The <new node id> will already have been inserted by engine::handle_new_message().)

The only thing I see is a check for a non-existent message ID. The duplicate non-existent peer IDs remain in the bootstrap node's routing table and keep propagating back to the peer that faked them (and to other peers); every save/load adds duplicate peers to the peers' routing tables.

3/ My understanding of the paper is that peers are lazily purged: when a lookup is performed, peers are contacted, and those that fail to respond in a timely manner are purged at that moment. This is not implemented ATM but isn't a big deal.

OK, makes sense.

I believe the sweet spot would be lookup_task::flag_candidate_as_invalid()

Speaking generally, I think I understand the purpose of this, but before we go there, I'd like to first clarify and properly understand the neighbor discovery on boot.

DavidKeller commented 3 years ago

1/ There are two kinds of IDs involved: Peer IDs and Key IDs.

Each time a peer p contacts a node n (request & response), n stores p's Peer ID within its routing table.

The purpose of refresh_id[ i ] = ! refresh_id[ i ]; is to generate, at each iteration, Key IDs further and further away from our Node ID, in order to contact nodes that belong to each different bucket (and fill all of them).
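
A toy illustration of that idea (an 8-bit id instead of the real 160-bit one, and assuming the convention that bucket i holds peers whose XOR distance has its highest set bit at position i):

```cpp
// Toy illustration of the bucket-refresh idea, not the library's code.
// Cumulatively flipping bit i of our own id produces a key whose highest
// differing bit is i, so a lookup for it refreshes the i-th bucket.
#include <bitset>
#include <cstddef>
#include <iostream>

int main()
{
    std::bitset<8> const node_id("10110100"); // toy 8-bit id
    std::bitset<8> refresh_id = node_id;

    for (std::size_t i = 0; i < refresh_id.size(); ++i)
    {
        refresh_id[i] = !refresh_id[i];       // move one bucket further away
        std::bitset<8> const distance = node_id ^ refresh_id;
        std::cout << "refresh key " << refresh_id
                  << " distance " << distance << '\n';
        // a node lookup for refresh_id would now populate bucket i
    }
}
```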

2/ Yes, this is something I should implement. That's clearly a bug, because it effectively makes the library leak memory.

*/ On a side note, I was wondering if rewriting the library to play with C++20 coroutines would be fun and could simplify the code. Would it be a problem for you to require a C++20 compiler?

aleks-f commented 2 years ago

The purpose of refresh_id[ i ] = ! refresh_id[ i ]; is to generate, at each iteration, Key IDs further and further away from our Node ID, in order to contact nodes that belong to each different bucket (and fill all of them).

OK, I understand this; I was mixing up peer and key IDs.

2/ Yes, this is something I should implement. That's clearly a bug, because it effectively makes the library leak memory.

Implementing this is simple; implementing it efficiently, not so much, without <endpoint, bucket index> caching. Proactively removing existing IP:PORT entries makes sense, I think, because if a peer with an existing IP:PORT shows up, the previous one must be gone. To do it efficiently with the existing structure, pairs of <endpoint, bucket index> should be cached; otherwise, one has to traverse the entire routing table to find duplicates (which most of the time won't exist). Which brings me to another question: why is the routing table implemented around vector<list<pair<id, endpoint>>>? Shouldn't the routing table be a binary tree?
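
To make the caching idea concrete, a minimal sketch with made-up names and structure (not the library's actual routing_table):

```cpp
// Minimal sketch of the <endpoint, bucket index> cache idea; names and
// structure are made up, not the library's routing_table.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct endpoint
{
    std::string ip;
    std::uint16_t port;
    bool operator==(endpoint const& o) const
    { return ip == o.ip && port == o.port; }
};

struct endpoint_hash
{
    std::size_t operator()(endpoint const& e) const
    { return std::hash<std::string>()(e.ip) ^ (std::size_t(e.port) << 1); }
};

using id = std::string;
using peer = std::pair<id, endpoint>;

class routing_table_sketch
{
public:
    routing_table_sketch() : buckets_(160) { }

    void push(peer const& p, std::size_t bucket_index)
    {
        auto cached = bucket_of_.find(p.second);
        if (cached != bucket_of_.end())
        {
            // Same IP:PORT seen before: evict the stale entry directly,
            // without scanning the whole table.
            buckets_[cached->second].remove_if(
                [&p](peer const& q) { return q.second == p.second; });
        }
        buckets_[bucket_index].push_back(p);
        bucket_of_[p.second] = bucket_index;   // keep the cache up to date
    }

private:
    std::vector<std::list<peer>> buckets_;
    std::unordered_map<endpoint, std::size_t, endpoint_hash> bucket_of_;
};
```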

*/ On a side note, I was wondering if rewriting the library to play with C++20 coroutine would be fun and could simplify the code. Would it be a problem for you to require a C++20 compiler ?

Coroutines would definitely make sense. However, at this point I'm working on an entirely divergent port of your library, completely decoupled from boost and asio, using POCO and gtest instead. In case you want to join that effort, let me know, and we can explore the coroutines possibility there (but it really depends on the folks requesting this port and how they feel about coroutines and the compiler support thereof).