discv5: topic index design thread

fjl commented 4 years ago

In order to register a node for a certain topic, it must place ads for itself on other nodes. In the current version of the spec, the process of registering with a single node is very well described, but distributing ads among registrar nodes is still unsolved.

There are two sub-problems here: (1) is HOW MANY ads a single node should place, (2) is WHERE they should be placed.

For (1), what we need is an algorithm that places ads such that the node can be 'found enough', where the notion of 'enough' is application-specific.

Intuitively, if every registrant node was targeting a fixed number of live ads, the odds of finding any particular node would be the same and the system would stay balanced. Unfortunately, this can be gamed too easily by simply registering on more nodes than everyone else.

Our current best guess for doing this better is to compute the notion of 'found enough' based on peer activity outside of the discovery protocol. We can assume that every participant is able decide whether the current amount of incoming useful peer connections is too high or too low. If the incoming peer connection rate is too low, we can simply increase the number of live ads we target. This can still be gamed, but ensures that nodes will correct for imbalances automatically. The remaining issue here is to find out how this measurement can be implemented in practice. If an application uses multiple topics, it might not be possible to attribute incoming connections to a particular topic, for example.

For problem (2), which is where ads should be placed, we want the algorithm to pick registrars such that the set of considered registrar nodes shrinks and grows according to some estimation of topic size, i.e. popular topics provided by many participants should use a larger subset of the whole network to store ads.

To do this, we decided early on that ad distribution should use Kademlia distance to determine a subset of all nodes, the size of which we called the 'topic radius'. The radius was to be estimated by checking ticket waiting times, but then we found multiple issues (#111, #112) and concluded that using the waiting time is a dead end because it can vary for reasons other than topic popularity.

Instead, it seems that the number of nodes in the topic queue of each registrar is a better indicator. If registrants could obtain this number in some way, they could walk the network for a while to find registrars which have space available and fill it. How exactly this should happen, and whether this approach will converge on a good distribution ads is still an open question.

jannikluhn commented 4 years ago

Our current best guess for doing this better is to compute the notion of 'found enough' based on peer activity outside of the discovery protocol.

This seems very hard to implement in practice. I guess it's possible to notice if peer activity drops significantly over time, but I don't see how one could come up with an absolute estimate, especially for new and small topics.

If the goal is for all network participants to place roughly the same amount of ads, can we simply try to sample this number directly? Pick some random nodes and ask around for ads placed by these nodes. And then place the same amount.

harnen commented 4 years ago

Wouldn't it be simpler if the registrant tried to register on every node (without any radius estimation) during its random walk towards H(t) and let the nodes on the path decide whether to accept it or not?

The decision could be taken based on the node's distance from H(t) (the closer, the higher chance of accepting) and the current number of registration among all the topics (the more registrations, the lower chance of accepting). It would create a balanced system where, if there are not many registrations in the system, nodes would accept any registration, but when the traffic is increasing, each node would start to specialize in topics close to its own ID.

If the registrant finds that it doesn't have incoming connection, it can repeat the process using alternative paths (or repeat on the same one if the registration decision is made nondeterministic).

fjl commented 4 years ago

@harnen This is a very nice idea to try. I'll try to implement this next week to see if there is any downside to that.

yiannisbot commented 4 years ago

The first issue of HOW MANY ads a single node should place sounds to me a bit contradictory to the fact that nodes can advertise any number of topics. Am I missing something here @fjl.

Assuming that at some point nodes will get overloaded, it might make sense for this to be capped somehow. If there is a cap, then how about a max-min fair approach to the number of nodes a topic should be advertised to. According to max-min fairness popular topics will be advertised more, but will not starve un-popular topics. Popularity (or any other metric used) can be obtained by the peer activity outside the discovery protocol, as mentioned earlier in this discussion by @fjl.

Implementing this might be a bit less straightforward, as there will need to be common knowledge of: i) the network size (number of nodes), ii) the number of topics that exist at any given moment, and iii) the popularity (in terms of demand for advertising space) of every topic. Each node can then run the max-min fair algorithm locally, determine its allowance in terms of the number of nodes where it can advertise and then run @harnen’s (or any other) algorithm to place them. The system will also adjust automatically as the three parameters above change. I suspect that even rough estimates of the network size, number of topics and topic popularity should be ok.

fjl commented 4 years ago

The first issue of HOW MANY ads a single node should place sounds to me a bit contradictory to the fact that nodes can advertise any number of topics. Am I missing something here @fjl.

What I meant is 'how many ads per topic'. The way we've been thinking about the registration process here is always per-topic. Nodes would basically run the same process for every topic individually.

fjl commented 4 years ago

here will need to be common knowledge of: i) the network size (number of nodes), ii) the number of topics that exist at any given moment, and iii) the popularity (in terms of demand for advertising space) of every topic

We can't have this because these parameters can change at any time.

fjl commented 4 years ago

I just figured out that we can measure total network size pretty accurately by tracking the avg distance of lookup results from the lookup target.

yiannisbot commented 4 years ago

I just figured out that we can measure total network size pretty accurately by tracking the avg distance of lookup results from the lookup target.

Yes, exactly, there are several different techniques proposed in the literature (see [1] and [2]). And in any case, a relatively close estimate of the size will do in this case I believe.

[1] RFC7363: Self-Tuning DHT for REsource LOcation And Discovery (RELOAD) [2] Measuring Large-Scale Distributed Systems: Case of BitTorrent Mainline DHT

fjl commented 3 years ago

New proposal time! Some of the text below just re-states stuff that's in the spec, the new things in this proposal are:

Radius estimation removed
Implementable descriptions of advertisement and search algorithms
Topic queue traffic control through cumulative wait time tracking

I think the ideas below are a good starting point for simulation. We still need to work out the details (e.g. time constants, bucket sizes, topic table limits) and some edge cases.

Terms

A 'topic' is an identifier for a service provided by a node.
An 'advertiser' is a node providing a service that wants to be found.
An 'ad' is the registration of an advertiser for a topic on another node.
An 'advertisement medium' is a node on which an ad is stored.
A 'searcher' is a node looking for ads for a topic.

Ad Storage & Tickets

In order to place an ad, the advertiser must present a valid ticket to the advertiser.

Tickets are opaque objects issued by the advertisement medium. When the advertiser first tries to place an ad without a ticket, it receives an initial ticket and a 'waiting time' which it needs to spend. The advertiser must come back after the waiting time has elapsed and present the ticket again. When it does come back, it will either place the ad successfully or receive another ticket and waiting time.

While tickets are opaque to advertisers, they are readable by the advertisement medium. The medium uses the ticket to store the cumulative waiting time, which is sum of all waiting times the advertiser spent. Whenever a ticket is presented and a new one issued in response, the cumulative waiting time is increased and carries over into the new ticket.

All nodes act as advertisement media and keep a table of 'topic queues'. This table stores the ads. The table has a limit on the number of ads in the whole table, and also has a limit on the number of ads in each queue. Ads move through the queue at a fixed rate. When the queue is full, and the last ad expires, a new ad can be stored at the beginning of the queue. An advertiser can only have a single ad in the queue, duplicate placements are rejected.

Topic queues are subject to competition. To keep things fair, the advertisement medium prefers tickets which have the longest cumulative waiting time. In addition to the ads, each queue also keeps the current 'best ticket', i.e. the ticket with the longest cumulative waiting time. When a ticket with a better time is submitted, it replaces the current best ticket. Once an ad in the queue expires, the best ticket is admitted into the queue and the node which submitted it is notified.

Tickets cannot be used beyond their lifetime. If the advertiser does not come back after the waiting time, all cumulative waiting time is lost and it needs to start over.

To keep ticket traffic under control, an advertiser requesting a ticket for the first time gets a waiting time equal to the cumulative time of the current best ticket. For a placement attempt with a ticket, the new waiting time is assigned to be the best time minus the cumulative waiting time on the submitted ticket.

Advertiser Algorithm

The above description explains the storage and placement of ads on a single medium, but advertisers need to place ads redundantly on multiple nodes in order to be found.

The advertiser keeps a 'ticket table' to track its ongoing placement attempts. This table is made up of k-buckets of logarithmic distance to the topic hash, i.e. the table stores k advertisement media for every distance step. It is sufficient to use a small value of k such as k = 3. The ticket table is initialized and refreshed by performing lookups for the topic hash using the main node table.

For every node stored in the ticket table, the advertiser attempts to place an ad on the node and keeps the latest ticket issued by that node. It also keeps references to all tickets in a priority queue keyed by the expiry time of the ticket so it can efficiently access the next ticket for which a placement attempt is due.

Nodes/tickets are removed from their ticket table bucket when the ad is placed successfully or the medium goes offline. The removed entry is replaced when the ticket table is refreshed by a lookup.

Search Algorithm

The purpose of placing ads is being discovered by searchers.

Searchers on a topic also keep a table, the 'search table'. Like the 'ticket table', this table also stores k-buckets of advertisement media by distance to the topic hash. The k factor of the search table should be relatively large in order to make the search efficient. Tickets are not required for search. The search table is initialized and refreshed by performing lookups for the topic hash on using the main node table.

To find ads, the searcher simply queries the nodes in the search table for ads. In order to find new results, bucket entries are replaced when the node fails to answer or when it answers with an empty list of ads. Bucket entries of the search table should also be replaced whenever the table is refreshed by a lookup.

How does this deal with topic popularity?

In earlier research, we kept trying to estimate the 'radius' (i.e. popularity) of the topic in order to determine the advertisement media.

I think the proposed advertisement algorithm will track popularity automatically because the cumulative waiting time required for placement just grows the closer you get to the topic hash. Nodes will keep trying to out-wait each other close to the center. Further away from the topic, waiting times will be more reasonable and everyone will be able to find their place there. When the topic shrinks, the required 'best time' will shrink also.

For search, estimation is less necessary and we should try to see how efficient it is in the way specified above. I think it might just work out.

Beyond the simple proposal

There is a minor problem with the simple placement and search scheme outlined above: the nodes which are close to the topic hash will get a lot of traffic because they'll be in everyone's ticket and search tables. We previously tried to eliminate this problem using the concept of 'minimum radius'. It might work to use network density estimation for this. If we have a rough estimate on network size, we can determine a lower bound on the distance to the topic hash such that the number of nodes in the 'center' is > 256, for example.

The log-distance based tables might not be precise enough to accurately track advertisement media. We could use the more precise tree-based k-bucket design (as used in BitTorrent DHT, for example) for these tables.

Another question is how well this system can cope with many different topics at the same time. @harnen's idea to prioritize ads based on the distance of the advertisement medium from the topic could be used for that. The general idea with the topic table system is that there is a global limit on the number of ads that can be stored across the entire DHT. When the limit is reached, the waiting times will just go up. It's a tradeoff. We need to explore this and set the parameters (especially the topic table limits) so the system will be usable with a large number of topics.

ethereum / devp2p