Analysis about big node radius configurations

There is a widely held intuition that nodes with a huge radius are unhelpful on the network, because no one will ask them for data that is far from their node ID. I would love to generate data to confirm or refute that intuition. If we confirm it, the data should also help give us numbers for how big the simulated smaller radii should be.

The first analysis that comes to mind is this:

Inspect a query trace, looking at the first round of nodes that are asked for content. Grab the node with the furthest XOR distance from the content ID in that first round of queries, and save the distance. Across all traces in a network, take the 95th percentile of these furthest distances (the value that 95% of them fall at or below). That should give us a sense of how far away from the content ID we are reliably looking.
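To make the idea concrete, here is a rough sketch of that computation. The trace shape is an assumption for illustration only: each trace is a dict with a hypothetical `content_id` and a hypothetical `first_round` list of node IDs, which is not necessarily how glados stores traces. The percentile uses the simple nearest-rank method.

```python
import math
from typing import Dict, List


def xor_distance(a: bytes, b: bytes) -> int:
    """XOR distance between two equal-length IDs, read as big-endian integers."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def furthest_first_round_distance(content_id: bytes, first_round: List[bytes]) -> int:
    """Largest XOR distance from the content ID among first-round nodes."""
    return max(xor_distance(content_id, node_id) for node_id in first_round)


def percentile(values: List[int], pct: float) -> int:
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def first_round_reach(traces: List[Dict], pct: float = 95) -> int:
    """Percentile of the per-trace furthest first-round distances.

    Assumes the hypothetical trace shape
    {"content_id": bytes, "first_round": [bytes, ...]}.
    Traces with an empty first round are skipped.
    """
    maxima = [
        furthest_first_round_distance(t["content_id"], t["first_round"])
        for t in traces
        if t["first_round"]
    ]
    return percentile(maxima, pct)
```

Real node and content IDs would be 32 bytes; the functions only require that the IDs being compared have the same length.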

Why only look at the first round of content queries? If we have a hard time finding the content, we will keep exhaustively searching. So it's less interesting if we eventually find the big radius nodes, which we all expect to happen. It's more interesting if big radius nodes could serve the content quickly, by being queried early. If they aren't asked in the first round, they are even less likely to be asked in a later round. (Unless the data is nearly missing from the network, which is an uninteresting case for us.)

There might be a simpler version of this where we create a graph that shows the following data over the last million audits:

X axis: logarithmic distance from serving node-id to content-id
Y axis: number of audits
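A minimal sketch of how the data behind that graph could be bucketed. It assumes each audit can be reduced to a hypothetical `(serving_node_id, content_id)` pair, and it takes "logarithmic distance" to mean the bit length of the XOR distance; both are assumptions, not something glados is known to provide in this form.

```python
from collections import Counter
from typing import Iterable, Tuple


def log_distance(node_id: bytes, content_id: bytes) -> int:
    """Bit length of the XOR distance (0 when the two IDs are identical)."""
    xor = int.from_bytes(node_id, "big") ^ int.from_bytes(content_id, "big")
    return xor.bit_length()


def audits_by_log_distance(audits: Iterable[Tuple[bytes, bytes]]) -> Counter:
    """Count audits per log distance: keys are the X axis, counts the Y axis."""
    return Counter(log_distance(node_id, content_id) for node_id, content_id in audits)
```

The resulting counter maps each log distance to the number of audits served from that distance, so it can be handed to any plotting tool as is.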

I think that this graph would be informative in the current network but is less interesting because:

Our R-value is low, meaning that only 1-3 nodes store any individual piece of information.
Our variance on distance is low, because most of our nodes store the same amount.

Increasing the network R-value and the variability of node storage radius in the network would improve the quality of the data in this graph. My intuition is that the graph will show that a majority of served requests fall within some reasonably small distance that correlates with the mean radius exhibited by nodes on the network. I think there is a testable hypothesis in here somewhere.

ethereum / glados, issue #337