Generate a chart of the "best case" network coverage, based on nodes' claimed radii

carver commented 6 months ago

Stacked area chart showing network replication, based on each node's claimed radius

### Tasks
- [x] Paper prototype of how XOR distance plays out in a replication chart
- [x] Generate a stacked area chart in d3 (or maybe just a bar chart), using dummy data
- [x] Calculate chart data, based on node ID & radius data
- [x] Test tradeoffs between high-resolution/low-speed and low-resolution/high-speed display
- [x] Clean up for merge
- [x] Validate against production data
- [x] Generate keycount for every node, for every prefix

carver commented 6 months ago

The keyspace is too large to have an entry for every content ID, of course, so each bar of an area/bar chart will necessarily cover a range of content IDs. At the edges of a node's coverage, a bar will therefore only be partially covered. I think the right approach is to increment a bar by 1 only if a node claims to be interested in keys at both extremes of the bar's range. When we can't be exact, we probably want to under-count replication rather than over-count it.
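A minimal sketch of that endpoint rule, assuming integer content IDs in a small toy keyspace, equal-width bars given as inclusive `(lo, hi)` ranges, and a node claiming a key when `node_id ^ key <= radius`. The names `covers_bar` and `replication_counts` are hypothetical, not glados code.

```python
from typing import Iterable, List, Tuple

def covers_bar(node_id: int, radius: int, lo: int, hi: int) -> bool:
    """Endpoint rule: count a node only if it claims both extremes of the bar."""
    return (node_id ^ lo) <= radius and (node_id ^ hi) <= radius

def replication_counts(
    nodes: Iterable[Tuple[int, int]], bits: int, num_bars: int
) -> List[int]:
    """Split a 2**bits keyspace into equal-width bars and apply the endpoint rule.

    Assumes num_bars is a power of two no larger than 2**bits.
    """
    nodes = list(nodes)
    width = (1 << bits) // num_bars
    counts = []
    for i in range(num_bars):
        lo, hi = i * width, (i + 1) * width - 1
        counts.append(sum(covers_bar(n, r, lo, hi) for n, r in nodes))
    return counts

# Toy 8-bit keyspace split into 16 bars; each node is (node_id, radius).
nodes = [(0x00, 0x3F), (0x10, 0x1F), (0x80, 0x0F)]
print(replication_counts(nodes, bits=8, num_bars=16))
# → [2, 2, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```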

This has a couple of awkward cases to watch out for:

- Nodes with a very small radius will not cover the full width of a single bar, so they will not contribute to the replication count. (We need to make sure the resolution is high enough, i.e. the bars are narrow enough, to avoid losing too much information about these nodes.)
- Because XOR distance is non-continuous, it's possible for a node to cover both edges of a bar but not some of the keys in the middle. My intuition says this problem is small enough to ignore, but I'm not sure I could convince a skeptic of that. The higher the resolution, the smaller this problem should be.
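Both cases can be reproduced by brute force on a single bar of a toy 4-bit keyspace. This is an illustrative sketch; the node IDs and radii are made up, and `covered_keys` / `endpoint_rule` are hypothetical helper names.

```python
from typing import List

def covered_keys(node_id: int, radius: int, lo: int, hi: int) -> List[int]:
    """Brute force: keys in [lo, hi] within XOR distance `radius` of node_id."""
    return [k for k in range(lo, hi + 1) if (node_id ^ k) <= radius]

def endpoint_rule(node_id: int, radius: int, lo: int, hi: int) -> bool:
    """Count the node for this bar only if both extremes are within its radius."""
    return (node_id ^ lo) <= radius and (node_id ^ hi) <= radius

# One bar spanning keys 0..15 of a toy 4-bit keyspace.
lo, hi = 0, 15

# Small radius: node 5 claims keys 4 and 5 but neither extreme, so it adds nothing.
print(covered_keys(5, 1, lo, hi), endpoint_rule(5, 1, lo, hi))
# → [4, 5] False

# XOR gap: node 8 with radius 8 claims both extremes (distances 8 and 7) but
# misses keys 1..7 (distances 9..15), yet the endpoint rule still counts it.
print(covered_keys(8, 8, lo, hi), endpoint_rule(8, 8, lo, hi))
# → [0, 8, 9, 10, 11, 12, 13, 14, 15] True
```

So the second case can over-count: the node is counted for the whole bar while claiming only 9 of its 16 keys.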

So generally, we will want high resolution (narrow bars), as much as we can tolerate performance-wise.
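One rough way to see the resolution tradeoff, assuming prefix-aligned bars whose width is a power of two, a toy 8-bit keyspace, and randomly generated (hypothetical) nodes: compare the endpoint rule against a brute-force count of covered keys, and total the discrepancy in key units at several bar widths. This measures accuracy only, not the extra rendering and computation cost of narrower bars.

```python
import random
from typing import List, Tuple

BITS = 8  # toy keyspace size in bits; real content IDs are far wider

def endpoint_error(nodes: List[Tuple[int, int]], width: int) -> int:
    """Total |endpoint-rule count - exact covered keys|, in key units, over all bars.

    Bars are aligned blocks [lo, lo + width); width must be a power of two.
    """
    total = 0
    for lo in range(0, 1 << BITS, width):
        hi = lo + width - 1
        for node_id, radius in nodes:
            exact = sum(1 for k in range(lo, hi + 1) if (node_id ^ k) <= radius)
            claimed = (node_id ^ lo) <= radius and (node_id ^ hi) <= radius
            counted = width if claimed else 0  # endpoint rule counts all or nothing
            total += abs(counted - exact)
    return total

# Hypothetical nodes with random IDs and radii, seeded for repeatability.
rng = random.Random(0)
nodes = [(rng.randrange(1 << BITS), rng.randrange(1 << BITS)) for _ in range(20)]
for width in (64, 16, 4, 1):
    print(f"bar width {width:>2}: total discrepancy {endpoint_error(nodes, width)} keys")
```

With aligned bars, XOR with a node ID maps each bar onto a contiguous, aligned run of distances, so at most one bar per node is partially covered. The discrepancy is therefore at most `width - 1` keys per node, and it disappears at `width = 1`.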

carver commented 6 months ago

Thanks for hinting that calculating fractional coverage of the keyspace is probably straightforward, @morph-dev. It is! I'll implement it that way from the start.

(I ended up writing a little Python script to brute-force an 8-bit keyspace with a few different XOR distances, to make sure my closed-form solution was working correctly.)
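The closed form itself isn't given in the thread. One plausible version, assuming bars are prefix-aligned blocks of width `2**b` (in line with the per-prefix keycount task): XOR with the node ID maps an aligned block onto an aligned run of distances starting at `base = (lo ^ node_id)` with the low `b` bits cleared, so the number of claimed keys is `min(width, max(0, radius - base + 1))`. The brute-force check below is in the spirit of the script described, not that script; `covered_in_block` and `check` are hypothetical names.

```python
from typing import Iterable, Optional

def covered_in_block(node_id: int, radius: int, lo: int, width: int) -> int:
    """Closed form: keys in the aligned block [lo, lo + width) within `radius`.

    Assumes width is a power of two and lo is a multiple of width.
    """
    base = (lo ^ node_id) & ~(width - 1)  # smallest XOR distance inside the block
    return min(width, max(0, radius - base + 1))

def brute_force(node_id: int, radius: int, lo: int, width: int) -> int:
    """Reference count by enumerating every key in the block."""
    return sum(1 for k in range(lo, lo + width) if (node_id ^ k) <= radius)

def check(bits: int, radii: Optional[Iterable[int]] = None) -> bool:
    """Compare both counts for every node ID, block width and block position."""
    size = 1 << bits
    radii = list(range(size) if radii is None else radii)
    for b in range(bits + 1):
        width = 1 << b
        for node_id in range(size):
            for radius in radii:
                for lo in range(0, size, width):
                    assert covered_in_block(node_id, radius, lo, width) == brute_force(
                        node_id, radius, lo, width
                    )
    return True

print(check(bits=6))  # exhaustive over a 6-bit keyspace
print(check(bits=8, radii=(0, 1, 7, 8, 100, 127, 128, 200, 255)))  # sampled radii
```

Dividing `covered_in_block(...)` by `width` gives the fractional coverage of a bar.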

If you're interested, I'm happy to write it up or hop on a call. Otherwise, you'll probably see the prototype in a glados PR in the next couple of days.

morph-dev commented 6 months ago

Great! One other thing that crossed my mind: while fractional calculation is good, we probably also want to know the number of nodes that fully cover the keyspace/domain. We can show both pieces of information using a stacked histogram.
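A sketch of that categorical split, assuming prefix-aligned bars and a closed-form count of claimed keys per aligned block (XOR with the node ID maps the block onto a contiguous run of distances). For each bar it returns two series to stack: the number of nodes that fully cover the bar, and the summed fractional coverage of nodes that cover only part of it. The helper names are hypothetical, and rendering (e.g. in d3) is left out.

```python
from typing import List, Sequence, Tuple

def covered_in_block(node_id: int, radius: int, lo: int, width: int) -> int:
    """Keys in the aligned block [lo, lo + width) within XOR distance `radius`."""
    base = (lo ^ node_id) & ~(width - 1)  # distances in the block are base..base+width-1
    return min(width, max(0, radius - base + 1))

def stacked_series(
    nodes: Sequence[Tuple[int, int]], bits: int, prefix_bits: int
) -> Tuple[List[int], List[float]]:
    """Per bar: (count of nodes fully covering it, summed fraction from partial nodes)."""
    width = 1 << (bits - prefix_bits)
    full: List[int] = []
    partial: List[float] = []
    for lo in range(0, 1 << bits, width):
        n_full, frac = 0, 0.0
        for node_id, radius in nodes:
            covered = covered_in_block(node_id, radius, lo, width)
            if covered == width:
                n_full += 1
            elif covered > 0:
                frac += covered / width
        full.append(n_full)
        partial.append(frac)
    return full, partial

# Toy 8-bit keyspace, 2-bit prefixes (4 bars of 64 keys); nodes are (node_id, radius).
nodes = [(0x00, 0x3F), (0x00, 0x5F), (0xC0, 0x0F)]
print(stacked_series(nodes, bits=8, prefix_bits=2))
# → ([2, 0, 0, 0], [0.0, 0.5, 0.0, 0.25])
```

Stacking `partial` on top of `full` gives each bar the same total height as summing every node's fractional coverage, while the lower band shows how much of that replication comes from nodes holding the entire range.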

carver commented 6 months ago

Yeah, I think that categorical split is helpful; I added it.

It will also probably be helpful to split by client (though I may move on to something else for now).

ethereum / glados

Generate a chart of the "best case" network coverage, based on nodes' claimed radii #261