celestiaorg / celestia-node

Celestia Data Availability Nodes
Apache License 2.0

[idea] Consider namespaced nodes #323

Closed Wondertan closed 11 months ago

Wondertan commented 2 years ago

Celestia's blocks are sharded by namespace, so a client can fetch only a specific part of a block from a full node over the network. Similarly, some nodes could store only specific parts of each block rather than all of it, for reasons such as disk usage optimization. Future rollup sequencers/aggregators might be interested in this, as they won't care about rollups other than their own. However, this requires our network to have a discovery mechanism for such "namespaced nodes", so that client nodes can find them and access the specific part they need. This can be implemented using DHT service discovery and by implementing NamespaceAvailability, which will sync only the required namespaced shares. There could also be functionality for a full node to become a "namespaced node" by pruning either everything except the needed namespaces or a specific set of namespaces.
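To make the discovery idea concrete, here is a minimal sketch in Python. It is an in-memory stand-in for DHT service discovery, not the real thing: `ToyDiscovery`, `rendezvous_key`, and the `/celestia/namespace/` key prefix are hypothetical names chosen for illustration and are not part of celestia-node or libp2p.

```python
import hashlib
from collections import defaultdict


def rendezvous_key(namespace_id: bytes) -> str:
    """Derive a discovery key for a namespace (hypothetical key format)."""
    return "/celestia/namespace/" + hashlib.sha256(namespace_id).hexdigest()


class ToyDiscovery:
    """In-memory stand-in for DHT service discovery (assumption, not a real DHT)."""

    def __init__(self):
        # Maps a rendezvous key to the set of peer IDs advertising under it.
        self._providers = defaultdict(set)

    def advertise(self, peer_id: str, namespace_id: bytes) -> None:
        """A namespaced node announces that it serves this namespace."""
        self._providers[rendezvous_key(namespace_id)].add(peer_id)

    def find_peers(self, namespace_id: bytes) -> list:
        """A client looks up the peers that advertised this namespace."""
        return sorted(self._providers.get(rendezvous_key(namespace_id), set()))
```

A client interested in one rollup would call `find_peers` with that rollup's namespace ID and only ever learn about the nodes that advertised it.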

musalbas commented 2 years ago

I'm not sure if it makes sense to implement per-namespace discovery on the main network. This is more suitable for rollup subnets, where clients can request the rollup data from nodes directly. Implementing per-block discovery is much simpler, and still allows nodes to query the blocks by namespace, once they find a peer with the block height.
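As a rough sketch of the per-block alternative: a client first finds a peer by block height and then filters that block by namespace. The example assumes a block is modelled as a list of `(namespace_id, share)` tuples already sorted by namespace; `height_index`, `shares_for_namespace`, and `query` are hypothetical names, not celestia-node APIs.

```python
from bisect import bisect_left, bisect_right


def shares_for_namespace(block, namespace_id):
    """Return the shares of one namespace from a namespace-sorted block."""
    namespaces = [ns for ns, _ in block]
    lo = bisect_left(namespaces, namespace_id)
    hi = bisect_right(namespaces, namespace_id)
    return [share for _, share in block[lo:hi]]


def query(height_index, height, namespace_id):
    """Look up a block by height first, then filter it by namespace.

    height_index stands in for per-block discovery: it maps a height to
    the block held by a peer that was discovered for that height.
    """
    block = height_index.get(height)
    if block is None:
        return None  # no peer found for this height
    return shares_for_namespace(block, namespace_id)
```

Here discovery is keyed only by height, and the namespace appears only in the query sent to the discovered peer.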

In general, on the main network, we want to decouple the data layer from the application layer. This means that all data should be treated equally. Adding preferential treatment for different application namespaces would not be in line with this.

Wondertan commented 2 years ago

@musalbas, imagine the following two architectures, in which Celestia and Rollup nodes are attached to each other and maintain a trusted connection over the Celestia Node API/RPC for accessing data and other functionality such as PayForMsg:

Now, let's compare:

liamsi commented 2 years ago

Our MVP experiments showed us that DHT re/announcing is the most time-consuming part, while discovery is not! With namespaces, re/announcing is cheap, so this solves the problem we had and thus almost completely solves the partial node problem in a more elegant way. That is, DASing and discovery over namespaces (as a fallback) are possible.

The assumption here is that you keep the namespace for each height? What happens if you prune old heights though? Retry? Or will the key be a mix of height and namespace?

This works when the default pruning strategy removes all namespaced data other than what is needed, which in most cases should eliminate the need to prune past blocks by height.

I disagree with that. Not having pruning on the Cosmos Hub was a major pain for node operators. You could easily argue that the block data in the early days of the Cosmos Hub is comparable to some lightweight rollup (back then it only supported gov proposals, staking Txs, and transfers).

Wondertan commented 2 years ago

The assumption here is that you keep the namespace for each height? Or will the key be a mix of height and namespace?

DASing in its current state does not know anything about heights. To change that, we would probably need to throw away Bitswap and write something ourselves, as it is hash-based. So the assumption is that there are no heights, and only namespaces are encoded into the CID.

What happens if you prune old heights though? Retry?

We rely on the assumption that there will be at least one node serving a given height to the network. Therefore, there will be at least one node serving a namespaced sample of a block for that height. By discovering peers under the namespace, we zoom into a subset of the peers of the whole network, at least one of which should provide us with the sample. The subset can be small enough that we instantly connect to the provider, or we might need more time to find it if the subset is bigger.
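A minimal sketch of that lookup, under the stated assumption that at least one peer in the namespace subset still holds the height. `ToyPeer` and `fetch_from_subset` are hypothetical names for illustration; real retrieval would go through the network rather than in-memory objects.

```python
class ToyPeer:
    """Hypothetical peer storing shares keyed by (height, namespace_id)."""

    def __init__(self, peer_id, shares):
        self.peer_id = peer_id
        self._shares = dict(shares)

    def prune_below(self, min_height):
        """Drop everything older than min_height (pruning by height)."""
        self._shares = {
            key: value for key, value in self._shares.items() if key[0] >= min_height
        }

    def get_shares(self, height, namespace_id):
        """Return the stored shares, or None if this peer no longer has them."""
        return self._shares.get((height, namespace_id))


def fetch_from_subset(peers, height, namespace_id):
    """Ask each peer discovered under the namespace until one can serve."""
    for peer in peers:
        shares = peer.get_shares(height, namespace_id)
        if shares is not None:
            return peer.peer_id, shares
    return None  # the assumption failed: nobody in the subset has the height
```

The smaller the subset, the fewer peers have to be tried before reaching one that did not prune the requested height.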

So to answer your question: if someone prunes by height, it would still be possible to find another node that did not prune.

I disagree with that. Not having pruning on the Cosmos Hub was a major pain for node operators. You could easily argue that the block data in the early days of the Cosmos Hub is comparable to some lightweight rollup (back then it only supported gov proposals, staking Txs, and transfers).

Ok, also, you can still remove past heights. My point is more about rollup nodes wanting to be full nodes for the whole network. They will want this so that they can generate state/msg inclusion fraud proofs for the whole network. Once the whole network becomes super expensive (imagine your example multiplied by thousands), namespacing allows them to do so while caring only about their own namespace and the mainnet namespace, by pruning the namespaces they are not interested in. Then, if your own chain becomes super expensive, you can still prune past blocks, be discovered for your namespace, and serve back what you have.

So the idea here is not to remove pruning by height completely, but to make pruning and discovery two-dimensional (namespace:height).
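One way to picture two-dimensional pruning is a policy with a namespace filter and a height cutoff. This is only a sketch: `PrunePolicy`, `prune`, and the dictionary-based store keyed by `(namespace_id, height)` are hypothetical and do not reflect how celestia-node stores data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrunePolicy:
    """Hypothetical policy: which namespaces and which heights to keep."""

    keep_namespaces: frozenset = frozenset()  # empty set means "keep all namespaces"
    min_height: int = 0  # heights below this are pruned

    def keeps(self, namespace_id, height):
        namespace_ok = not self.keep_namespaces or namespace_id in self.keep_namespaces
        return namespace_ok and height >= self.min_height


def prune(store, policy):
    """Return a new store with only the entries the policy keeps.

    store maps (namespace_id, height) to the data held for that pair.
    """
    return {key: value for key, value in store.items() if policy.keeps(*key)}
```

With an empty namespace set this reduces to plain height pruning, and with `min_height=0` it reduces to plain namespace pruning.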

The following is the order in which these become applicable to our network, and thus the order of implementation:

  1. Height pruning
  2. Namespace pruning
  3. Namespace discovery
  4. Height discovery

The reasons why I put height discovery in last place are:

Namespace discovery, on the other hand, is much simpler to implement, as the whole stack is already deeply aware of namespaces and we don't need to implement our own protocol for it. It can therefore be a good mid-term solution that is possible to deliver even before mainnet.