autonomys / subspace

Subspace Network reference implementation
https://subspace.network

Split DSN mapping RPC into a piece index subscription and full mapping method #3002

Closed teor2345 closed 1 week ago

teor2345 commented 1 week ago

This PR makes large volumes of DSN mappings easier to retrieve, by splitting the DSN mapping RPC into a piece index subscription and a separate method that returns the full mappings for a piece index.

To implement this method, the PR also adds a mapping cache, which uses around 44 bytes per cached mapping. We can make the cache limits user adjustable in a future PR, and set better defaults there.
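
For context, a minimal sketch of where the roughly 44 bytes per mapping likely comes from, assuming each cached mapping is a (piece index, offset within piece, object hash) triple; the types here are simplified stand-ins, not the actual subspace definitions:

```rust
use std::mem::size_of;

fn main() {
    // Assumed per-mapping payload: a PieceIndex (a u64 newtype), the u32 offset of
    // the object within the piece, and a 32-byte Blake3 hash of the object.
    let piece_index = size_of::<u64>(); // 8 bytes
    let offset = size_of::<u32>(); // 4 bytes
    let hash = size_of::<[u8; 32]>(); // 32 bytes
    println!("{} bytes per mapping", piece_index + offset + hash); // 44 bytes
}
```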


teor2345 commented 1 week ago

I understand what is being done and I read the Slack message, but I don't understand what problem it tries to address for the end user/developer.

The use cases I foresee are the following:

  • An indexer that simply stores everything, in which case this API only makes it more difficult to use
  • An app that is waiting for a particular piece of data to be "confirmed" by archiving, which this change doesn't help with and instead makes more difficult as well

I agree that these are the main use cases. And I agree that these are drawbacks, particularly if there is an easier way to implement backpressure or rate-limiting on the server side.

(This is great feedback, next time what’s a good way to get it before I write the code? I posted a Slack message with this design, I’m not sure if you saw it last week?)

For the first issue I think some kind of backpressure is needed on the RPC layer to ensure none of the mappings are lost and the subscription doesn't break.

I don’t know how to implement this using the jsonrpsee APIs. jsonrpsee seems to ignore tower service backpressure: https://docs.rs/jsonrpsee-server/0.24.3/src/jsonrpsee_server/server.rs.html#991 and it doesn’t have a backpressure implementation of its own.

It’s also not compatible with existing tower middleware: https://docs.rs/jsonrpsee-server/0.24.3/jsonrpsee_server/struct.ServerBuilder.html#method.set_rpc_middleware

So it seems like our alternatives are:

  1. Use a rate-limit in the stream, for example tokio’s throttle
  2. Implement our own backpressure

Since archiving is reasonably infrequent, it might be simpler to rate-limit the number of batches returned per second. Even quite a slow limit like 1 per second should be able to keep up with the archiving rate, as long as the batches are big enough. It’s not ideal for slow clients or network connections, but as long as the batch size and delay are configurable it should work.
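
For illustration, here is a minimal sketch of option 1 using tokio_stream's throttle adapter (assuming the crate's time feature is enabled; MappingBatch and rate_limited are placeholder names, not the actual RPC types):

```rust
use std::time::Duration;

use tokio_stream::{Stream, StreamExt};

// Placeholder for a batch of object mappings produced by archiving.
struct MappingBatch;

// Yield at most one batch per second to the RPC subscription, so a burst of
// archived pieces cannot flood a slow client. The delay and batch size would
// be configurable in practice.
fn rate_limited(
    batches: impl Stream<Item = MappingBatch>,
) -> impl Stream<Item = MappingBatch> {
    batches.throttle(Duration::from_secs(1))
}
```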

A custom backpressure implementation is likely to impact performance, and could be tricky to test. I’ve tried doing it before, and it was extremely challenging to get right. (We needed a backpressure on a priority queue by block height, rather than the standard tower backpressure which doesn’t distinguish between requests.)

For the second issue, some kind of subscription filtering is needed, like an app sending the hashes of the objects it cares about and the subscription only returning mappings for those hashes.

That’s a good idea, I’ll add it to the list in Notion.

From a purely implementation point of view, instead of storing everything in a large BTreeSet and adding custom wrapper types, it would be better to have a small hashmap with a flat vector of object mappings for each piece index, which will use a lot less memory, make fewer allocations, and be much more efficient to process.

Is there any need for a cache if we’re only using subscriptions?

But here’s some background that might be helpful if we do end up using a cache:

I agree that the number of allocations and the lookup efficiency would be better (by roughly the number of mappings per piece / 6, and roughly O(log(n)) respectively).

But as of 2022, the memory usage of HashMaps is usually greater than BTreeMaps, by up to 1.5x. See this graph in https://ntietz.com/blog/rust-hashmap-overhead/ .

If the offset and object hash take 36 bytes, then a 1.5x memory overhead adds around 18 extra bytes per item. That is greater than the 8 bytes of PieceIndex saved by using a HashMap<PieceIndex, Vec<(u32, Blake3Hash)>>. Using a HashMap also makes it trickier to count the number of items in the cache and to remove the earliest item from the cache, unless we use an ordered map (which uses more memory).
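
To make the comparison concrete, here is a rough sketch of the two layouts being discussed, with simplified stand-in types rather than the actual subspace ones:

```rust
use std::collections::{BTreeSet, HashMap};
use std::mem::size_of;

// Simplified stand-ins: PieceIndex is a u64 newtype and Blake3Hash is a
// 32-byte array in subspace; plain aliases are enough for this comparison.
type PieceIndex = u64;
type Blake3Hash = [u8; 32];

// Flat ordered layout: every entry repeats the 8-byte piece index, but counting
// entries and evicting the earliest one is straightforward.
type FlatCache = BTreeSet<(PieceIndex, u32, Blake3Hash)>;

// Keyed layout: the piece index is stored once per piece with a flat vector of
// (offset, object hash) mappings, at the cost of the hashmap's own overhead.
type PerPieceCache = HashMap<PieceIndex, Vec<(u32, Blake3Hash)>>;

fn main() {
    let _flat: FlatCache = BTreeSet::new();
    let _per_piece: PerPieceCache = HashMap::new();

    // Raw per-entry payload sizes (padding included), before container overhead.
    println!("flat entry: {} bytes", size_of::<(PieceIndex, u32, Blake3Hash)>());
    println!("per-piece entry: {} bytes", size_of::<(u32, Blake3Hash)>());
}
```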

But if we’re not caching, none of that really matters.

teor2345 commented 1 week ago

We decided to go another way with this, so I don’t think there’s any point trying to revise or build on this PR. I’ll submit a separate one with a new design.

nazar-pc commented 1 week ago

This is great feedback, next time what’s a good way to get it before I write the code? I posted a Slack message with this design, I’m not sure if you saw it last week?

Just tag me explicitly if you want to get my feedback. I may have read it, but didn't pay too much attention, sorry about that.

Since archiving is reasonably infrequent, it might be simpler to rate-limit the number of batches returned per second. Even quite a slow limit like 1 per second should be able to keep up with the archiving rate, as long as the batches are big enough. It’s not ideal for slow clients or network connections, but as long as the batch size and delay are configurable it should work.

A custom backpressure implementation is likely to impact performance, and could be tricky to test. I’ve tried doing it before, and it was extremely challenging to get right. (We needed a backpressure on a priority queue by block height, rather than the standard tower backpressure which doesn’t distinguish between requests.)

We have a backpressure of sorts in archiving to make sure the farmer has a chance to pull the pieces from the node, but I don't necessarily like it too much. It is a working solution though, and we can make async acknowledgements so that a range of messages can be confirmed infrequently.

This is something that is worth discussing upstream though (probably in polkadot-sdk). Every time there is a subscription or stream of some kind, a light bulb goes off for me, and I wonder what ensures that things don't blow up.

Is there any need for a cache if we’re only using subscriptions?

I don't think it is needed. In fact those who build high-performance indexing services can use subspace-service as a library and have direct access to things that way.

But as of 2022, the memory usage of HashMaps is usually greater than BTreeMaps, by up to 1.5x. See this graph in https://ntietz.com/blog/rust-hashmap-overhead/ .

They just work differently; it depends on the distribution of items, for example. My point, though, was not to store everything in a flat set/map, but rather to use vectors instead. In fact, the outer hashmap can be replaced with a VecDeque as well, removing the need for maps completely (it is not hard to find a particular piece in a small VecDeque); then you get the most efficient storage while still keeping the original ordering.
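
A rough sketch of that VecDeque layout, with simplified illustrative types rather than the actual implementation:

```rust
use std::collections::VecDeque;

// Illustrative simplified types, not the actual subspace definitions.
type PieceIndex = u64;
type Blake3Hash = [u8; 32];

// One entry per archived piece, kept in archiving order, with a flat vector of
// (offset, object hash) mappings for that piece. With a bounded cache the
// VecDeque stays small, so a linear scan by piece index is cheap, and popping
// from the front evicts the oldest piece first.
struct MappingCache {
    entries: VecDeque<(PieceIndex, Vec<(u32, Blake3Hash)>)>,
    max_pieces: usize,
}

impl MappingCache {
    fn new(max_pieces: usize) -> Self {
        Self { entries: VecDeque::new(), max_pieces }
    }

    fn insert(&mut self, piece_index: PieceIndex, mappings: Vec<(u32, Blake3Hash)>) {
        if self.entries.len() >= self.max_pieces {
            // Evict the oldest piece to stay within the cache limit.
            self.entries.pop_front();
        }
        self.entries.push_back((piece_index, mappings));
    }

    fn get(&self, piece_index: PieceIndex) -> Option<&[(u32, Blake3Hash)]> {
        self.entries
            .iter()
            .find(|(index, _)| *index == piece_index)
            .map(|(_, mappings)| mappings.as_slice())
    }
}
```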

teor2345 commented 1 week ago

We have a backpressure of sorts in archiving to make sure the farmer has a chance to pull the pieces from the node, but I don't necessarily like it too much. It is a working solution though, and we can make async acknowledgements so that a range of messages can be confirmed infrequently.

Just double-checking that you're talking about the acknowledge_archived_segment_header method and its associated infrastructure?

Happy to modify that code to work with mapping piece indexes instead.
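
For reference, a minimal sketch of how acknowledgement-based backpressure over mapping piece indexes could work, using plain tokio channels and illustrative names rather than the actual subspace-service RPC types:

```rust
use tokio::sync::mpsc;

// Illustrative batch type: a batch of mappings ending at some piece index.
struct MappingBatch {
    last_piece_index: u64,
    // ... mappings elided
}

// Send batches to a single subscriber, but wait for the subscriber to acknowledge
// each batch (by echoing back a piece index at or past the end of the batch)
// before sending the next one. A slow client therefore slows the producer down
// instead of accumulating an unbounded backlog of buffered messages.
async fn send_with_acks(
    mut batches: mpsc::Receiver<MappingBatch>,
    mut acks: mpsc::Receiver<u64>,
    subscriber: mpsc::Sender<MappingBatch>,
) {
    while let Some(batch) = batches.recv().await {
        let expected = batch.last_piece_index;
        if subscriber.send(batch).await.is_err() {
            // The subscriber went away; stop producing.
            return;
        }
        // Wait until the subscriber confirms this batch (or a later one).
        loop {
            match acks.recv().await {
                Some(acked) if acked >= expected => break,
                Some(_) => continue,
                // Acknowledgement channel closed: treat as a dropped subscription.
                None => return,
            }
        }
    }
}
```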

nazar-pc commented 1 week ago

Just double-checking that you're talking about the acknowledge_archived_segment_header method and its associated infrastructure?

Yes, but I think there is still a benefit in having a generic solution for this in Substrate/jsonrpsee, so an upstream discussion is the best first step (unless such a discussion already exists).