Better pipelining?

ekmett commented 4 years ago

Currently, Encoder has to create ALL of the SourceBlockEncoders and finish converting the data in every block into symbols before even the first packet can be sent. As the data being transmitted grows, this becomes an ever longer stall during which no bandwidth is used.

On the other hand, all that is really needed to start sending the first block's systematic data is for that block to be fully transformed into symbols. Heck, if you really wanted to pipeline things, the systematic data for a block without subblocks could probably be constructed while the requested repair packets are computed in the background.
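A minimal sketch of the pipelining idea using only the standard library, assuming a simplified model in which a block's systematic symbols are just its bytes cut into fixed-size pieces. The names stream_systematic and Packet are hypothetical and not part of raptorq's API; repair-symbol generation is left as a comment.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical packet: (source block number, encoding symbol id, payload).
/// Illustration only; this is not raptorq's EncodingPacket.
pub type Packet = (u8, u32, Vec<u8>);

/// Streams systematic packets block by block. A block is split into symbols
/// only when the producer thread reaches it, so the receiver can start
/// sending block 0 while later blocks are still being prepared.
pub fn stream_systematic(data: Vec<u8>, block_len: usize, symbol_size: usize) -> mpsc::Receiver<Packet> {
    assert!(block_len > 0 && symbol_size > 0);
    // A bounded channel applies back-pressure: the producer can run at most
    // 16 packets ahead of whoever is sending them.
    let (tx, rx) = mpsc::sync_channel(16);
    thread::spawn(move || {
        for (sbn, block) in data.chunks(block_len).enumerate() {
            for (esi, symbol) in block.chunks(symbol_size).enumerate() {
                let mut payload = symbol.to_vec();
                // Zero-pad the final symbol of a block to the full symbol size.
                payload.resize(symbol_size, 0);
                if tx.send((sbn as u8, esi as u32, payload)).is_err() {
                    return; // receiver hung up; stop producing
                }
            }
            // Repair symbols for this block could be computed here (or on
            // another thread) once its systematic symbols have been queued.
        }
    });
    rx
}
```

The receiver yields packets in order and its iterator ends once the producer thread finishes, so a sender loop can simply be `for packet in rx { ... }`.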

Some form of lazy initialization would be useful here.
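A sketch of what lazy initialization could look like, using std::cell::OnceCell (stable since Rust 1.70) so that each block's state is built the first time it is requested. LazyEncoder and BlockSymbols are hypothetical stand-ins: a real version might hold a SourceBlockEncoder per block rather than plain symbol vectors.

```rust
use std::cell::{Cell, OnceCell};
use std::ops::Range;

/// Hypothetical stand-in for per-block encoder state: just the block's
/// symbols, each zero-padded to the symbol size.
pub struct BlockSymbols {
    pub symbols: Vec<Vec<u8>>,
}

/// Builds each block's state on first access instead of all up front.
pub struct LazyEncoder<'a> {
    data: &'a [u8],
    ranges: Vec<Range<usize>>,
    symbol_size: usize,
    blocks: Vec<OnceCell<BlockSymbols>>,
    built: Cell<usize>, // number of blocks constructed so far
}

impl<'a> LazyEncoder<'a> {
    pub fn new(data: &'a [u8], ranges: Vec<Range<usize>>, symbol_size: usize) -> Self {
        assert!(symbol_size > 0);
        let blocks: Vec<OnceCell<BlockSymbols>> = ranges.iter().map(|_| OnceCell::new()).collect();
        LazyEncoder { data, ranges, symbol_size, blocks, built: Cell::new(0) }
    }

    /// Returns block `i`, converting its bytes into symbols on the first call.
    pub fn block(&self, i: usize) -> &BlockSymbols {
        self.blocks[i].get_or_init(|| {
            self.built.set(self.built.get() + 1);
            let symbols = self.data[self.ranges[i].clone()]
                .chunks(self.symbol_size)
                .map(|s| {
                    let mut v = s.to_vec();
                    v.resize(self.symbol_size, 0);
                    v
                })
                .collect();
            BlockSymbols { symbols }
        })
    }

    pub fn blocks_built(&self) -> usize {
        self.built.get()
    }
}
```

OnceCell is single-threaded; std::sync::OnceLock is the thread-safe counterpart if blocks may be touched from several threads.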

This is also a place where the current API is limiting: if I try to do it myself with the existing SourceBlockEncoders, not quite enough is exposed to construct the source blocks, because the partition function and the logic used to work out block sizes and the like are unexported.
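For reference, the partition function mentioned here corresponds to Partition[I, J] in RFC 6330, Section 4.4.1.2. The standalone sketch below is written from the RFC's formula, not copied from raptorq, whose internal version may have a different signature.

```rust
/// Partition[I, J] from RFC 6330, Section 4.4.1.2: split I items into J
/// nearly equal parts. Returns (IL, IS, JL, JS): JL parts of size IL and
/// JS parts of size IS, where IL = ceil(I/J) and IS = floor(I/J).
pub fn partition(i: u32, j: u32) -> (u32, u32, u32, u32) {
    assert!(j > 0, "cannot partition into zero parts");
    let is = i / j;                                   // floor(I/J)
    let il = if i % j == 0 { is } else { is + 1 };    // ceil(I/J)
    let jl = i - is * j;                              // number of larger parts
    let js = j - jl;                                  // number of smaller parts
    (il, is, jl, js)
}
```

For example, splitting 10 symbols into 3 blocks gives one block of 4 symbols and two blocks of 3.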

One option would be to split Encoder into two parts: one that works out the plan (which blocks are needed and which range of the source data each one covers), which requires only the total size and no data, and one that instantiates that plan, either eagerly as now or lazily as each block is first touched.
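A minimal sketch of the "plan" half, assuming the number of source blocks Z is supplied by the caller (RFC 6330 derives Z from the transfer length, symbol size and a working-memory parameter, which is omitted here). BlockPlan and plan_blocks are hypothetical names, not raptorq API; the point is that the block layout depends only on the size of the object, not on its contents.

```rust
use std::ops::Range;

/// Hypothetical plan entry: how many source symbols a block has and which
/// bytes of the object it covers. Computed without touching any data.
#[derive(Debug, PartialEq)]
pub struct BlockPlan {
    pub symbols: u32,
    pub bytes: Range<u64>,
}

/// Follows the block split of RFC 6330, Section 4.4.1.2: Kt = ceil(F/T),
/// then the first ZL blocks get KL = ceil(Kt/Z) symbols and the remaining
/// ZS blocks get KS = floor(Kt/Z). Z is an input here, not derived.
pub fn plan_blocks(transfer_length: u64, symbol_size: u16, num_blocks: u32) -> Vec<BlockPlan> {
    assert!(symbol_size > 0 && num_blocks > 0);
    let t = symbol_size as u64;
    let kt = (transfer_length + t - 1) / t; // total source symbols, ceil(F/T)
    let z = num_blocks as u64;
    // KS = floor(Kt/Z); ZL = Kt mod Z blocks get one extra symbol (KL = KS + 1).
    let (ks, zl) = (kt / z, kt % z);
    let mut start = 0u64;
    (0..z)
        .map(|b| {
            let k = if b < zl { ks + 1 } else { ks };
            // The last block may cover fewer than k * T bytes; it would be
            // zero-padded when turned into symbols.
            let end = (start + k * t).min(transfer_length);
            let plan = BlockPlan { symbols: k as u32, bytes: start..end };
            start = end;
            plan
        })
        .collect()
}
```

For example, a 10,000-byte object with 1,000-byte symbols split into three blocks gives blocks of 4, 3 and 3 symbols covering bytes 0..4000, 4000..7000 and 7000..10000. The second half of the split would then instantiate each entry eagerly or on first use.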

cberner commented 4 years ago

Factoring the logic that splits the data into blocks out of Encoder::new() and making it a public function sounds good to me. Alternatively, if you want to make Encoder lazy, I'm open to that as long as the implementation isn't too complex. I'd like to keep raptorq relatively simple, so I prefer exposing more functionality over optimizing for specific use cases.

ekmett commented 4 years ago

In the fork on my account I have a work-in-progress version of an EncodingPlan that tells you how many blocks there are and where each one starts and stops. Right now it just acts as a cache for the constants involved, to avoid recomputing the SourceBlockEncodingPlans, and provides a factory-like method for producing source blocks on demand, possibly from offline data sources.
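A hedged sketch of the "factory" idea: the cached plan is reduced to a list of byte ranges, and a block's bytes are read from any Read + Seek source (a std::fs::File, for example) only when that block is requested. BlockFactory is a hypothetical name; this is not the EncodingPlan from the fork, whose API isn't shown in the thread.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;

/// Hypothetical factory: keeps only the per-block byte ranges (the cached
/// plan) and pulls a block's bytes from the data source on demand.
pub struct BlockFactory<R> {
    source: R,
    ranges: Vec<Range<u64>>,
}

impl<R: Read + Seek> BlockFactory<R> {
    pub fn new(source: R, ranges: Vec<Range<u64>>) -> Self {
        BlockFactory { source, ranges }
    }

    pub fn num_blocks(&self) -> usize {
        self.ranges.len()
    }

    /// Seeks to block `i` and reads exactly its bytes. The caller would then
    /// hand them to a per-block encoder.
    pub fn read_block(&mut self, i: usize) -> io::Result<Vec<u8>> {
        let r = self.ranges[i].clone();
        self.source.seek(SeekFrom::Start(r.start))?;
        let mut buf = vec![0u8; (r.end - r.start) as usize];
        self.source.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```

Because only the ranges are held in memory, the whole object never needs to be loaded at once; blocks can be read in any order.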

It might make sense to rename it and use something like it in both the encoding and decoding setup to avoid duplicating the math in the encoder and decoder, except that the decoder doesn't even need the plan data.

cberner commented 4 years ago

@ekmett, can you try out https://github.com/cberner/raptorq/pull/56 ? I believe it provides the functionality you need.

cberner commented 4 years ago

Closing for now, since I think that new function will let you implement what you need, but please reopen if not.

cberner/raptorq, issue #50: Better pipelining?