Hello :wave:
I've been looking into this issue. The thing that made sense at first glance would be to change the manifest `add` signature to take not only the cid but also the span or size of the cid. The manifest would have a `span` field and we would add each block's size to it as the block is added.

The only problem I found is that the manifest constructor takes a sequence of blocks, and this sort of change would make a mess of the constructor (since it would have to turn into a tuple). Personally, my recommendation would be to drop the cid sequence from the constructor in favor of a more explicit API. Is the constructor's cid part strictly needed?
Hey @acud! Welcome!
So, there is already a `blockSize` field in the manifest, and with blocks being fixed size we don't have to capture it per individual block. What we need is an additional field that captures the length of the original data, something like `uploadSize` or the like. There are, however, a few considerations to keep in mind.
First, if you look at the original manifest definition:
```nim
type
  Manifest* = ref object of RootObj
    rootHash*: ?Cid        # root (tree) hash of the contained data set
    blockSize*: int        # size of each contained block (might not be needed if blocks are len-prefixed)
    blocks*: seq[Cid]      # block Cid
    version*: CidVersion   # Cid version
    hcodec*: MultiCodec    # Multihash codec
    codec*: MultiCodec     # Data set codec
    case protected*: bool  # Protected datasets have erasure coded info
    of true:
      K*: int              # Number of blocks to encode
      M*: int              # Number of resulting parity blocks
      originalCid*: Cid    # The original Cid of the dataset being erasure coded
      originalLen*: int    # The length of the original manifest
    else:
      discard
```
You'll see that if the manifest is `protected`, there already is an `originalLen` field. This is the size of the non-erasure-coded manifest: when erasure coding, we wrap the original non-erasure-coded manifest and extend it into a "protected", or erasure coded, manifest. Since the coding is systematic, all the original blocks are also part of the erasure coded manifest, plus the parity blocks. I bring this up because it can be confusing without the right context. So the new field needs to be placed outside of the `case` declaration and should capture the size of the original data - `originalSize`/`uploadSize` could be a good name.
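For illustration, the new field would simply sit next to the common fields rather than inside the case branch; a rough sketch (the name `originalBytes` is only a placeholder here):

```nim
type
  Manifest* = ref object of RootObj
    rootHash*: ?Cid        # root (tree) hash of the contained data set
    originalBytes*: int    # placeholder name: length of the original, un-padded upload
    blockSize*: int        # size of each contained block
    blocks*: seq[Cid]      # block Cids
    # ...the remaining fields, including the protected case branch, stay as above
```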
Second, on upload the stream is fed to the chunker, which splits the incoming stream into equal-sized blocks. The chunker operates in two modes, with and without padding. The default is to pad, but since we want to count all the bytes as they come, we need to make sure that the last chunk/block is unpadded and pad it ourselves. A better alternative would be to allow the chunker to report the un-padded size it has read up to some point (or EOF) and set that on the manifest.
So to summarize: we need an additional field outside of the `case` declaration (`originalSize`/`uploadSize`) that captures the size of the uploaded data, and a way to track the un-padded number of bytes read during chunking so that it can be set on the manifest.
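To make the padding point concrete, zero-padding the last chunk ourselves would amount to something like the following (purely illustrative; the chunker already does the equivalent internally when its pad mode is on):

```nim
# Illustrative only: zero-pad a short final chunk up to the block size.
proc padToBlockSize(chunk: seq[byte], blockSize: int): seq[byte] =
  result = chunk
  if result.len < blockSize:
    result.setLen(blockSize)  # setLen zero-fills the newly added bytes
```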
@dryajov personally I would try to leave the chunker out of it, since ideally you could capture the value directly from the data stream of the HTTP request. The chunker may generate more data due to erasure coding, but the underlying data-layer subtree node spans remain the same (read: leaking stuff from the chunker would actually not give you any added value). Also, it would result in some ugly tuple return values from the chunker, which is not so nice to interact with.
When looking at `api.nim`, it seems that you can have direct access to the actual underlying data size when pumping the data out of the stream:
```nim
try:
  while not reader.atEof:
    var
      buff = newSeqUninitialized[byte](BlockSize)
      len = await reader.readOnce(addr buff[0], buff.len)
    buff.setLen(len)

    if len <= 0:
      break

    trace "Got chunk from endpoint", len = buff.len
    await stream.pushData(buff)
```
The problem is that it is handled too high up in the stack and the context does not have access to the actual manifest (which is created in `store`). I would potentially just move this into the `store` method in `node.nim` and do all the async-await magic there. It also would provide a nice clear-cut responsibility - you can just pass a stream directly into `store` and it will handle it all for you and just give you back the CID.
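Either way, the counting itself is trivial; a sketch of tallying the un-padded size next to the loop above (the `originalBytes` counter is illustrative, and where it ultimately lives is exactly what is being debated below):

```nim
var originalBytes = 0  # illustrative: running total of un-padded bytes

while not reader.atEof:
  var buff = newSeqUninitialized[byte](BlockSize)
  let len = await reader.readOnce(addr buff[0], buff.len)
  if len <= 0:
    break

  buff.setLen(len)
  originalBytes += len  # count the bytes before any padding is applied
  await stream.pushData(buff)

# originalBytes would then be recorded on the manifest, wherever it is created
```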
> personally I would try to leave the chunker out of it since ideally you could capture the value directly from the data stream of the HTTP request
The HTTP api/interface is just one of many ways of invoking the underlying api exposed by the node; it is a very thin layer that should be isolated from the remainder of the stack, because it can be swapped out for `stdio` to use as a console executable, for example. So the HTTP layer is really not the best place to do this.
> The chunker may generate more data due to erasure coding, but the underlying data-layer subtree node spans remain the same (read: leaking stuff from the chunker would actually not give you any added value).
The chunker is the only thing that sits between the incoming stream and the padded blocks it produces. The only other way to do this, without making significant changes to the architecture, is to do the padding manually and disable it on the chunker (it's just a flag). But there is really no reason to do that, and architecturally there is zero cost to adding an accessor to the chunker that reports the number of bytes consumed so far. This could be a simple method/func on the chunker called something like `bytesRead`.
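A rough sketch of that accessor, assuming the chunker tracks its read offset internally (names are placeholders):

```nim
type
  Chunker* = ref object
    offset: int   # new: total un-padded bytes consumed from the stream
    # ...existing fields unchanged...

func consumed*(c: Chunker): int =
  ## Un-padded number of bytes read so far (or at EOF).
  c.offset
```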
> The problem is that it is handled too high up in the stack and the context does not have access to the actual manifest (which is created in store). I would potentially just move this into the store method in node.nim and just do all the async-await magic there.
There is a very good reason for not doing this: as I mentioned, it makes the entire architecture less modular and more leaky.
> It also would provide a nice clear-cut responsibility - you can just pass a stream directly into store and it will handle it all for you and just give you back the CID.
Not sure I follow; from what you're describing, this is already more or less how it works. The REST layer hands a stream to the node's `store` method, which hands it down to the chunker, which produces the actual chunks/blocks that are stored on the manifest. All that is required is to count the incoming bytes and then pass them down to the manifest.
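With such an accessor, the change inside `store` would be a one-liner at the end of the chunking loop; a sketch, using the assumed `consumed` accessor and `originalBytes` field names from above, with approximate API names and error handling elided:

```nim
# Sketch of the tail of node.store's chunking loop (API names approximate):
while (let chunk = await chunker.getBytes(); chunk.len > 0):
  let blk = Block.new(chunk).tryGet()
  discard await self.blockStore.putBlock(blk)  # assumes putBlock reports success
  manifest.add(blk.cid)

# Record the un-padded size once the stream is drained.
manifest.originalBytes = chunker.consumed      # assumed accessor/field from above
```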
@acud I've gone ahead and added PR #110 to alleviate some of the glue code required to pump data between the http layer and the node, and especially to avoid unnecessary copying (one less buffer allocation).
Also, as per our discussion, it might make sense to create a new type of stream that further augments the existing `LPStream` with the read and written bytes; this new stream can then wrap the http stream and augment it with the required info. As you noted, putting these methods into the base type might end up being awkward, which I agree with, hence yet another (thin) wrapper might be the best way forward.
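A sketch of such a thin wrapper, assuming the `readOnce`/`write` signatures of nim-libp2p's `LPStream` and simply forwarding to the wrapped stream while counting (only the two relevant methods are shown; a full implementation would also forward close/EOF handling):

```nim
import pkg/chronos
import pkg/libp2p/stream/lpstream

type
  CountingStream* = ref object of LPStream  # hypothetical wrapper type
    inner: LPStream
    bytesRead*: int
    bytesWritten*: int

method readOnce*(s: CountingStream, pbytes: pointer, nbytes: int): Future[int] {.async.} =
  let read = await s.inner.readOnce(pbytes, nbytes)
  s.bytesRead += read
  return read

method write*(s: CountingStream, msg: seq[byte]): Future[void] {.async.} =
  s.bytesWritten += msg.len
  await s.inner.write(msg)
```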
As another option, I still kind of like exposing this information on the chunker as well. Considering everything else, it is the least costly and invasive option, and it doesn't feel out of place - after all, the chunker reads data and there is no reason not to make some stats about that available.
I looked into the code. My proposal:

I think that's all the changes we need. Note that BlockStore will still store the last block padded to `blockSize`, and all operations on blocks will use those padded blocks (but they will be filled with zeros rather than garbage!).
I think this introduces too many changes for something that can be accomplished with a very small change to the chunker itself (add a `consumed` field that returns the number of bytes read) and an additional field on the manifest (`originalSize`).
For one, StoreStream should work on the manifest dataset, not the original file, otherwise we'd have to account for padding manually, which is going to be a pain.
The only place where the original file size matters is on download; if we store the original size in the manifest, it should be really easy to simply read up to `originalSize` during download.
Btw, we don't have to return `LPStream(StoreStream.new(node.blockStore, manifest)).success`; we can return a `BufferStream` and push data to it from `node.retrieve`. Another approach is to have a flag on `StoreStream` that switches between padded and unpadded reads.
I agree that we can remove rabin chunking (in fact, it's just a placeholder for now), but that doesn't merit getting rid of the chunker altogether; it's also useful in tests, for example.
Yeah, I didn't realize that Chunker is used in other tests, not just tested by itself. So let's keep it.
StoreStream is currently used for `node.retrieve`, where it should be unpadded. I vote for adding a `StoreStream.new` parameter, `bool unpadded = false`, but I hope you have a better name for it.
> we can return a BufferStream and push data to it from node.retrieve
Yeah, the existing `fetchBlocksJob` scheme looks unreliable. While LPStream reads blocks sequentially, fetchBlocksJob requests them in batches. So if all the data in the first batch arrives simultaneously, it will fulfill the first block read by LPStream and then drop the remaining blocks, so LPStream will continue to read them one by one.

Funny that it may be the actual reason for the slow downloading in your current test!
> add a `consumed` field that returns the amount of bytes read, and an additional field to the manifest (`originalSize`)
agree
> Yeah, the existing fetchBlocksJob scheme looks unreliable. While LPStream reads blocks sequentially, fetchBlocksJob requests them in batches. So if all the data in the first batch arrives simultaneously, it will fulfill the first block read by LPStream and then drop the remaining blocks, so LPStream will continue to read them one by one.
Yeah, this is definitely a bandaid solution to prefetch some blocks, but I'm not sure if it actually makes any difference... Something to try out.
> I vote for adding StoreStream.new parameter, bool unpadded = false but I hope you have better name for it
Yeah, I think this is probably the best approach right now, let's do this. Let's just have a default parameter called `padded = true` and set it to `false` if we want to read up to the original data size.
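A sketch of the shape of that change (the real `StoreStream` fields may differ; `originalBytes` is the assumed manifest field from earlier):

```nim
# Sketch: StoreStream constructor with a padded switch.
proc new*(T: type StoreStream,
          store: BlockStore,
          manifest: Manifest,
          padded = true): StoreStream =
  StoreStream(store: store, manifest: manifest, padded: padded)

# In readOnce, the effective size of the dataset would then be:
#   let size =
#     if self.padded: self.manifest.blocks.len * self.manifest.blockSize
#     else: self.manifest.originalBytes
# and reads would stop once the stream offset reaches `size`.
```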
So, we have two independent tasks.

I think we agreed upon the implementation of unpadding:

And for prefetching, we should modify fetchBlocksJob to push data into a BufferStream with a fixed-size queue.
Also, `erasureJob` doesn't look reliable either - first, we need enough space to store all the decoded data, otherwise this approach will fail. Second, it will take a lot of time, so reading from LPStream may fail with a timeout.

Time to make a new issue.
> I think we agreed upon implementation of unpadding:
Yep, sounds good, and I like `originalBytes` better.
> And for prefetching, we should modify fetchBlocksJob to push data into BufferStream with fixed-size queue.
I don't think this is what we want, at least not in this context. `fetchBlocksJob` should really be called `prefetchBlock`, and it's a parallel task that attempts to download some blocks ahead of time and store them in the local store, so pushing on the `BufferStream` doesn't make sense here.
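Roughly, the intent is something like this (a sketch; the proc name is illustrative and error handling is elided):

```nim
import pkg/chronos

# Sketch: a background task that pulls blocks ahead of the reader so that,
# once persisted by the exchange engine, they are served from the local store.
proc prefetchBlocks(store: NetworkStore, manifest: Manifest) {.async.} =
  for cid in manifest.blocks:
    discard await store.getBlock(cid)  # resolves via the engine; result ignored here
```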
> Also, erasureJob doesn't look reliable either - first, we need enough space to store all the decoded data, otherwise this approach will fail. Second, it will take a lot of time, so reading from LPStream may fail with a timeout.
For erasure decoding, we probably want a more intelligent downloader that fetches in recovery groups and kicks in recovery as soon as it has any K pieces. If those are the original (systematic) pieces, then we don't need to do anything; if they aren't, we kick in recovery and continue downloading in parallel, then we stop either the recovery or the download, whichever succeeds first.
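The per-group decision itself is just bookkeeping; a hedged sketch, independent of the actual erasure coder:

```nim
import std/sets

type GroupAction = enum
  KeepDownloading, UseOriginals, RunRecovery

# Sketch: given which of a group's K data + M parity pieces have arrived,
# decide whether to wait, use the systematic pieces as-is, or start recovery.
proc nextAction(received: HashSet[int], K, M: int): GroupAction =
  var originals = 0
  for i in 0 ..< K:
    if i in received:
      inc originals

  if originals == K:
    result = UseOriginals      # all original pieces present, nothing to decode
  elif received.len >= K:
    result = RunRecovery       # any K of the K + M pieces are enough to decode
  else:
    result = KeepDownloading   # fewer than K pieces so far
```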
I should add that I'm not entirely sure if the prefetch is required, but it seemed like a good thing to try out originally; we should definitely experiment with this.
> and store them in the local store
But the current `NetworkStore.getBlock` has only this code:

```nim
let blk = await self.engine.requestBlock(cid)
# TODO: add block to the local store
return blk.some.success
```
So implementing that TODO may be the shortest path toward real prefetching.
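i.e. something along these lines (a sketch; it assumes `NetworkStore` keeps a `localStore` reference and that `putBlock` reports success, with error handling kept minimal):

```nim
let blk = await self.engine.requestBlock(cid)

# Sketch: persist the fetched block so later reads (and a prefetcher)
# are served from the local store. putBlock's signature is assumed here.
if not (await self.localStore.putBlock(blk)):
  trace "Unable to store block locally", cid

return blk.some.success
```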
> For erasure decoding, we probably want a more intelligent downloader that fetches in recovery groups
There is a small detail, however. The current layout of recovery groups is that the first group covers blocks 0, X, X*2, ..., the second group covers blocks 1, X+1, X*2+1, and so on. Your idea assumes the "layout B" that we discussed in https://github.com/status-im/codex-research/pull/85#issuecomment-1150623235
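To make the current layout concrete (illustration only):

```nim
# Current interleaved layout: with X recovery groups, group g covers
# blocks g, g + X, g + 2X, ...
proc groupIndices(g, X, nBlocks: int): seq[int] =
  var i = g
  while i < nBlocks:
    result.add i
    i += X

echo groupIndices(0, 4, 12)  # @[0, 4, 8]
echo groupIndices(1, 4, 12)  # @[1, 5, 9]
```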
> but the current NetworkStore.getBlock has only this code:
Yes, but the engine has a reference to the blockstore and when the block eventually arrives, it will be stored here - https://github.com/status-im/nim-codex/blob/main/codex/blockexchange/engine/engine.nim#L256
> Your idea assumes the "layout B" that we discussed in https://github.com/status-im/codex-research/pull/85#issuecomment-1150623235
Maybe; I haven't thought about this in some time, but it is definitely another layer that we might want to consider when making the choice for the layout.
> Yes, but

You won :) I made a new issue for discussing prefetching.

As for this issue, we agreed on the implementation, but I will wait until you and Michael push your PRs.
The chunker pads chunks that are smaller than `BlockSize`. Currently, when a CID is retrieved via the REST API, any padding is included. This can be seen when e.g. uploading a small text file via a node's REST API and then retrieving that file.
The original size of the upload should be tracked as metadata (in the manifest?) so that any padding can be removed during retrieval.