filecoin-project / rust-fil-proofs

Proofs for Filecoin in Rust

Smartly share the data being used for encoding and MT generation #826

Closed schomatis closed 4 years ago

schomatis commented 5 years ago

In the default case of transform_and_replicate_layers, where we generate MTs in parallel with layer encoding, we currently duplicate the entire sector to avoid collisions between the MT generation that reads the data to build the tree and the VDE that mutates that same data to produce the next layer.

To avoid that duplication and reduce memory consumption we provide an option to pipe those two operations together and avoid data races, but that comes at the extreme cost of the MT becoming, in effect, another VDE with comparable delay times, which is a significant speed degradation.

Ideally those two processes should be able to operate together on the same sector, provided they do so in a more granular fashion. For example, if we subdivide the sector into blocks of, say, 64KiB, the only restriction would be that the VDE may only mutate a block with a lower index than the one the MT is currently reading (implying that those blocks have already been incorporated into the tree).

This would (I think) almost eliminate the performance degradation in both speed and memory. Memory, because we can comfortably choose a block size orders of magnitude smaller than any production-ready sector size; and speed, because the MT should be faster than the VDE and so would not gate its progress.

This proposal has some implementation considerations (for example, the VDE sometimes traverses the layer forward and sometimes in reverse, which the MT currently doesn't), but those have clear solutions. My question is whether this violates any of the assumptions of the ZigZag construction.

porcuquine commented 5 years ago

Just a note that in practice merkle-tree generation is only faster than encoding when there is sufficient parallelism. This doesn't contradict your analysis, but it adds a nuance to 'the MT should be faster than VDE'.

Bear in mind also that we are moving to a construction which will not generate one merkle tree per layer. This will lessen that consideration, but it also means we cannot generate trees as we go. Rather, we will need to accumulate leaf values into an aggregate hash, then generate the whole tree at the end. I'm not trying to fully specify this here because a design will be forthcoming. Just an early thought to keep assumptions aligned.

schomatis commented 5 years ago

Just a note that in practice merkle-tree generation is only faster than encoding when there is sufficient parallelism. This doesn't contradict your analysis, but it adds a nuance to 'the MT should be faster than VDE'.

Good point. I'm always assuming that scenario; in this case I think that if the user doesn't have enough parallelism, there isn't much to gain from parallel MT/VDE processing in the first place.

we will need to accumulate leaf values into an aggregate hash, then generate the whole tree at the end.

I'll expand then on a second example I initially left out to avoid complexity, but it seems worth it given this note. Ideally I wouldn't want to partition the data up front into blocks; I'd rather have a (you guessed it) streaming architecture where each process feeds its result to the next one, and each independently decides when it has formed an outputable (this word should exist) structure like a tree or a layer.