filecoin-project / rust-fil-proofs

Proofs for Filecoin in Rust

Clarify minimum piece size (raw and padded) #1231

Closed: ribasushi closed this issue 3 years ago

ribasushi commented 4 years ago

Description

There are currently three conflicting limits across the rust-fil-proofs/Lotus landscape. It would be nice if we could normalize all of them: people do try to store really small files in Filecoin.

Specifically:

    runner0@Viator:/Users/devel/devel/lotus$ ./lotus client commP $(pwd)/commP.img t03360
    ERROR: computing commP failed: Piece must be at least 127 bytes

    runner0@Viator:/Users/devel/devel/lotus$ dd if=/dev/urandom of=commP.img bs=1 count=64
    64 bytes transferred in 0.000313 secs (204444 bytes/sec)

    runner0@Viator:/Users/devel/devel/lotus$ ./lotus client commP $(pwd)/commP.img t03360
    CID: baga6ea4seaqjlriob52bjtsoqozkulmodgc7ngsmftfpurpl47zbul2fykjvahy
    Piece size: 127 B


* Default limit of **256 bytes** encoded in the lotus-miner defaults:

      ./lotus-miner storage-deals get-ask
      Price per GiB / Epoch  Min. Piece Size (w/bit-padding)  Max. Piece Size (w/bit-padding)  Expiry (Epoch)  Expiry (Appx. Rem. Time)  Seq. No.
      500000000              256 B                            512 MiB                          1001803         6915h20m0s                2

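The get-ask columns are labelled "(w/bit-padding)", so the 256 B figure is a padded size. Fr32 bit padding stores 254 payload bits in each 256-bit field element, so every 127 unpadded bytes occupy 128 padded bytes. A minimal Rust sketch of that conversion suggests the 256 B ask minimum corresponds to 254 unpadded bytes, twice the 127-byte minimum in the `lotus client commP` error. The helper names are hypothetical and do not refer to a Lotus or rust-fil-proofs API.

```rust
// Fr32 bit padding: 254 payload bits per 256-bit (32-byte) field element,
// so every 127 unpadded bytes occupy 128 padded bytes.

/// Hypothetical helper: padded size for an unpadded size that is a multiple of 127.
fn unpadded_to_padded(unpadded: u64) -> u64 {
    unpadded + unpadded / 127
}

/// Hypothetical helper: unpadded capacity of a padded size that is a multiple of 128.
fn padded_to_unpadded(padded: u64) -> u64 {
    padded - padded / 128
}
```

For example, `padded_to_unpadded(256)` returns 254 and `unpadded_to_padded(127)` returns 128.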


### Acceptance criteria

Come up with an unambiguous set of minimum numbers for unpadded and padded sizes, so that the Lotus team can propagate them through its error checking and defaults.

ribasushi commented 4 years ago

/cc @jimpick

porcuquine commented 4 years ago

I don't think the problem is exactly one of ambiguity (though perhaps you would define it that way); rather, there are many different constraints, each for a different reason. Whether a given size is 'okay' for some purpose depends on where in the pipeline it sits. This also explains why users may be able to provoke different behaviors and error messages by probing the system in different ways.

I don't disagree that Lotus may want or need to either add more aggressive, earlier size checking, or else provide more specific error messages.

I think the description here is still accurate: https://github.com/filecoin-project/specs/blob/old-spec/client-data.md. This defines the minimum meaningful payload to be 127 bytes (before padding). Any payload smaller than that will need to be padded before CommP is generated. This is not a restriction imposed by CommP, though. It is just that if a node ignores this rule and tries to pack pieces smaller than this (say, 63 bytes), the result will not be byte-aligned after bit padding. (Bit padding stores 254 payload bits in every 256 bits, so 127 bytes, i.e. four 254-bit chunks, is the smallest payload that ends on both a chunk boundary and a byte boundary.) Non-byte-aligned pieces cannot be packed together in such a way that their CommPs can be calculated independently, because adjacent pieces might be packed into the same byte.
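
The alignment argument can be made concrete with a small Rust sketch. It uses a simplified model (an assumption, not the rust-fil-proofs code): each padded 32-byte field element carries 254 payload bits, and a piece is treated as well formed only when its payload fills a whole number of such chunks. Because 8n must then be a multiple of 254, the smallest non-empty qualifying payload is 127 bytes, while a 63-byte payload stops partway through a chunk. The function name `fr32_padded_len` is hypothetical.

```rust
// Simplified model (assumption): each padded 32-byte field element
// carries 254 payload bits plus 2 padding bits.
const PAYLOAD_BITS_PER_ELEMENT: u64 = 254;
const PADDED_BYTES_PER_ELEMENT: u64 = 32;

/// Hypothetical helper: bit-padded length of a payload that fills a whole
/// number of 254-bit chunks, or None if the payload stops partway through
/// a chunk (the misaligned case).
fn fr32_padded_len(unpadded_bytes: u64) -> Option<u64> {
    let payload_bits = unpadded_bytes.checked_mul(8)?;
    if payload_bits % PAYLOAD_BITS_PER_ELEMENT != 0 {
        return None;
    }
    let elements = payload_bits / PAYLOAD_BITS_PER_ELEMENT;
    elements.checked_mul(PADDED_BYTES_PER_ELEMENT)
}
```

Under this model `fr32_padded_len(127)` is `Some(128)` and `fr32_padded_len(63)` is `None`.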

If nodes REALLY want to handle pieces smaller than 64 bytes, then I suppose they could define the packing rule to insert some 'extra bit padding' (a sixth kind not accounted for below) in the case of misaligned tiny pieces. The current rules were designed to minimize complexity (there's still a lot, though) without being too restrictive. This may explain why there appears to be a 'soft limit'. I don't know what would happen if you succeeded in creating pieces below the prescribed limit and then tried to retrieve them. It might: a) work accidentally; b) produce an error only when unpacking; c) silently return the wrong value. I do suspect that any of these cases could be transformed into d) work correctly by design, so if that is the preferred resolution, it might be worth spending the time to ensure d) is the case. Then I think it would be safe to remove the proscription on pieces smaller than 127 bytes, provided that sizes between 63 and 127 bytes are also tested (although there would be no space savings, just differently named padding).

The important point is that these considerations all relate to the interface between the client and the miner, not with how either interacts with proofs. Proofs (I am fairly certain) just need pieces to be a power of 2 (> 32) bytes. It's not entirely clear to me that there is anything actionable for rust-fil-proofs to do. It's my belief that the proofs code is as permissive as possible and only cares about the requirements imposed by ability to create well-formed proofs. For example, when CommD/CommP is computed, the input needs to have already been padded such that it is a power of 2 bytes. This subsumes a number of other requirements which might have applied earlier.
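
A Rust sketch of the "power of 2" requirement follows. It assumes (this is not taken from Lotus or rust-fil-proofs) that a node zero-fills the raw input up to the unpadded capacity of the smallest power-of-two padded piece, with 128 padded bytes as the floor implied by the 127-byte rule rather than the looser "> 32" proofs bound. Under that assumption a 64-byte file lands in a 128-byte padded piece holding 127 unpadded bytes, which is consistent with the `Piece size: 127 B` reported in the issue. `piece_size_for` and `unpadded_capacity` are hypothetical names.

```rust
// Assumption: the smallest padded piece is 128 bytes (127 unpadded).
const MIN_PADDED_PIECE: u64 = 128;

/// Unpadded capacity of a padded piece: 127 of every 128 bytes carry payload.
fn unpadded_capacity(padded: u64) -> u64 {
    padded - padded / 128
}

/// Hypothetical helper: smallest power-of-two padded piece whose unpadded
/// capacity can hold `raw_len` bytes once the input is zero-filled.
/// Returns (padded_size, unpadded_size), or None on overflow.
fn piece_size_for(raw_len: u64) -> Option<(u64, u64)> {
    let mut padded = MIN_PADDED_PIECE;
    while unpadded_capacity(padded) < raw_len {
        padded = padded.checked_mul(2)?; // doubling keeps it a power of two
    }
    Some((padded, unpadded_capacity(padded)))
}
```

Doubling from a power-of-two floor keeps every candidate a power of two, which is the only property the proofs side is described as needing.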

It might be that this document provides some of the unambiguous specification you are looking for: Padding (5 kinds).

cc: @yiannisbot

If you come to believe the proofs code is emitting the wrong errors, or in the wrong situation, please follow up. Otherwise, I think whatever changes are required are better categorized as node (Lotus) UX or specification issues. In that case, I recommend creating parallel issues in the relevant repos, perhaps linking to this comment for context.

ribasushi commented 4 years ago

> If nodes REALLY want to handle pieces smaller than 64 bytes,

The above is precisely what I want to avoid. The answer I seek is the minimum practical size within Lotus-land such that, if I hand data of that size to rust-fil-proofs as-is, I bear no further burden to get the very same bytes back after a seal/unseal cycle.

To put the question differently: what is the minimum number of bytes that will correctly roundtrip through the rust-fil-proofs machinery, as exposed by filecoin-ffi?

porcuquine commented 4 years ago

> To put the question differently: what is the minimum number of bytes that will correctly roundtrip through the rust-fil-proofs machinery, as exposed by filecoin-ffi?

To summarize and editorialize: I suggest that this value has already been defined as 127 bytes, that this is believed safe, and that any evidence to the contrary is a bug which should be reported. While it may be that, in practice, the value is or could be made smaller, I see no compelling reason to investigate or potentially expand the boundary.

Therefore, the simplest resolution would probably be to add one more 'prescriptive limit' in Lotus (and/or other nodes). This could be accomplished by adding a new error check, or by enforcing padding to 127 bytes (before bit padding).
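
The two options can be sketched in Rust as a single policy switch. Everything here is hypothetical: the `SmallPiecePolicy` enum, `prepare_piece`, and the error text are illustrative and do not come from Lotus. The sketch only shows the difference between rejecting a sub-127-byte payload and zero-padding it to 127 bytes before any bit padding happens.

```rust
// Hypothetical node-side handling of payloads below the 127-byte minimum.
const MIN_UNPADDED_PIECE: usize = 127;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SmallPiecePolicy {
    /// Refuse the piece with an explicit, early error.
    Reject,
    /// Append null bytes until the payload reaches the minimum.
    ZeroPad,
}

fn prepare_piece(data: &[u8], policy: SmallPiecePolicy) -> Result<Vec<u8>, String> {
    if data.len() >= MIN_UNPADDED_PIECE {
        return Ok(data.to_vec());
    }
    match policy {
        SmallPiecePolicy::Reject => Err(format!(
            "piece is {} bytes; the minimum is {} bytes",
            data.len(),
            MIN_UNPADDED_PIECE
        )),
        SmallPiecePolicy::ZeroPad => {
            let mut padded = data.to_vec();
            padded.resize(MIN_UNPADDED_PIECE, 0); // fill with null bytes
            Ok(padded)
        }
    }
}
```

Rejecting gives the client a clear error up front; zero-padding accepts the piece but changes its length, which is where the length-metadata question comes in.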

However, please note that my answer hides a subtlety, because I don't know exactly how Lotus (and/or other nodes) generally handle a contract to 'correctly roundtrip' bytes. Do you accept that the size of your pieces may change during the roundtrip? If not, do you (the client) accept the burden of only trying to roundtrip pieces whose sizes are aligned? In other words, I don't know of a mechanism to preserve metadata about length. We once considered providing this at the sector level in order to facilitate precise on-chain verification of original data, but I think we decided that such framing would need to be provided within the payload itself (for simplicity, as before).

The reason I call this a subtlety is this: suppose Lotus defined an 'effective minimum piece size' of 127 bytes (which I think is the right answer) but accomplished it by always first padding smaller pieces to 127 bytes. Then, depending on whether you take responsibility for the length metadata, there might be no effective minimum at all. That is, in such a scheme you could put in a single byte as input and get that same byte back as output, along with 126 more (null) bytes you don't care about. I'll leave it to you to decide how to interpret this, but it does suggest a relationship between the chosen solution and the answer to your question.
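
Since no mechanism preserves length metadata, one way a client could take responsibility for it is to frame the payload itself. The Rust sketch below uses an 8-byte little-endian length prefix followed by zero fill up to 127 bytes; the format and the `frame`/`unframe` names are hypothetical and are not part of Filecoin, Lotus, or rust-fil-proofs.

```rust
use std::convert::TryInto;

const MIN_UNPADDED_PIECE: usize = 127;
const HEADER_LEN: usize = 8;

/// Hypothetical client-side framing: 8-byte little-endian payload length,
/// then the payload, then null bytes up to at least 127 bytes.
fn frame(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity((HEADER_LEN + payload.len()).max(MIN_UNPADDED_PIECE));
    out.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    out.extend_from_slice(payload);
    if out.len() < MIN_UNPADDED_PIECE {
        out.resize(MIN_UNPADDED_PIECE, 0);
    }
    out
}

/// Recovers the original payload, ignoring any trailing null fill.
/// Returns None if the piece is too short for its declared length.
fn unframe(piece: &[u8]) -> Option<&[u8]> {
    let header: [u8; HEADER_LEN] = piece.get(..HEADER_LEN)?.try_into().ok()?;
    let len = usize::try_from(u64::from_le_bytes(header)).ok()?;
    piece.get(HEADER_LEN..HEADER_LEN.checked_add(len)?)
}
```

With this framing a 1-byte payload becomes a 127-byte piece, and `unframe` returns just that byte, ignoring the 118 trailing null bytes. Because `unframe` reads only the declared length, any further zero fill added later is also ignored.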

ribasushi commented 4 years ago

> I suggest that this value has already been defined as 127 bytes, that this is believed safe

Enough said! :)

Would you consider my third highlight in the original bug report (that the minimum piece size accepted in deals is 256 bytes) to be an error?

porcuquine commented 4 years ago

From this perspective, it’s needlessly restrictive. There may be some other constraint from which it is derived and about which I don’t know, though.

dignifiedquire commented 3 years ago

I believe this was solved. If anything actionable still needs to happen on the proofs side, please open a new issue :)